Dataflow & Data Embargoes

From X.509 certificates to POSIX users

X.509 certificates are currently (Nov. 21) the most common method of authenticating against Rucio. However, certificates only provide authentication, not authorization, and they cannot be grouped, which is needed to authorize file access in compliance with a possible data embargo. In addition, the access permissions of a file must be enforced regardless of the access method (Rucio, native methods, NFS, etc.).

At the lowest level, Rucio accesses files via dCache. Therefore, to ensure protocol independence, it is easiest to include access restrictions at this level, as all higher-level protocols (including Rucio) will be subject to them. 

For this purpose, all dCache nodes map each certificate to exactly one POSIX UID, and these UIDs can be combined into groups. Access rights to a file are thus ultimately reduced to ordinary POSIX file access rights. Open, freely accessible files are readable by everyone, while files that are still subject to an embargo, for example, can only be read by a particular group.
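
The following minimal Python sketch illustrates this model: a certificate's distinguished name (DN) is mapped to exactly one POSIX identity, and read access is then decided by ordinary owner/group/other permission bits. All DNs, UIDs, GIDs and the helper `may_read` are hypothetical and serve only as an illustration; in the actual setup dCache performs the mapping and the permission check itself.

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class PosixUser:
    uid: int
    gids: frozenset  # all POSIX groups (campaigns) the user belongs to


@dataclass(frozen=True)
class FileEntry:
    owner_uid: int
    group_gid: int
    mode: int  # POSIX permission bits, e.g. 0o640


# Hypothetical mapping: exactly one POSIX user per certificate DN.
CERT_TO_USER = {
    "/C=DE/O=Example/CN=Alice PI": PosixUser(1001, frozenset({5001})),
    "/C=DE/O=Example/CN=Carol Member": PosixUser(1003, frozenset({5001})),
    "/C=DE/O=Example/CN=Bob Guest": PosixUser(1002, frozenset()),
}


def may_read(user, entry):
    """Classic POSIX read check: owner first, then group, then other."""
    if user.uid == entry.owner_uid:
        return bool(entry.mode & stat.S_IRUSR)
    if entry.group_gid in user.gids:
        return bool(entry.mode & stat.S_IRGRP)
    return bool(entry.mode & stat.S_IROTH)


# An embargoed file (no access for "other") and a freely accessible one.
embargoed = FileEntry(owner_uid=1001, group_gid=5001, mode=0o640)
public = FileEntry(owner_uid=1001, group_gid=5001, mode=0o644)

carol = CERT_TO_USER["/C=DE/O=Example/CN=Carol Member"]
bob = CERT_TO_USER["/C=DE/O=Example/CN=Bob Guest"]

print(may_read(carol, embargoed))  # campaign member
print(may_read(bob, embargoed))    # not in the campaign group
print(may_read(bob, public))       # world-readable file
```

In this sketch the campaign member can read the embargoed file, the guest cannot, and both can read the public file.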

User registration and assignment to campaigns

The image above shows the entire process schematically. Users register on a web page after presenting their certificate. A user ID is created and mapped to the certificate, and both are stored in an LDAP directory. Users of the KIS and regular users at the OT receive an account in the SDC by default. Nevertheless, they still have to obtain a certificate and assign it to their account via the website.

On another web page, users can register for observation campaigns. Each campaign corresponds to a dedicated POSIX group that contains all campaign members. Files under embargo are owned by the campaign's PI and belong to the associated POSIX group. Both the owner and the group have read permission.

Each dCache node maintains a local file with the mapping between user IDs and the respective certificates (multimap). A cron job updates this file periodically and asynchronously on each node. While not particularly elegant, this is a robust and straightforward way to keep the mapping consistent across all nodes. In the future, one might instead modify an existing dCache plugin or write a new one.
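
A periodic update of this kind could be sketched as follows. This is only an illustration under several assumptions: the list of (DN, username) pairs is hard-coded here instead of being queried from the LDAP directory, the function names are hypothetical, and the line format `dn:"..." username:...` should be checked against the dCache multimap documentation before use. The file is written to a temporary location and then renamed, so that a node never reads a half-written mapping.

```python
import os
import tempfile


def render_multimap(entries):
    """Render one mapping line per (dn, username) pair, sorted for stable output."""
    lines = [f'dn:"{dn}" username:{user}' for dn, user in sorted(entries)]
    return "\n".join(lines) + "\n"


def write_atomically(path, content):
    """Write to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.chmod(tmp, 0o644)   # mkstemp creates the file with mode 0o600
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


if __name__ == "__main__":
    # Placeholder data; in practice these pairs would come from the LDAP directory.
    entries = [
        ("/C=DE/O=Example/CN=Bob Guest", "bob"),
        ("/C=DE/O=Example/CN=Alice PI", "alice"),
    ]
    write_atomically("multimap.conf", render_multimap(entries))
```

A cron job on each node would run such a script at a fixed interval; because every run rewrites the complete file, a missed or failed run is corrected by the next one.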

 

Data Ingress

Ingesting data from an instrument is a two-step process (see blue arrows in the graphic above). First, the data is written via NFS to the location expected by Rucio. The owner and group are set to the PI and the group of the campaign to which the data belongs, and global read permissions outside of that group are removed. Second, the file is registered in Rucio, which makes the corresponding metadata accessible and the file searchable.
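
The ownership and permission handling of the ingest can be sketched in a few lines of Python. The function names and the way the PI's UID and the campaign's GID are passed in are assumptions made for illustration; the subsequent registration in Rucio is deliberately left out.

```python
import os
import stat


def embargo_mode(mode):
    """Return the permission bits of `mode` with all access for 'other' removed."""
    return stat.S_IMODE(mode) & ~stat.S_IRWXO


def apply_embargo(path, pi_uid, campaign_gid):
    """Hand the file to the campaign's PI and group, then drop global access.

    Changing the owner of a file normally requires elevated privileges.
    """
    os.chown(path, pi_uid, campaign_gid)
    os.chmod(path, embargo_mode(os.stat(path).st_mode))
```

For example, a file written with mode 0o644 ends up with mode 0o640: the PI and the campaign group keep their access, while everyone else loses it.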

The files contain the campaign affiliation in their metadata header, which is used to set the ownership and permissions as outlined above when replicating data to other nodes. This applies not only to raw data but also to data products derived from it. However, it excludes quick looks and other data products that are not scientifically usable, as well as metadata. Keeping these open allows files to be searched (even during the embargo) and quick looks to be viewed.

Downloading Data

A certificate is needed to download non-free data (red and purple paths in the image above). This certificate is passed to dCache and mapped to a user-ID. The user must be authorized to download the file; if this is not the case, the user gets an error message. 

Downloads via the website use a standard certificate that is mapped to a generic user. This generic user can therefore only download files that are readable by everyone.