This page summarizes some data privacy issues relevant to users of DESY scientific computing resources.
It is of particular interest to users of the following infrastructures:
- BIRD and NAF
- Maxwell HPC cluster
- DESY Cloud Computing infrastructure
- BeeGFS, ASAP3, CFEL&XFEL-GPFS and DUST filesystems
- NFS storage (NetApp and others) and AFS mounts
- dCache storage system
In general, we (DESY IT) process personal data in order to provide high-performance, stable, reliable and secure services. We use personal data to improve the services with respect to performance, stability, security, reliability and occupancy, to plan future resource increases and technology decisions, and to provide high-level reports to funding agencies. In general, we do not distribute any personal data outside of DESY. We might distribute aggregated data outside of DESY, but we make sure that such data cannot be traced back to an individual user.
The basis for our data processing is usually threefold; not every item applies to every user:
- Employment with DESY or associated partners
- DESY conditions of use, and/or conditions of use of affiliated partners (Grid, EGI)
- Considerations of legitimate interests (GDPR Article 6(1)(f))
In addition to the normal system logs, the following data is collected and processed:
- Login data to interactive nodes
  - SSH and other logs record failed and successful login attempts, with account and timestamp. The normal DESY log file retention policy applies (usually 10 days).
  - Access to btmp and utmp is restricted to privileged users. For this data, the system default retention policy applies.
- Submission to batch systems
  - Detailed logs are produced on different subsystems (e.g. scheduler and worker nodes). Because jobs can run for up to about 14 days, the usual 10-day log retention is not sufficient, so a retention period of one month applies.
  - Accounting logs (summarizing data about every single job) are currently not deleted. The data is needed to produce accounting reports going back more than five years. In addition, deletion is technically challenging, if not impossible, for some batch system tools. DESY IT never publishes personal data from accounting logs; published data is always aggregated such that individual users are not identifiable.
  - Some data is provided to EGI accounting for high-level information (not traceable to individual users) about past usage and future utilization.
- Compute Cloud login and usage data
  - Login data from EGI Check-in (for Cloud users): some data is provided to EGI accounting for high-level information (not traceable to individual users) about past usage and future utilization.
  - OpenID Connect user information is stored. Since virtual machines can run for an extended period of time, this data is currently not deleted automatically.
  - VM metadata is recorded (start and end time, project, user, and resources used, including IP addresses). Since virtual machines can run for an extended period of time, this data is currently not deleted automatically.
- Logs of BeeGFS, ASAP3, CFEL&XFEL-GPFS and DUST, as well as the dCache storage system
  - Server processes collect health information in log files that may also contain personal data. The normal DESY log file retention policy applies (usually 10 days).
- Metadata of data stored in BeeGFS, ASAP3, CFEL&XFEL-GPFS and DUST, as well as the dCache storage system
  - Metadata (such as username, group, timestamps, ACLs, filenames) is stored in the filesystem. By design, this information is needed by all file services and cannot be deleted or anonymized.
  - It is the responsibility of each user to manage ACLs so as to control the access of other non-privileged users to their data and metadata.
  - DESY IT creates accounting and billing reports from this metadata. These reports include, but are not limited to, space usage by user and group, file age distribution by user and group, and number of files by user and group.
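To illustrate what such a report involves, the following is a minimal sketch of how space usage, file counts and a file age distribution could be derived from per-file metadata. It is not the tooling DESY IT actually uses: the `FileRecord` type, its field names, the functions `usage_by` and `age_histogram`, and the age bucket boundaries are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file metadata record; not an actual DESY schema."""
    user: str
    group: str
    size_bytes: int
    mtime_epoch: float  # last modification time, seconds since the epoch


def usage_by(records, attribute):
    """Return {key: (total_bytes, file_count)}, keyed by 'user' or 'group'."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        key = getattr(rec, attribute)
        totals[key][0] += rec.size_bytes
        totals[key][1] += 1
    return {key: (size, count) for key, (size, count) in totals.items()}


def age_histogram(records, now_epoch, edges_days=(30, 365)):
    """Count files per age bucket.

    Bucket 0 holds files younger than edges_days[0], the last bucket holds
    files at least as old as edges_days[-1]. The edges are assumed values.
    """
    buckets = [0] * (len(edges_days) + 1)
    for rec in records:
        age_days = (now_epoch - rec.mtime_epoch) / 86400
        index = sum(age_days >= edge for edge in edges_days)
        buckets[index] += 1
    return buckets
```

Calling `usage_by(records, "group")` yields totals per group with no per-user detail, while `usage_by(records, "user")` yields the per-user view that the reports described above also contain.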
Please note that the usual DESY computing and data privacy rules apply to the content of batch jobs, the content of Cloud virtual machines, and the content and management of data by users.