In beamline filesystem pass-through mode, files in the beamline filesystem are truncated to zero length some time after they have been copied to the core filesystem.

This frees valuable space in the beamline filesystem.

The truncated files are marked with several extended attributes:

  • user.truncated: the time of the truncation, in seconds (with fractional part) since the epoch (1970-01-01)
  • user.origsize: the size of the file in bytes before the truncation
  • user.origctime: the metadata change time (ctime) of the file before the truncation, in seconds (with fractional part) since the epoch
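A minimal Python sketch for reading these attributes from a client. It assumes the values are stored as ASCII decimal strings (the exact encoding is not documented here) and that the file is accessed from Linux, where os.getxattr is available. The helper names parse_truncation_xattrs and read_truncation_info are hypothetical.

```python
import os
from datetime import datetime, timezone

# Extended attributes set on truncated files (names as documented above).
TRUNCATION_XATTRS = ("user.truncated", "user.origsize", "user.origctime")


def _decode(value):
    # Assumption: values are ASCII decimal strings, possibly NUL-terminated.
    return value.decode("ascii").strip("\x00 \n")


def parse_truncation_xattrs(raw):
    """Turn a {name: bytes} mapping into readable truncation metadata.

    Returns None if there is no user.truncated attribute,
    i.e. the file has not been truncated.
    """
    if "user.truncated" not in raw:
        return None
    return {
        "truncated_at": datetime.fromtimestamp(
            float(_decode(raw["user.truncated"])), tz=timezone.utc),
        "orig_size": int(_decode(raw["user.origsize"])),
        "orig_ctime": datetime.fromtimestamp(
            float(_decode(raw["user.origctime"])), tz=timezone.utc),
    }


def read_truncation_info(path):
    """Read the truncation attributes of one file (Linux only)."""
    raw = {}
    for name in TRUNCATION_XATTRS:
        try:
            raw[name] = os.getxattr(path, name)
        except OSError:
            # Attribute not set (ENODATA) or xattrs not supported.
            pass
    return parse_truncation_xattrs(raw)
```

Splitting parsing from reading keeps the attribute interpretation testable without a filesystem that supports user xattrs.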

In the raw folder of an affected beam time, a file named raw/__TruncatedSomeDataFiles.log appears that lists all files that have been truncated.

Immutability:

In addition, truncated files are made immutable to prevent accidental appends and other ill-advised changes. Rationale: appending to a file after truncation would produce a file with obviously wrong content. Only the appended part would be present, the original content would be missing, and the file could not be synced to the core filesystem. Changing the file at an arbitrary offset, rather than just appending, would be even worse. Such data corruption might go unnoticed until it is too late, so it is better for any attempt to change the file to fail immediately. Removing truncated files is also harmful: if files are written by a script that derives new file names from the directory contents, deleting them would lead to unintended overwriting of data in the core filesystem.


Tools used:

The truncation is done using the truncate_file tool from the ewmscp suite.

The truncation is done by hourly cron jobs, which consider only beam times that consume more than 10 TiB in the beamline filesystem.
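The 10 TiB selection can be sketched as follows. How the cron jobs actually measure usage is not described here; this sketch simply sums allocated blocks with os.walk and assumes Linux st_blocks semantics (512-byte units). The names allocated_bytes and beamtime_qualifies are hypothetical.

```python
import os

TIB = 2**40
THRESHOLD_BYTES = 10 * TIB  # only beam times above this are considered


def allocated_bytes(root):
    """Sum of the space actually allocated to files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:
                continue  # file vanished during the walk
            # On Linux st_blocks counts 512-byte units, so files that
            # were already truncated to zero length contribute nothing.
            total += st.st_blocks * 512
    return total


def beamtime_qualifies(usage_bytes, threshold=THRESHOLD_BYTES):
    """True if a beam time consumes more than the threshold."""
    return usage_bytes > threshold
```

Counting allocated blocks rather than st_size matters here: truncated files no longer occupy space, so a beam time drops below the threshold once enough of it has been truncated.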

File choice:

Files to be truncated are chosen according to the following rules (implemented as a GPFS policy rule that executes a script):

  • the file was last accessed more than 6 hours ago
  • only files in the raw folder are considered
  • the file was successfully copied and is unchanged since, i.e. its mtime and size equal the values stored by the copy process in the user.mtime and user.file_size extended attributes
  • the file has blocks allocated on the filesystem; small files (less than 3 KiB), which GPFS stores inside the inode, are therefore never considered
  • the name ends with one of the following extensions (case-insensitive). They were chosen from a list of the extensions whose files together use more than 1 TiB in the core filesystem. Truncation is limited to these file types to avoid touching log files and similar files that are updated regularly but infrequently.
    • .h5 .gz .nx
    • .tif .cbf .nxs .raw .xtc .img .bin .hdf .edf .rbf .ref .wfm .lst .mat .png .hkl .dat
    • .mccd .tiff .hdf5 .proj

Note A.R., tbd.:
* 1 TiB? Or 1 MiB? At least TIFF files cannot become larger than about 4 GB according to the TIFF 6.0 specification.   Comment by JH: The 1 TiB is the sum of all files with the same extension.
* Shall we also add .bz2, .tbz2, .zip and .7z, which are popular archive formats? (At least the online scan software can produce zip containers for MCA data.)
* Have any video files (avi, mp4, mjpeg) shown up in the core filesystem? To be checked.
* Comment on large files, given the sparse file from the GINIX group / M. Osterhoff: are they excluded from pass-through to prevent issues with dCache?
* Should we then recommend putting specific files such as masks or background images into e.g. shared, so that they are not truncated when they are accessed infrequently?




Comments:

Juergen, Andre: regarding the pass-through, we typically save masks and other "beamtime static" data in shared, as intended for shared. The interval between accesses may exceed 6 h, but the files may still be needed for the measurements. Could there be a rule for these directories that excludes the "shared" directory from the pass-through as long as its total data volume stays below a certain threshold?

Posted by garrej at 12. Apr. 2021 17:52

shared and scratch_bl are excluded from the pass-through mode:

RULE 'ExcludeScratch'
  LIST 'BeamtimeTruncator'
  EXCLUDE 
    WHERE
      PATH_NAME LIKE '/beamline/%/%/scratch_bl/%' OR
      PATH_NAME LIKE '/beamline/%/%/shared/%'
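To check which paths such a rule excludes without running the policy engine, the LIKE patterns can be emulated in Python. This sketch assumes standard SQL LIKE semantics ('%' matches any sequence of characters, including '/'; '_' matches exactly one character) and does not handle escape characters; like_to_regex and is_excluded are hypothetical helpers.

```python
import re


def like_to_regex(pattern):
    """Translate an SQL LIKE pattern into a compiled regular expression."""
    pieces = []
    for ch in pattern:
        if ch == "%":
            pieces.append(".*")  # any sequence, possibly empty
        elif ch == "_":
            pieces.append(".")   # exactly one character
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces), re.DOTALL)


# The two patterns from the ExcludeScratch rule above.
EXCLUDE_PATTERNS = [
    like_to_regex("/beamline/%/%/scratch_bl/%"),
    like_to_regex("/beamline/%/%/shared/%"),
]


def is_excluded(path):
    """True if the whole path matches one of the exclusion patterns."""
    return any(p.fullmatch(path) for p in EXCLUDE_PATTERNS)
```

If the policy engine follows these semantics, '%' also spans '/', so a subdirectory named shared deeper in the tree (e.g. below raw) would be excluded as well, and the '_' in scratch_bl is itself a single-character wildcard.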



Posted by hannappj at 12. Apr. 2021 17:57

cool thanks

Posted by garrej at 12. Apr. 2021 18:02