When a file in the beamline file system is closed after a write operation, the fast copy process immediately copies it to the core file system. This creates unnecessary copy actions if a file is opened, written, and closed in many small steps instead of with a single open/close pair. The fast copy process recognizes this (when a file changes while it is being read) and will then delay the processing of future files with the same extension. To avoid this, and to keep the copy effort and the latency as low as possible, please follow these rules:
- If possible write files with only one open/close pair.
- If a file cannot be written with a single open/close pair, write to a temporary file first and then rename it to the final file name; the copy process also copies a file when it is renamed. So that such temporary files are not copied themselves, the fast copy process ignores files whose names end in .tmp.
- In C(++) use mkstemps() with a template like "myfileXXXXXX.tmp" to create the file, not mkstemp(), because mkstemp() creates files with a suffix that cannot be recognized automatically (there are some file types that have a 6 character suffix). The downside of using mkstemps() instead of mkstemp() is that mkstemps() is a GNU extension and not part of POSIX.
- In Python create the temporary file with the tempfile module (https://docs.python.org/3/library/tempfile.html or https://docs.python.org/2/library/tempfile.html) and use the suffix parameter with the value ".tmp" (note the leading dot).
- In shell scripts use mktemp myfileXXXXXX.tmp
- If your file is created by a library that does many open/close calls, such as libtiff, try to give it a temporary file name and rename the file afterwards. The safest way to do that is probably to create an empty temporary file as described above, let the library overwrite it, and finally do the rename.
- If you are unsure what a library does: use strace to check which system calls are really issued.
If you periodically re-create files that change only occasionally, it is advisable to write the new version to a temporary file, compare it to the old one, and rename it to the old name only if the two differ. This has the advantage that no unnecessary copy is triggered, and also that an application looking at the files never sees them in an intermediate, unfinished state: it sees either the old version or the new version, because the rename is guaranteed to be atomic (see man 2 rename). This technique can also prevent unnecessary actions in build systems, e.g. make.
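The compare-and-rename technique described above could look roughly like the following Python sketch. The function name write_if_changed and its signature are illustrative assumptions, not part of any existing tool; it only uses the standard modules tempfile, filecmp, and os.

```python
import filecmp
import os
import tempfile


def write_if_changed(final_path, data):
    """Write data (bytes) to final_path only if the content differs.

    Hypothetical helper: returns True if final_path was replaced,
    False if the existing file was left untouched.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file in the same directory so the rename stays
    # within one file system; the ".tmp" suffix is the one the text says
    # the fast copy process ignores.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # shallow=False forces a comparison of the file contents.
        if os.path.exists(final_path) and filecmp.cmp(
            tmp_path, final_path, shallow=False
        ):
            os.unlink(tmp_path)  # identical content: keep the old file
            return False
        os.replace(tmp_path, final_path)  # rename(2) on POSIX, atomic
        return True
    except BaseException:
        # Remove the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that tempfile.mkstemp() creates the file readable and writable only by its owner, so the final file inherits those permissions unless they are changed (e.g. with os.chmod) before the rename.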
- Avoid bad patterns in file names. While it is technically possible to use any UTF-8 encoded string (without the '/' character) of up to 255 bytes in length as a file name, some names should be avoided. These include (examples of all of them can be found in our data, so this is based on real-world findings, not theory):
- Names with control characters in them, like 'ub%01.mat' (where %01 is the URL-encoded rendering of the ASCII SOH character). This also includes file names containing newlines or carriage returns.
- Names with spaces at the beginning or end. While spaces in file names are not nice anyway, at the beginning or end of a name they are plain evil.
- Names with any of the more exotic Unicode items such as noncharacters, code points from the private use areas, direction marks, or invisible spaces.
- Names that are too long, best keep the names shorter than 200 bytes. 'dummy_00011___________________________________________________________________________________________________________________________________________________________________________________________________________________________________________m02.nxs' is not a good name!
- Avoid colons (:) in file names. They may cause problems with some tools because they are sometimes used to separate node names from file names, e.g. when using scp.
- Create files with reasonable sizes. See Files on the storage infrastructure.
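The naming rules in the list above can be checked mechanically before a file is created. The following Python sketch is one possible way to do so; the function name filename_problems and the mapping of the rules onto Unicode general categories are assumptions made for illustration, and the function expects a single file name (not a path) as a valid str.

```python
import unicodedata

MAX_NAME_BYTES = 200  # the text recommends names shorter than 200 bytes


def filename_problems(name):
    """Return a list of problems found in a single file name (no path)."""
    problems = []
    if name != name.strip():
        problems.append("leading or trailing whitespace")
    if ":" in name:
        problems.append("colon")
    if len(name.encode("utf-8")) >= MAX_NAME_BYTES:
        problems.append("too long")
    for ch in name:
        cat = unicodedata.category(ch)
        # Cc = control characters (SOH, newline, carriage return, ...)
        # Cf = format characters (direction marks, zero-width spaces)
        # Co = private use, Cn = unassigned (includes noncharacters)
        if cat in ("Cc", "Cf", "Co", "Cn"):
            problems.append("special character U+%04X (%s)" % (ord(ch), cat))
        elif cat == "Zs" and ch != " ":
            # Space separators other than the plain ASCII space.
            problems.append("unusual space U+%04X" % ord(ch))
    return problems
```

An empty list means that none of the patterns checked here were found; it does not guarantee that every tool will accept the name.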
Do NOT use the following pattern:
Bad creation pattern:
    open(tempname, O_CREAT | O_WRONLY)
    write(data)
    close()
    link(tempname, finalname)
    unlink(tempname)
This creates a file that is not copied from the beamline to the core file system. Unfortunately, SFTP servers seem to follow this pattern, although only when a Windows client is used, so avoid using SFTP from a Windows client to write files into the beamline file system.
Following these hints for other files, e.g. when doing analysis on the core file system, is also appreciated and may be rewarded with better performance and fewer problems.
It turned out that files larger than 1 MiB that are written to the beamline file system over NFS produce more than one IN_CLOSE_WRITE event on the NFS server; if the mount uses NFS version 3, there is even one such event per MiB of file size. Until this broken behaviour is fixed, the best remedy is to create the file with a name ending in .tmp (see above) and rename it when finished.
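For files that are written in many chunks, the .tmp-and-rename remedy can be wrapped in a small context manager. The sketch below is a minimal illustration under the assumption that the temporary file may live next to the final file; the name open_via_tmp is hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def open_via_tmp(final_path, mode="wb"):
    """Yield a file object for a '.tmp' file; rename it on success.

    Hypothetical helper: the final name only appears once all data
    has been written and the temporary file has been closed.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    prefix = os.path.basename(final_path) + "."
    # The name ends in ".tmp", which the text says is ignored by the
    # fast copy process until the file is renamed.
    fd, tmp_path = tempfile.mkstemp(prefix=prefix, suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, mode) as handle:
            yield handle
        os.rename(tmp_path, final_path)  # atomic, see man 2 rename
    except BaseException:
        # Do not leave a partial temporary file behind on errors.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A caller would write all chunks inside a `with open_via_tmp(path) as out:` block; however many write calls are made, the final file name appears only once, through the rename.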