• Further discussions on automated tools for runtime accounting.


Notes by: Nicholas John Walker (written post-meeting from memory)


  • The focus of the meeting was on the definition of machine states and how they could/should be used for downtime accounting
  • Nick presented some thoughts in the mind map "XFEL as a state machine.pdf".
  • Discussions followed on how best to define states, and how the control system could use them in an automated fashion.
  • Much of the meeting focused on the underlying philosophy (policy) of how to do the accounting.

Detailed discussions

  1. There are three aspects to downtime accounting:
    1. Calculation of actual downtime, i.e. when is the accelerator considered down (fault state) and when is it running (operational state).
    2. What operation mode (state) is the accelerator in during the period of accounting (e.g. Set-up & Tuning, X-Ray Delivery)
    3. The root-cause category for the downtime, important for statistical analysis
  2. The primary focus was on the first two aspects (1.1 and 1.2). How the downtime (1.1) is determined could depend on the operation mode (1.2).
  3. Recovery time is sometimes difficult to determine. For X-ray delivery, a "threshold" approach applied to the SASE pulse-energy history appears to work.
    1. The importance of correctly categorising the "recovery time" from a hard trip (i.e. hardware recovery, excessive tuning time, etc.) was stressed.
  4. Difficulties in addressing 'tuning time' were discussed. When exactly do we formally switch back and forth between the "Tuning" and "Delivery" states? Formally the switch is based on the agreed delivery parameters (Friday meetings), but these can be modified during the user runs on request, sometimes on an hourly basis (×3 SASE experiments). Keeping track of this for accounting purposes is problematic.
  5. The current approach is a manual change by operations. However, this is not robust (and arguably ill-defined): operators occasionally forget to reset back to "Delivery" after a period of tuning. Operators should review and manually adjust the times at the end of their shift, but this often doesn't happen.
    1. Elena's approach using the SASE energy signal could be used to determine directly from the measured KPI when the machine is in a tuning state. This will likely not capture all scenarios and needs some thought.
  6. The problem is further exacerbated by experiments requiring tuning "beyond" the previously specified KPI levels. Two aspects arose during this discussion:
    1. For formal accounting, the accelerator should be considered "Operational" when the (conservative?) previously agreed KPI thresholds are exceeded (irrespective of further requests from experiments).
    2. However, it is important to keep track of 'higher-performance' tuning time, as this will provide feedback to future planning.
    3. Allotted (scheduled) tuning time needs to be accounted for and categorised, to justify the time scheduled. Especially important when/if users start to request more hours per year.
  7. To further automate the accounting, the operational states and required KPI "set points" need to be available in the control system.
    1. Winni suggested that for the KPIs, only wavelength should be a 'free parameter'; the other SASE-related KPIs would then be set to "guaranteed" performance levels based on previous experience. These configurations should be published so that users understand what's on offer.
    2. Tuning above these published thresholds would still happen but be accounted for in a different way (see point 6).
    3. How this should/could be implemented was not discussed (waiting for clarification of requirements).
  8. As a complementary approach to actually using the measured SASE KPI for the accounting, the control system should know what "readiness state" the machine is in. This requires a systematic (and high-level) check of all major sub-systems (e.g. Vacuum OK, Cryo OK, Magnets OK, RF OK and Injector OK). If all identified states are green, then the machine should be able to deliver beam. Note that this does not include performance tuning, which must rely on the measured SASE KPIs.
    1. It was noted that perhaps the only state actually missing is LINAC RF OK; the rest should already be available.
  9. Fault categorisation:
    1. For highest-level reporting, fault categories should map cleanly to the XFEL Operations Package structure.
    2. Subcategories for more detailed analysis (drill-down) should also be considered (likely also mapping to the OP structure).
    3. Again, tuning and other operator-related time needs special categorisation.
  10. Implementation
    1. Most of the discussion remains at the conceptual level, so we are not yet at the stage to discuss concrete implementation proposals; these must wait until some specific requirements (i.e. concrete ideas) exist.
    2. However, several concepts need to be considered up front for a real implementation:
      1. The concept of "required" and "actual" values (Sollwert, Istwert) is also important in defining states. For example, the "scheduled machine state" (Sollwert) and the current machine state (Istwert) need to be compared, so both properties should be known to the control system (or to a tool that sits on top of it). Similarly for the SASE KPI thresholds (specified and derived; see 7.1).
      2. It should be straightforward to implement Elena's "threshold-based" algorithms in the control system, providing a real-time status flag of the actual machine state.
    3. Probably not every state change can be automated, and some operator intervention/response/correction will likely be necessary. This will require written procedures which mandate these activities; how this could be enforced remains to be seen.
    4. Possible use of the PETRA-III tool was discussed. It may work, but the XFEL is much more complex. Defined (automated) machine states in the control system are a prerequisite for using this tool.
  11. Winni's Excel sheet for long-term planning appears to be as good as any approach right now; there seems to be no obvious advantage to moving to a different (possibly purpose-built) tool. This will likely be reviewed in the future as things develop.
  12. It was agreed to meet every two weeks to keep the momentum going.
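As an illustration of the "threshold" approach to the SASE pulse-energy history discussed in points 3, 5 and 10, a state flag of this kind could be sketched roughly as below. All class, state and threshold names here are hypothetical; this is not Elena's actual algorithm, just a minimal sketch of the idea (rolling median plus hysteresis to avoid flickering between states).

```python
# Hypothetical sketch of a threshold-based machine-state flag derived from
# the SASE pulse-energy history (all names and numbers are illustrative).
from collections import deque


class SaseStateFlag:
    """Classify the machine as DELIVERY / TUNING / DOWN from pulse energy.

    A rolling median over a short window suppresses single-shot dropouts,
    and a lower hysteresis threshold avoids rapid state flickering.
    """

    def __init__(self, kpi_threshold_uj, window=50, hysteresis=0.1):
        self.kpi = kpi_threshold_uj          # agreed KPI level (e.g. from the Friday meeting)
        self.low = kpi_threshold_uj * (1.0 - hysteresis)
        self.history = deque(maxlen=window)
        self.state = "DOWN"

    def update(self, pulse_energy_uj):
        self.history.append(pulse_energy_uj)
        med = sorted(self.history)[len(self.history) // 2]
        if med >= self.kpi:
            self.state = "DELIVERY"
        elif med < self.low:
            # Below the lower threshold: tuning if beam is present, down if not.
            self.state = "TUNING" if med > 0.0 else "DOWN"
        # In the band between low and kpi the previous state is kept (hysteresis).
        return self.state
```

Run in real time in the control system, such a flag would provide the measured ("Istwert") side of the accounting; the agreed KPI threshold is the published set point.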
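The systematic "readiness state" check of point 8 is essentially a logical AND over per-sub-system OK flags. A minimal sketch, with the sub-system names taken from the notes and everything else (function name, flag representation) assumed:

```python
# Minimal sketch of the "readiness state" of point 8: the machine is ready
# to deliver beam only if every major sub-system reports OK. The dict of
# boolean flags stands in for whatever the control system actually publishes.

SUBSYSTEMS = ["Vacuum", "Cryo", "Magnets", "RF", "Injector", "LINAC RF"]


def readiness_state(flags):
    """Return (ready, failing), where `failing` lists the sub-systems not OK.

    `flags` maps sub-system name -> bool. A missing flag counts as not OK,
    which errs on the safe side for accounting purposes.
    """
    failing = [s for s in SUBSYSTEMS if not flags.get(s, False)]
    return (len(failing) == 0, failing)
```

As noted in the meeting, readiness in this sense says only that the machine should be able to deliver beam; it says nothing about performance tuning, which must still rely on the measured SASE KPIs.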
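The Sollwert/Istwert comparison of point 10 then reduces to comparing the scheduled machine state against the actual one and classifying each accounting interval. The category names below are illustrative, not an agreed scheme:

```python
# Hypothetical sketch of the Sollwert/Istwert comparison of point 10: the
# scheduled machine state (from the run plan) is compared against the actual
# state (e.g. from a threshold-based flag), and any mismatch is classified
# for the downtime accounting. All state and category names are illustrative.

def account_interval(scheduled_state, actual_state):
    """Classify one accounting interval from scheduled vs. actual state."""
    if scheduled_state == actual_state:
        return "AS_PLANNED"          # e.g. scheduled tuning that actually happened
    if scheduled_state == "DELIVERY" and actual_state == "TUNING":
        return "UNSCHEDULED_TUNING"  # tuning eating into delivery time
    if actual_state == "DOWN":
        return "FAULT"               # downtime against any schedule
    return "REVIEW"                  # leave for the operator to categorise
```

The "REVIEW" catch-all reflects the point made in the meeting that not every state change can be automated and some operator correction will remain necessary.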

Action items