This troubleshooting guide aims to provide basic REFM-OPT recovery instructions.
Getting an Overview
Open the REFM-OPT overview panel (MainTaskbar → Diagnostics → Laser-Based Synchronization → REFM-OPT Overview).
Observe the panel for a few seconds to check whether any station is frozen. Check for any red status flags. Unfold the plot at the bottom and check for jumps of the phase shifters.
In case of problems, please inform firstname.lastname@example.org and print the relevant information to the elogbook.
If a station is offline in the overview panel:
- Check the FRED (FRED.xml, XFEL.RF/LLRF.REFMOPT/* or FRED_overview.xml, linked on the LbSyncMain panel). The first seven voltages must be available and switched on. The screenshot on the right shows what the panel should look like. You can also check the FRED status manually (see FRED operation). Don't switch anything off unless you absolutely have to.
- Make sure the REFM-OPT server is running. REFM-OPT servers are installed on the LLRF crate with the same name (e.g. xfelcpulla2m).
- If the FRED is OK but the REFM-OPT server does not start or does not update, perform a TMCB and server recovery.
- If you experienced only a server glitch, you can try feedback recovery next. However, if you suspect a power glitch of any kind, if voltages in the FRED were switched off, or if the TMCB has been switched off, skip the feedback recovery and perform a manual recovery directly.
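The decision flow for an offline station can be summarized in a short Python sketch. It is not part of the REFM-OPT tooling. The class, its fields and the function name are hypothetical and only encode the rules stated above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OfflineStation:
    """Operator observations for a station shown as offline (hypothetical fields)."""
    fred_ok: bool           # first seven FRED voltages available and switched on
    server_ok: bool         # REFM-OPT server starts and updates
    power_glitch: bool      # a power glitch of any kind is suspected
    fred_voltage_off: bool  # a FRED voltage was switched off
    tmcb_off: bool          # the TMCB was switched off


def offline_recovery_steps(s: OfflineStation) -> list[str]:
    """Return the recovery routines to run, in order, for an offline station."""
    if not s.fred_ok:
        # The FRED is checked and fixed first.
        return ["FRED operation"]
    steps = []
    if not s.server_ok:
        steps.append("TMCB and server recovery")
    if s.power_glitch or s.fred_voltage_off or s.tmcb_off:
        # Any power-related event: skip feedback recovery.
        steps.append("manual recovery")
    else:
        # Only a server glitch: feedback recovery is worth a try.
        steps.append("feedback recovery")
    return steps
```

For example, a station with a healthy FRED, a stuck server and a switched-off TMCB yields `["TMCB and server recovery", "manual recovery"]`.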
Start troubleshooting the topmost (upstream) broken station first. Then continue with the next lower (downstream) station.
- Open the REFM-OPT panel. You can press expand to get a more detailed view.
- Check the RF output power, it should be around 30 dBm (+/- 1dB). If it's not, continue with manual recovery.
- Check if the RF phase shifter jumped. If it did, continue with manual recovery.
- Otherwise try feedback recovery.
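The ordering rule (upstream first) and the per-station checks above can be sketched as follows. The 30 dBm +/- 1 dB window comes from this guide. The `chain_index` field (0 = most upstream) and all other names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

NOMINAL_RF_OUT_DBM = 30.0  # nominal RF output power from this guide
RF_OUT_TOLERANCE_DB = 1.0  # +/- 1 dB


@dataclass
class BrokenStation:
    name: str
    chain_index: int           # hypothetical: 0 = most upstream station
    rf_out_dbm: float
    phase_shifter_jumped: bool


def recovery_for(st: BrokenStation) -> str:
    """Manual recovery on bad RF power or a phase jump, otherwise feedback recovery."""
    power_ok = abs(st.rf_out_dbm - NOMINAL_RF_OUT_DBM) <= RF_OUT_TOLERANCE_DB
    if not power_ok or st.phase_shifter_jumped:
        return "manual recovery"
    return "feedback recovery"


def troubleshooting_plan(stations: list[BrokenStation]) -> list[tuple[str, str]]:
    """Order broken stations upstream first and pick a routine for each."""
    ordered = sorted(stations, key=lambda st: st.chain_index)
    return [(st.name, recovery_for(st)) for st in ordered]
```

Working upstream first matters because a downstream station can only be judged once its RF input from the upstream station is correct.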
Always remember: the protection mode is designed to disable the phase feedback and freeze the REFM-OPT phase shifter at the last known working point. This state is safe for XFEL operation. However, the phase feedback is disabled in this state and should be restored as soon as possible.
Feedback Recovery
Check the overview sub-panel of a REFM-OPT:
- If the RF output power is OK and the phase shifter did not jump, work your way through the protection panel and see which protection modules are triggered. Protection flags are not reset automatically. Check the preview flags to see which errors are still present:
- optical power / fiberlink status: The optical reference is disturbed. LbSync expert recovery only. No feedback operation possible.
- MZM temperature: try TEC recovery, then continue here
- phase error / phase shifter rate of change / bias voltage / REFMPS phase: can be recovered by the automatic recovery routine
- Try the automatic recovery: operation → protection recovery → start recovery
- If the automatic recovery fails continue with manual recovery.
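The preview-flag triage above can be written as a small lookup. Flag names follow the protection panel labels used in this guide. Sending any flag not listed above to manual recovery is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

EXPERT_ONLY = {"optical power", "fiberlink status"}
TEC_FIRST = {"MZM temperature"}
AUTO_RECOVERABLE = {"phase error", "phase shifter rate of change",
                    "bias voltage", "REFMPS phase"}


def triage(red_preview_flags: set[str]) -> str:
    """Map the still-red preview flags to the next action (sketch)."""
    if red_preview_flags & EXPERT_ONLY:
        return "LbSync expert recovery only (no feedback operation possible)"
    if red_preview_flags & TEC_FIRST:
        return "TEC recovery, then re-check the preview flags"
    if red_preview_flags <= AUTO_RECOVERABLE:
        return "automatic protection recovery"
    # Assumption: flags not listed in this guide are handled by manual recovery.
    return "manual recovery"
```

The checks run in order of severity, so an optical reference problem always wins over a phase error that is also present.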
The protection recovery will reset some protection modules. The modules are re-enabled at the end of the routine. The maximum allowed phase jump during recovery is defined in operation → protection recovery → acceptable phase deviation. If the routine fails, the phase shifter is restored to its old value.
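The acceptance rule for the new working point can be sketched as follows. This models only the acceptable-phase-deviation criterion described above. The real routine may fail for other reasons as well. Setpoints and the deviation are assumed to share the same units, and the function name is hypothetical.

```python
from __future__ import annotations


def accept_recovered_setpoint(old_setpoint: float, new_setpoint: float,
                              acceptable_deviation: float) -> tuple[float, bool]:
    """Return (setpoint to keep, success).

    The new working point is kept only if it lies within the acceptable
    phase deviation of the old one; otherwise the old value is restored.
    """
    if abs(new_setpoint - old_setpoint) <= acceptable_deviation:
        return new_setpoint, True
    return old_setpoint, False
```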
Manual Recovery
This routine provides steps to restore basic operation without active feedback. It should be used when no or insufficient RF output power is provided, when a phase jump has occurred, or when feedback recovery failed. The procedure is meant for general disaster recovery.
First, recover the RF output power. Daisy-chained REFM-OPTs can be cross-checked with the next station in line (a block diagram is on the REFM-OPT overview panel). Check the RF input power on these stations. Be aware that there might be additional RF components, like REFMs, between two REFM-OPTs.
- Check the RF output power. It should be around 30 dBm (+/- 1dB). If it is not:
- Check if RF is switched on (first RF switch on the overview panel, controlled by modules → REACT → enable RF).
- Check the RF input power. It should be 15 dBm or more. If it is not, go back in the RF chain, e.g. to a REFM or to the RF-MO.
- Check that the attenuator shows a reasonable value (typically 3 dB to 6 dB).
- If you suspect that the RF power feedback in the REFM-OPT is malfunctioning, go to power ctrl, enable freeze DAC and dial in a fixed value. Be aware that the DAC output limit applies at all times.
- If you suspect that the RF amplifier in the REFM-OPT is broken, go to modules → REACT and switch to the other amplifier by either checking or unchecking switch to backup RF chain. Then go back to the first check, verify the RF output power and, if required, adjust the RF output power feedback.
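The RF power checks above can be condensed into a helper. The thresholds (30 dBm +/- 1 dB output, at least 15 dBm input, typically 3 dB to 6 dB attenuation) come from this guide. The function name and the returned messages are hypothetical.

```python
from __future__ import annotations


def rf_power_findings(rf_out_dbm: float, rf_in_dbm: float,
                      attenuation_db: float, rf_enabled: bool) -> list[str]:
    """Walk through the RF power checks and return what looks wrong."""
    if abs(rf_out_dbm - 30.0) <= 1.0:
        return []  # output power within 30 dBm +/- 1 dB, nothing to do
    findings = []
    if not rf_enabled:
        findings.append("RF off: modules -> REACT -> enable RF")
    if rf_in_dbm < 15.0:
        findings.append("RF input below 15 dBm: go back in the RF chain (REFM, RF-MO)")
    if not 3.0 <= attenuation_db <= 6.0:
        findings.append("attenuator outside the typical 3 dB to 6 dB range")
    if not findings:
        # Switch, input and attenuator look fine: suspect the REFM-OPT itself.
        findings.append("suspect power feedback or amplifier: freeze DAC "
                        "or switch to backup RF chain")
    return findings
```

A low input power points upstream, so the fix belongs to the previous component in the chain rather than to this REFM-OPT.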
Second, recover the correct RF phase. If the REFM-OPT is in protection mode, you must first leave this mode to gain manual control:
- Go to the protection panel.
- Disable at least the phase error / phase shifter rate of change / bias voltage / REFMPS phase modules.
- If the MZM temperature preview flag is red, try to go through TEC recovery. It is very important for the REFM-OPT that the TEC is active at all times.
- Disable any other protection module with a red preview flag.
- Press reset. Protection status should no longer be red.
To recover the correct RF phase, first go to the overview panel. There is a fixed phase offset between the two RF channels. This offset needs to be taken into account if the REFM-OPT RF amplifiers were switched recently. You can click on the RF switches to check their history. Also check the phase shifter history to see whether there were any jumps and to find your reference value.
- Finally go to the phase ctrl panel.
- Enable freeze DAC and dial in the correct phase shifter setpoint. Take the phase offset from the overview panel into account if necessary. Be aware that the DAC output limit applies at all times.
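The setpoint arithmetic above can be sketched in a few lines. It assumes that `channel_offset` is the setpoint change needed when moving from the main to the backup RF chain, in the same units as the setpoint. The sign convention of the real offset is not given in this guide, so verify it against the overview panel. All names are hypothetical.

```python
from __future__ import annotations


def manual_phase_setpoint(reference: float, channel_offset: float,
                          ref_on_backup: bool, now_on_backup: bool,
                          dac_min: float, dac_max: float) -> float:
    """Phase shifter setpoint to dial in with freeze DAC enabled (sketch)."""
    # +1 if switched main -> backup since the reference, -1 for backup -> main, 0 if unchanged
    direction = int(now_on_backup) - int(ref_on_backup)
    setpoint = reference + direction * channel_offset
    if not dac_min <= setpoint <= dac_max:
        # The DAC output limit always applies, so flag it instead of silently clipping.
        raise ValueError(f"setpoint {setpoint} outside DAC limits [{dac_min}, {dac_max}]")
    return setpoint
```

Raising an error instead of clipping keeps the operator aware that the intended phase cannot be reached within the DAC range.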
TMCB and Server Recovery
If the server is running but not updating, check the x2timer status first. The trigger source on the system panel must be set to external (x2timer), and the macropulse number below it should update regularly. For further analysis, the x2timer panel is linked under system → x2timer.
The logfile is accessible under system → logreader. Use it to gain further information.
The Ethernet connection status is monitored under system → TMCB. The timeout is defined by the operating system of the server host and is on the order of 15 minutes. Try to ping the TMCB (e.g. "ping xfelrfoptcba2m", adapt the hostname). If it is reachable, you can try to recover the connection using the resume button.
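To check several TMCBs in one go, the ping can be scripted. This minimal sketch wraps the system `ping` via Python's `subprocess.run`. `-c` sets the number of echo requests on Linux and macOS `ping`, which exits with status 0 once at least one reply arrives. The function name and the injectable `run` parameter (used for testing) are hypothetical.

```python
import subprocess


def tmcb_reachable(hostname: str, count: int = 3, run=subprocess.run) -> bool:
    """Return True if the system `ping` gets at least one reply from `hostname`."""
    result = run(
        ["ping", "-c", str(count), hostname],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

Usage: `tmcb_reachable("xfelrfoptcba2m")`, adapting the hostname as above.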
If you think that only the server is malfunctioning but the REFM-OPT is still in operation, you can leave it in this state for expert recovery. For instance, you can check with the next REFM-OPT in line that the RF power is still OK and that no phase jump occurred.
If you absolutely need access to the REFM-OPT (for example, to adjust a wrongly set phase shifter) and you think the FPGA crashed such that the server cannot connect to it anymore, you can power cycle the TMCB as a last resort. The procedure is described under FRED operation. This is for emergencies only and will cause a phase jump.
TEC Recovery
If the MZM temperature protection module is triggered, the MZM temperature is or has been outside its predefined temperature window. In this case:
- Check monitoring → status supervision → temperature → optics (MZM)
- Observe value, target value and epsilon. Check the history of the actual temperature.
- If the temperature was just slowly drifting out of the predefined window, you can adjust the target value (max. 0.1 K).
- If the deviation is larger, check the temperature controller (modules → Meerstetter TEC).
- Autoreset should reset the TEC in case of failures. Check the status flag and the history of the reset counter.
- Open the Meerstetter TEC subpanel. Values with selected checkboxes should update periodically.
- Temperature should be stable and the driver status should be running (monitoring tab, top right corner).
- If not, try to manually reset the TEC (general control tab → reset device).
- The last resort is a power cycle of the Meerstetter TEC via the FRED.
- Go back to the temperature controller check (modules → Meerstetter TEC).
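The temperature window check and the 0.1 K adjustment limit above can be sketched as follows. The sketch assumes that an adjustment re-centres the window on the actual temperature. Whether a drift was "slow" still has to be judged from the history plot. The names are hypothetical.

```python
from __future__ import annotations

MAX_TARGET_ADJUST_K = 0.1  # largest target change allowed by this guide


def mzm_temperature_action(actual_k: float, target_k: float,
                           epsilon_k: float) -> tuple[str, float | None]:
    """Decide between 'ok', a small target adjustment, or a TEC check (sketch)."""
    deviation = actual_k - target_k
    if abs(deviation) <= epsilon_k:
        return "ok", None
    if abs(deviation) <= MAX_TARGET_ADJUST_K:
        # Small (slow) drift: assumed fix is to re-centre the window on the actual value.
        return "adjust target", actual_k
    # Deviation too large for a target tweak: look at the Meerstetter TEC.
    return "check temperature controller", None
```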
FRED Operation
All of the power operations described here will cause a phase jump, because the DAC for the phase shifter is cycled as well. Don't switch anything off unless you absolutely have to.
The FRED server allows you to monitor the FRED status but intentionally offers only rudimentary control over the FRED. The FRED panel is linked under modules → FRED. You can switch power on and off for the complete REFM-OPT at once. You can also power cycle only the TMCB by switching to "on, TMCB off" and then back to "power on".
For more advanced control you need to directly connect to the FRED via telnet:
- A direct connection to the FRED can only be opened while the server is not connected. Therefore, open the FRED panel (modules → FRED) and press disconnect.
- Connect to the FRED via telnet ("telnet xfelrfoptfra2m 10001", adapt the hostname).
- Type "help" to see a list of supported commands. For example, check the FRED state ("…").
- You can also send a reset signal to the TMCB ("TMCBHardReset"). Use with caution: this will cause a phase jump.
- When you are done, close the telnet connection ("ctrl + ]", then type "quit") and press connect on the FRED panel to reconnect the server.
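For read-only queries such as "help", the interactive telnet session can be replaced by a short script. This sketch rests on several assumptions. It assumes that the FRED on port 10001 accepts plain-text commands terminated by CR LF and needs no telnet option negotiation. It also assumes the server has been disconnected first. The function name is hypothetical. Do not script reset commands such as "TMCBHardReset", since they cause a phase jump.

```python
import socket


def fred_command(host: str, command: str, port: int = 10001,
                 timeout: float = 5.0) -> str:
    """Send one text command to the FRED and return the reply (sketch)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\r\n")  # assumed CR LF line ending
        chunks = []
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # peer closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # no more data within the timeout: treat the reply as complete
    return b"".join(chunks).decode("ascii", errors="replace")
```

Usage: `print(fred_command("xfelrfoptfra2m", "help"))`, adapting the hostname. Afterwards, press connect on the FRED panel as described above.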