
The monitoring module for ASAPO should improve debugging and monitoring capabilities for regular ASAPO users
by visualizing the traffic in a topology view and in detailed diagrams.

It is also a useful tool for uncovering the inner workings of the ASAPO "black box".

Metrics:

The following services provide these metrics:

  • Broker
    • File requests (only successful ones from get_next/get_last/get_by_id). Open question: how to handle datasets, where a single request to the broker can result in many requests to RDS/FTS.
  • FTS (File transfer service)
    • Acknowledgment that the file was resolved by FTS (and bandwidth for data transfer/time to read file from disk)
  • Receiver.RDS (Receiver DataCache service)
    • Acknowledgment that the file was resolved by RDS, or that it was a cache miss (and bandwidth)
  • Receiver.Incoming (the normal receiver)
    • Information about incoming files (amount, bandwidth, shares for io/network/database)

Grouped by:

  • Beamtime
  • Source
  • Stream
  • (Consumer Group)
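
A minimal sketch of how samples could be summed per group, assuming each service reports a file count and a byte count. The names `GroupKey`, `Sample` and `aggregate` are hypothetical and not part of ASAPO; the consumer group is modelled as optional because the list above gives it in parentheses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GroupKey:
    """Grouping dimensions from the list above (field names are illustrative)."""
    beamtime: str
    source: str
    stream: str
    consumer_group: Optional[str] = None  # assumption: not every metric has one


@dataclass
class Sample:
    """One reported measurement for a group (hypothetical shape)."""
    key: GroupKey
    file_count: int
    byte_count: int


def aggregate(samples):
    """Sum file and byte counts per group key.

    Returns a dict mapping GroupKey to a (file_count, byte_count) tuple.
    """
    totals = defaultdict(lambda: [0, 0])
    for s in samples:
        totals[s.key][0] += s.file_count
        totals[s.key][1] += s.byte_count
    return {key: tuple(counts) for key, counts in totals.items()}
```

Because `GroupKey` is frozen, it is hashable and can be used directly as a dictionary key, so two samples with the same beamtime, source and stream end up in the same bucket.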

Not tracked:
Cache misses on write on a receiver. (This is probably very rare; log it as a warning.)

Layout:

There will be two modes in WebClient:

UserView:
The user view allows the user to see the basic diagrams such as images/s and gb/s for incoming and outgoing traffic.
The outgoing traffic is stacked in the following categories: FromRDS, FromFTS, Unknown (for files requested via the Broker where it is unknown whether the data was ever actually fetched, e.g., because it was resolved via the local FS).
There will also be an advanced view, where the user can see:

  • RDS Hit/Miss rate (important to know if a file was requested but was not in the cache anymore)
  • RDS cache memory usage {Own..., Others} (Own... = an array of all selected Beamtime, Source, Stream groups)
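
The stacked outgoing categories and the RDS hit rate could be derived from a few counters, as sketched below. This rests on an assumption that is not stated in the design: every broker request that does not show up at RDS or FTS is counted as Unknown. The function names and counter arguments are hypothetical.

```python
def stack_outgoing(broker_requests, from_rds, from_fts):
    """Split broker-side file requests into the three stacked categories.

    Assumption: Unknown = broker requests not seen at RDS or FTS.
    The value is clamped at zero because counters from different services
    may not line up within a single interval.
    """
    unknown = max(broker_requests - from_rds - from_fts, 0)
    return {"FromRDS": from_rds, "FromFTS": from_fts, "Unknown": unknown}


def rds_hit_rate(hits, misses):
    """Fraction of RDS requests served from the cache.

    Returns None when there were no RDS requests, so the caller can
    distinguish "no data" from a 0% hit rate.
    """
    total = hits + misses
    return hits / total if total else None
```

For example, 10 broker requests with 6 served by RDS and 3 by FTS leave 1 request in the Unknown category.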


AdminView:
The admin view will mostly be the same as the user's view but has access to all Beamtimes. 

Foreseeable Problems:

How to correlate DataBroker requests with FTS or RDS requests?

> What about the consumer delay between a request to the broker and the actual request to the RDS (if it ever happens; mind the possibility of the local FS)? Ignore it.

Setup:

The receiver and broker send updates to the monitoring server at 1-second intervals.
The monitoring server will store these metrics in an InfluxDB measurement with all relevant tags (Beamtime, Source, Stream).
The WebClient can request data from the monitoring server within a user-specified interval (e.g., 10 sec or 1 sec).
