Documentation for Axxon One 2.0. Documentation for other versions of Axxon One is available too.


The self-diagnostics service collects system metrics and checks the operation of all Axxon One components. The collected data is compared with the values expected during normal operation. When deviations occur, system health alerts are generated; you can track them in the web interface (see Viewing metrics in the self-diagnostics service).

Examples of errors that are tracked:

  • High CPU load.
  • Low OS virtual memory.
  • Camera is connected, but doesn't send data (ipint_no_samples).
  • Archive doesn't work (archive_source_not_activated).
  • Storage cannot handle record load (archive_overloaded).
  • No video is recorded on events from a detector (archive_no_ping_from_detector).

The self-diagnostics service implements rules that allow you to monitor various system statuses. You can see a full list of rules available for a particular server in the web interface of the self-diagnostics service in the StatusRule section: http://127.0.0.1:20040/rules, where:

  • alert is the rule name;
  • expr is the condition that triggers the rule;
  • actions are the actions that the rule performs;
  • summary is the rule description.

Note

Some rules generate alarms but don't perform any actions. Such rules are labeled disabled: true.
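The four fields and the disabled label can be modeled as a small record, which is handy when reviewing a long rule list. This is a minimal sketch, not the service's actual data format: the `StatusRule` class, the list type of `actions`, and the second example rule are hypothetical, and it assumes that a rule without the label is enabled.

```python
from dataclasses import dataclass, field


@dataclass
class StatusRule:
    """One rule with the fields described above (hypothetical representation)."""
    alert: str                      # rule name
    expr: str                       # triggering condition
    summary: str                    # rule description
    actions: list = field(default_factory=list)  # assumption: a list of action names
    disabled: bool = False          # assumption: a missing label means "enabled"


def split_by_disabled(rules):
    """Return (acting, alarm_only): rules that run actions vs. rules labeled disabled: true."""
    acting = [r for r in rules if not r.disabled]
    alarm_only = [r for r in rules if r.disabled]
    return acting, alarm_only


rules = [
    StatusRule(
        alert="Low disk free space (logs)",
        expr='wmi_logical_disk_free_bytes{volume="C:"} / (1024 * 1024) < 20480',
        summary="Clean up of the logs directory",
        actions=["ACTION_CLEANUP_LOGS"],
    ),
    # Hypothetical alarm-only rule, invented purely for illustration.
    StatusRule(alert="example_alarm_only", expr="up == 0",
               summary="Illustrative rule", disabled=True),
]

acting, alarm_only = split_by_disabled(rules)
print([r.alert for r in acting])
print([r.alert for r in alarm_only])
```

Keeping alarm-only rules in a separate list makes it easy to see which alerts will never trigger an automatic cleanup or restart.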

Examples of rules:

Each example lists the rule's alert, expr, actions, and summary fields, in that order.
alert: Low disk free space (logs)

expr: If free space on the system disk is less than 20 GB, all server logs, including archived logs, are deleted to free up space:

wmi_logical_disk_free_bytes{volume="C:"} / (1024 * 1024) < 20480

actions: ACTION_CLEANUP_LOGS

summary: Clean up of the logs directory when there is insufficient space on the system disk
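The threshold in this expression is given in megabytes: the free-byte count is divided by 1024 * 1024, and 20480 equals 20 * 1024, which matches the 20 GB in the description. The sketch below reproduces that arithmetic; the function name and the sample reading are hypothetical.

```python
MIB = 1024 * 1024  # bytes per mebibyte, the divisor used in the rule expression

LOGS_THRESHOLD_MIB = 20480  # 20 * 1024 MiB, the 20 GB from the rule description


def low_disk_space(free_bytes: int, threshold_mib: int) -> bool:
    """Mirror of `wmi_logical_disk_free_bytes / (1024 * 1024) < threshold`."""
    return free_bytes / MIB < threshold_mib


# Hypothetical reading: 19 GiB free on the system disk.
free = 19 * 1024 * MIB
print(low_disk_space(free, LOGS_THRESHOLD_MIB))  # True
```

Because the comparison is strict, a disk with exactly 20 GB free does not trigger the rule.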

alert: Low disk free space (database)

expr: If free disk space for the database is less than 15 GB, all events older than one week are deleted:

wmi_logical_disk_free_bytes{volume="C:"} / (1024 * 1024) < 15360

actions: ACTION_CLEANUP_DB

summary: Cleanup of the Postgres database when there is insufficient space on the disk. If free disk space for the database is less than:

  • 10 GB, all events older than one day are deleted.
  • 5 GB, all events older than one hour are deleted.
  • 3 GB, all events are deleted.
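The database cleanup thresholds form a ladder: the less free space remains, the shorter the retained event history. The following sketch encodes that ladder as described in the text. It is not the service's implementation; the function name is hypothetical, and it assumes that "GB" means 1024³ bytes, consistent with the 15360 MB threshold in the expression.

```python
from datetime import timedelta

GIB = 1024 ** 3  # assumption: "GB" in the rule text means 1024^3 bytes


def events_max_age(free_bytes: int):
    """Return the maximum age of events kept after cleanup.

    None means no cleanup is triggered; timedelta(0) means all events are deleted.
    """
    free_gib = free_bytes / GIB
    if free_gib < 3:
        return timedelta(0)        # all events are deleted
    if free_gib < 5:
        return timedelta(hours=1)  # events older than one hour are deleted
    if free_gib < 10:
        return timedelta(days=1)   # events older than one day are deleted
    if free_gib < 15:
        return timedelta(weeks=1)  # events older than one week are deleted
    return None                    # enough free space, nothing to do


for free_gib in (20, 12, 7, 4, 2):
    print(free_gib, "GB free ->", events_max_age(free_gib * GIB))
```

The checks run from the strictest threshold upward so that the most aggressive matching tier wins.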
alert: archive_no_samples

expr: The rule checks whether new frames reach the archive. If no new frames reach the archive within five minutes, the archive process is restarted:

((changes(ngp_archive_channel_state_change{ep_name="hosts/SERVER/MultimediaStorage"}[5m]) + ngp_archive_channel_current_state{ep_name="hosts/SERVER/MultimediaStorage"} > 0)
unless (changes(ngp_input_sample_counter{ep_name="hosts/SERVER/MultimediaStorage"}[5m]) > 0))
and ignoring(ep_name) ngp_fps{ep_name="hosts/SERVER/DeviceIpint"}

actions: ACTION_RESTART_NGP_UNIT

summary: Restart of the archive service if new frames don't go to the archive
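The expression reads like PromQL, where changes() counts how many times a value changed within the window. Under that assumption, the rule's logic for a single archive channel can be sketched in plain Python. This is a simplification: label matching (ignoring(ep_name)) is reduced to a boolean, the interpretation of a nonzero ngp_archive_channel_current_state as "channel active" is an assumption, and all function and parameter names are hypothetical.

```python
def changes(samples):
    """Count value changes between consecutive samples, like PromQL changes()."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def archive_stalled(state_change_samples, current_state,
                    input_counter_samples, camera_series_present):
    """Simplified per-channel reading of the archive_no_samples condition.

    state_change_samples / input_counter_samples: values seen in the last 5 minutes.
    camera_series_present: whether a matching ngp_fps series exists for the camera.
    """
    # changes(state_change[5m]) + current_state > 0
    channel_active = changes(state_change_samples) + current_state > 0
    # unless (changes(input_sample_counter[5m]) > 0)
    frames_arriving = changes(input_counter_samples) > 0
    # and ignoring(ep_name) ngp_fps{...}
    return channel_active and not frames_arriving and camera_series_present


# Counter frozen at 100 for the whole window: the channel is considered stalled.
print(archive_stalled([0, 0, 0], 1, [100, 100, 100], True))
# Counter keeps growing: frames are arriving, no restart needed.
print(archive_stalled([0, 0, 0], 1, [100, 130, 160], True))
```

The final condition means the rule only fires when the camera itself still reports a frame rate, so a disconnected camera does not cause archive restarts.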

alert: detector_no_sample

expr: The rule checks whether frames reach the detector. If new frames don't reach the detector, the detector service is restarted:

(absent(ngp_fps{ep_name="hosts/SERVER/AVDetector"})
* scalar(ngp_fps{ep_name="hosts/SERVER/DeviceIpint"})
* scalar(changes(ngp_fps{ep_name="hosts/SERVER/AVDetector"}[3m]))
* scalar((ngp_service_desired_state{ep_name="hosts/SERVER/AVDetector"} == 0) + 1)) > 0

actions: ACTION_RESTART_NGP_UNIT

summary: Restart of the detector service if the active detector doesn't receive new frames

alert: statistics_server_unhealthy

expr: If the statistics server doesn't update the item counter or becomes unavailable, the statistics service is automatically restarted:

absent(changes(ngp_work_item_counter{ep_name="hosts/SERVER/StatisticsServer"}[5m])) or absent(up{job="node.SERVER"})

actions: ACTION_RESTART_NGP_UNIT

summary: Restart of the statistics service if there are no statistics events
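In PromQL, absent() produces a result only when its argument returns no series at all, and `or` combines the two checks. Assuming the service follows these semantics, the condition can be sketched as follows. The dictionary representation of a query result and the function names are hypothetical.

```python
def absent(series: dict) -> bool:
    """PromQL-style absent(): true only when the query returned no series."""
    return len(series) == 0


def statistics_unhealthy(work_item_counter_5m: dict, up_series: dict) -> bool:
    """Simplified reading of the statistics_server_unhealthy condition.

    work_item_counter_5m: series with samples of ngp_work_item_counter in the last 5 minutes.
    up_series: series returned for the `up` metric of the server's job.
    """
    return absent(work_item_counter_5m) or absent(up_series)


# No counter samples in the window: the rule fires.
print(statistics_unhealthy({}, {"node.SERVER": [1]}))
# Both series are present: the rule does not fire.
print(statistics_unhealthy({"StatisticsServer": [10, 12]}, {"node.SERVER": [1]}))
```

Either missing series is enough to trigger the restart, which covers both a silent statistics server and an unreachable node.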

