General information about the self-diagnostics service

The self-diagnostics service collects system metrics and checks the operation of all Axxon One components. The collected data is compared with reference values for normal system operation. When deviations occur, the service generates system health alerts, which you can track in the web interface (see Viewing metrics in the self-diagnostics service).

Examples of errors that are tracked:

High CPU load.
Low OS virtual memory.
Camera is connected, but doesn't send data (ipint_no_samples).
Archive doesn't work (archive_source_not_activated).
Storage cannot handle the recording load (archive_overloaded).
No video is recorded on events from a detector (archive_no_ping_from_detector).

The self-diagnostics service implements rules that monitor various system states. The full list of rules available on a particular server is shown in the web interface of the self-diagnostics service, in the Status → Rule section: http://127.0.0.1:20040/rules, where:

alert is the rule name;
expr is the condition that triggers the rule;
actions are the actions that the rule performs;
summary is the rule description.

Note

Some rules generate alarms but don't perform any actions. Such rules are labeled disabled: true.
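The rule fields described above can be modeled as a small data structure. This is a minimal sketch only: the `Rule` class, the `rules_with_actions` helper, and the second sample entry (`example_alert_only`) are hypothetical names introduced for illustration, and the actual format served by the web interface may differ. The first sample entry reuses a rule from the examples in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    alert: str              # rule name
    expr: str               # condition that triggers the rule
    actions: List[str]      # actions performed when the rule triggers
    summary: str            # rule description
    disabled: bool = False  # True: the rule only raises an alarm, no action


def rules_with_actions(rules):
    """Return only the rules that perform actions (not labeled disabled)."""
    return [rule for rule in rules if not rule.disabled]


SAMPLE_RULES = [
    Rule(
        alert="Low disk free space (logs)",
        expr='wmi_logical_disk_free_bytes{volume="C:"} / (1024 * 1024) < 20480',
        actions=["ACTION_CLEANUP_LOGS"],
        summary="Clean up of the logs directory when there is "
                "insufficient space on the system disk",
    ),
    # Hypothetical alarm-only rule, included to illustrate disabled: true.
    Rule(
        alert="example_alert_only",
        expr="up == 0",
        actions=[],
        summary="Illustrative rule that raises an alarm without acting",
        disabled=True,
    ),
]

if __name__ == "__main__":
    for rule in rules_with_actions(SAMPLE_RULES):
        print(rule.alert, "->", ", ".join(rule.actions))
```

Filtering on the `disabled` flag separates rules that only raise alarms from rules that also change the system, which may be useful when reviewing what a server can do automatically.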

Examples of rules:

alert	expr	actions	summary
Low disk free space (logs)	If free space on the system disk is less than 20 GB, all server logs, including archived logs, are deleted to free up space: wmi_logical_disk_free_bytes{volume="C:"} / (1024 * 1024) < 20480	ACTION_CLEANUP_LOGS	Cleanup of the logs directory when there is insufficient space on the system disk
Low disk free space (database)	If free disk space for the database is less than 15 GB, all events older than one week are deleted: wmi_logical_disk_free_bytes{volume="C:"} / (1024 * 1024) < 15360	ACTION_CLEANUP_DB	Cleanup of the Postgres database when there is insufficient space on the disk. Stricter thresholds apply as free space decreases: below 10 GB, all events older than one day are deleted; below 5 GB, all events older than one hour are deleted; below 3 GB, all events are deleted
archive_no_samples	The rule checks whether new frames reach the archive. If no new frames reach the archive within five minutes, the archive process is restarted: ((changes(ngp_archive_channel_state_change {ep_name="hosts/SERVER/MultimediaStorage"}[5m]) + ngp_archive_channel_current_state {ep_name="hosts/SERVER/MultimediaStorage"} > 0) unless (changes(ngp_input_sample_counter {ep_name="hosts/SERVER/MultimediaStorage"}[5m]) > 0)) and ignoring(ep_name) ngp_fps{ep_name="hosts/SERVER/DeviceIpint"}	ACTION_RESTART_NGP_UNIT	Restart of the archive service if new frames don't go to the archive
detector_no_sample	The rule checks whether frames reach the detector. If new frames don't reach the detector, the detector service is restarted: (absent(ngp_fps{ep_name="hosts/SERVER/AVDetector"}) * scalar(ngp_fps{ep_name="hosts/SERVER/DeviceIpint"}) * scalar(changes(ngp_fps {ep_name="hosts/SERVER/AVDetector"}[3m])) * scalar((ngp_service_desired_state {ep_name="hosts/SERVER/AVDetector"} == 0) + 1)) > 0	ACTION_RESTART_NGP_UNIT	Restart of the detector service if the active detector doesn't receive new frames
statistics_server_unhealthy	If the statistics server doesn't update the item counter or becomes unavailable, the statistics service is automatically restarted: absent(changes(ngp_work_item_counter {ep_name="hosts/SERVER/StatisticsServer"}[5m])) or absent(up{job="node.SERVER"})	ACTION_RESTART_NGP_UNIT	Restart of the statistics service if there are no statistics events
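The disk space rules above divide a byte count by 1024 * 1024 and compare the result against a threshold in mebibytes, so the "20 GB" and "15 GB" limits correspond to 20480 and 15360 in the expressions. The arithmetic can be sketched as follows. This is an illustration only: the function names and the tier table are hypothetical, the tiers are taken from the summary of the database rule, and the assumption that the lowest matching threshold wins is not confirmed by the source.

```python
MIB = 1024 * 1024  # the divisor used in the rule expressions

LOGS_THRESHOLD_MIB = 20480  # 20 * 1024, from the logs rule expression

# (threshold in MiB, cleanup performed), strictest tier first.
# Assumption: when several thresholds match, the strictest one applies.
DB_CLEANUP_TIERS = [
    (3 * 1024, "delete all events"),
    (5 * 1024, "delete events older than one hour"),
    (10 * 1024, "delete events older than one day"),
    (15 * 1024, "delete events older than one week"),
]


def free_mib(free_bytes):
    """Convert a free-space value in bytes to MiB, as the expressions do."""
    return free_bytes / MIB


def logs_cleanup_needed(free_bytes):
    """Mirror of: wmi_logical_disk_free_bytes / (1024 * 1024) < 20480."""
    return free_mib(free_bytes) < LOGS_THRESHOLD_MIB


def db_cleanup_tier(free_bytes):
    """Return the cleanup that applies for the given free space, or None."""
    mib = free_mib(free_bytes)
    for threshold, cleanup in DB_CLEANUP_TIERS:
        if mib < threshold:
            return cleanup
    return None


if __name__ == "__main__":
    for gib in (2, 4, 8, 12, 16):
        print(gib, "GiB free:", db_cleanup_tier(gib * 1024 ** 3))
```

Note that the comparison is strict: a disk with exactly 20480 MiB free does not trigger the logs rule in this sketch.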


Documentation for Axxon One 2.0. Documentation for other versions of Axxon One is available too.