
To train neural networks, you must collect and submit to AxxonSoft video recordings and images from your actual cameras, captured at the same resolution and under the same conditions as in your future application.

For example, if your neural network is intended to analyze outdoor video feeds, your videos must cover the full range of weather conditions (sun, rain, snow, fog, and so on) at different times of day (daytime, twilight, night).

General requirements for collected data:

when collecting video recordings and images, meet the specific requirements for object images, scene, camera angle, illumination, and video stream that apply to the detection tools you plan to use (see Configuring detection modules);

if the neural network must be trained for different conditions (time of day, lighting, camera angle, object types, or weather), collect the video material in equal shares for each condition, that is, the material must be balanced.

Example. It is necessary to detect a person in the surveillance area at night and during the day.

Data collected correctly:

four video recordings of the surveillance area, each five minutes long;
the object of interest appears in the frame in each video fragment;
two fragments are recorded in night conditions and two in daytime conditions.

Data collected incorrectly:

three video recordings of the surveillance area, each five minutes long;
the object of interest appears in the frame in each video fragment;
two fragments were recorded in night conditions and only one in daytime conditions.
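The balance rule from the example above can be sketched as a small check. This is a minimal illustration, not an AxxonSoft tool: the function names, the `(condition, minutes)` tuple layout, and the tolerance parameter are all assumptions made for this sketch.

```python
from collections import defaultdict


def minutes_per_condition(recordings):
    """Sum the recorded minutes for each condition.

    `recordings` is a list of (condition, minutes) tuples; this layout is
    a hypothetical convention chosen for the sketch.
    """
    totals = defaultdict(float)
    for condition, minutes in recordings:
        totals[condition] += minutes
    return dict(totals)


def is_balanced(recordings, expected_conditions, tolerance_minutes=0.0):
    """Return True if every expected condition is present and all
    conditions have the same total duration (within the tolerance)."""
    totals = minutes_per_condition(recordings)
    if set(totals) != set(expected_conditions):
        return False
    return max(totals.values()) - min(totals.values()) <= tolerance_minutes


# The two data sets from the example: five-minute recordings.
correct = [("night", 5), ("night", 5), ("day", 5), ("day", 5)]
incorrect = [("night", 5), ("night", 5), ("day", 5)]

print(is_balanced(correct, ["night", "day"]))    # True
print(is_balanced(incorrect, ["night", "day"]))  # False
```

Comparing total duration rather than the number of files is a design choice of this sketch; with recordings of equal length, as in the example, both approaches give the same result.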

Extra requirements for videos for each neural analytics tool are listed in the following table:

Tool	Requirements
Neurofilter	At least 1000 frames containing objects of interest in the given scene conditions, and the same amount of footage containing no objects (background footage)
Neurotracker	Three to five minutes of video containing objects of interest in the given scene conditions. The greater the number and variety of situations in the scene, the better
Pose detection tools	10 seconds of video of a scene with no people. At least 100 different persons in the given scene conditions. Attention! Different conditions include, among others, different poses of a person in the scene (tilting, different limb positions, and so on)
Equipment detection tool (PPE)	A list of all reference equipment with examples must be collected from the facility and coordinated with the analytics manufacturer (see Example of providing a list of valid equipment at the facility). Several videos of 3-5 minutes each with personnel in the given scene conditions. Personnel must move and change poses in the recorded videos, and remove and put on equipment at 30-second intervals. Since the Equipment detection tool (PPE) is designed for constant artificial lighting, videos in other lighting conditions are not required
Fire detection and Smoke detection	At least 1000 frames with various objects of the class of interest in the given scene conditions and the same number of frames without the objects of interest in the frame (noise frames)
Food recognition*	Images of at least 80% of the actual menu items must be provided. Each menu item requires 20 to 40 images in different conditions
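Some of the numeric thresholds in the table can be checked before the data is submitted. The sketch below covers the frame-count rows (Neurofilter, Fire detection and Smoke detection) and the Food recognition row. All names are hypothetical. Reading "the same number" as "at least as many background frames" is an assumption, and so is counting a menu item as covered once it has 20 images.

```python
def check_frame_counts(object_frames, background_frames, minimum=1000):
    """Return a list of problems for the frame-count rows of the table.

    Assumption: "the same number" of background frames is read as
    "at least as many as the frames with objects".
    """
    problems = []
    if object_frames < minimum:
        problems.append(
            f"only {object_frames} frames with objects, need at least {minimum}"
        )
    if background_frames < object_frames:
        problems.append(
            f"only {background_frames} background frames for "
            f"{object_frames} frames with objects"
        )
    return problems


def check_menu_coverage(menu_items, images_per_item, min_percent=80, min_images=20):
    """Return True if at least `min_percent` of the menu items have at
    least `min_images` images each.

    Assumption: an item counts as covered at 20 images; the upper figure
    of 40 images from the table is not enforced here.
    """
    if not menu_items:
        return False
    covered = sum(1 for item in menu_items if images_per_item.get(item, 0) >= min_images)
    # Integer comparison avoids floating-point rounding at the threshold.
    return covered * 100 >= min_percent * len(menu_items)


print(check_frame_counts(1200, 1200))  # []
print(check_frame_counts(800, 800))    # one problem: too few frames with objects

menu = ["soup", "salad", "steak", "pasta", "tea"]
images = {"soup": 25, "salad": 30, "steak": 20, "pasta": 40, "tea": 5}
print(check_menu_coverage(menu, images))  # True: 4 of 5 items are covered
```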

If the data submitted for training the neural network model meets the requirements above, and the neural network is operated in conditions as close as possible to those in which the training material was collected, then the guaranteed overall accuracy** of the neural network analytics is 90% to 97%, with a false positive rate of 5-7%. For general networks***, the guaranteed overall accuracy is 80-95% and the false positive rate is 5-20%.

* This analytics will be available in future versions of Axxon PSIM.

** Accuracy is indicated for a neural network model that was trained under operating conditions.

*** A general network is a network that was not trained under operating conditions.

The requirements may be changed or supplemented at any time.