Artificial intelligence for video surveillance uses computer software programs that analyze the images from video surveillance cameras in order to recognize humans, vehicles or objects. Security contractors program the software to define restricted areas within the camera’s view (such as a fenced-off area, or a parking lot but not the sidewalk or public street outside the lot) and times of day (such as after the close of business) for the property being protected by the camera surveillance. The artificial intelligence (“A.I.”) sends an alert if it detects a trespasser breaking the “rule” that no person is allowed in that area during that time of day.
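A rule of this kind can be illustrated with a minimal Python sketch. It assumes the analytics already supply a class label and an image position for each detected object. The function names, the rectangular parking-lot zone and the 18:00 to 06:00 closing hours are hypothetical values chosen for illustration and are not taken from any particular product.

```python
from datetime import time

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def in_time_window(now, start, end):
    """True if now is in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end

def rule_violated(label, x, y, now, zone, start, end, restricted=("person",)):
    """A restricted object class inside the zone during the restricted hours."""
    return (label in restricted
            and in_time_window(now, start, end)
            and point_in_polygon(x, y, zone))

# Hypothetical zone (image coordinates) and closing hours.
PARKING_LOT = [(0, 0), (100, 0), (100, 60), (0, 60)]
CLOSED_FROM, CLOSED_UNTIL = time(18, 0), time(6, 0)

if rule_violated("person", 50, 30, time(23, 0), PARKING_LOT, CLOSED_FROM, CLOSED_UNTIL):
    print("ALERT: person in restricted area after hours")
```

The same person at the same spot at midday, or on the sidewalk outside the polygon at night, would not trigger the rule.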

The A.I. program functions by using machine vision. Machine vision is a series of algorithms, or mathematical procedures, which work like a flow-chart or series of questions to compare the object seen with hundreds of thousands of stored reference images of humans in different postures, angles, positions and movements. The A.I. asks itself if the observed object moves like the reference images, whether it is approximately the same size and height relative to width, if it has the characteristic two arms and two legs, if it moves with similar speed, and if it is vertical instead of horizontal. Many other questions are possible, such as the degree to which the object is reflective, the degree to which it is steady or vibrating, and the smoothness with which it moves. Combining all of the values from the various questions, an overall ranking is derived which gives the A.I. the probability that the object is or is not a human. If the value exceeds a preset limit, then the alert is sent. It is characteristic of such programs that they are self-learning to a degree, learning, for example, that humans or vehicles appear bigger in the portions of the monitored image near the camera than in the portions farthest from the camera.
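The step of combining the answers into one value and comparing it with a limit can be sketched as follows. The text does not say how the values are combined, so the weighted average used here is an assumption. The question names, weights and the 0.7 limit are likewise hypothetical.

```python
def human_likelihood(scores, weights):
    """Weighted average of per-question scores, each in the range 0.0 to 1.0."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

def should_alert(scores, weights, limit=0.7):
    """Alert only when the combined value exceeds the preset limit."""
    return human_likelihood(scores, weights) > limit

# Hypothetical weights: how much each "question" counts toward the result.
WEIGHTS = {"aspect_ratio": 2.0, "limb_count": 3.0, "speed": 1.0, "upright": 2.0}

# Hypothetical answers for two observed objects.
walker = {"aspect_ratio": 0.9, "limb_count": 0.8, "speed": 0.9, "upright": 1.0}
branch = {"aspect_ratio": 0.3, "limb_count": 0.1, "speed": 0.4, "upright": 0.5}

for name, scores in (("walker", walker), ("branch", branch)):
    print(name, round(human_likelihood(scores, WEIGHTS), 4), should_alert(scores, WEIGHTS))
```

With these made-up numbers the walking figure scores well above the limit and the swaying branch well below it, so only the former would raise an alert.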

In addition to the simple rule restricting humans or vehicles from certain areas at certain times of day, more complex rules can be set. The user of the system may wish to know if vehicles drive in one direction but not the other. Users may wish to know that there are more than a certain preset number of people within a particular area. The A.I. is capable of maintaining surveillance of hundreds of cameras simultaneously. Its ability to spot a trespasser in the distance or in rain or glare is superior to humans’ ability to do so.
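Two of these more complex rules, direction of travel and a maximum head count, can be sketched in a few lines. The sketch assumes that an upstream tracker supplies a list of positions per vehicle and a list of labelled detections per frame. All names, coordinates and limits are hypothetical.

```python
def moving_in_forbidden_direction(track, forbidden, min_distance=1.0):
    """track: list of (x, y) positions over time; forbidden: (dx, dy) direction.

    Returns True when the net displacement points along the forbidden
    direction (positive dot product) and is long enough to rule out jitter.
    """
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if (dx * dx + dy * dy) ** 0.5 < min_distance:
        return False
    return dx * forbidden[0] + dy * forbidden[1] > 0

def over_occupancy(detections, zone_rect, max_people):
    """detections: list of (label, x, y); zone_rect: (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = zone_rect
    count = sum(
        1
        for label, x, y in detections
        if label == "person" and x_min <= x <= x_max and y_min <= y <= y_max
    )
    return count > max_people

# Hypothetical one-way lane: travel toward decreasing x is not allowed.
WRONG_WAY = (-1, 0)
print(moving_in_forbidden_direction([(10, 0), (5, 0), (0, 0)], WRONG_WAY))

# Hypothetical loading dock where at most two people are expected.
frame = [("person", 1, 1), ("person", 2, 3), ("person", 4, 4), ("vehicle", 3, 3)]
print(over_occupancy(frame, (0, 0, 5, 5), max_people=2))
```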

This type of A.I. for security is known as “rule-based” because a human programmer must set rules for all of the things for which the user wishes to be alerted. This is the most prevalent form of A.I. for security. Many video surveillance camera systems today include this type of A.I. capability. The hard-drive that houses the program can either be located in the cameras themselves or can be in a separate device that receives the input from the cameras.

A newer, non-rule based form of A.I. for security called “behavioral analytics” has been developed. This software is fully self-learning with no initial programming input by the user or security contractor. In this type of analytics, the A.I. learns what is normal behavior for people, vehicles, machines, and the environment based on its own observation of patterns of various characteristics such as size, speed, reflectivity, color, grouping, vertical or horizontal orientation and so forth. The A.I. normalizes the visual data, meaning that it classifies and tags the objects and patterns it observes, building up continuously refined definitions of what is normal or average behavior for the various observed objects. After several weeks of learning in this fashion it can recognize when things break the pattern. When it observes such anomalies it sends an alert. For example, it is normal for cars to drive in the street. A car seen driving up onto a sidewalk would be an anomaly. If a fenced yard is normally empty at night, then a person entering that area would be an anomaly.
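The idea of learning what is normal and then flagging deviations can be illustrated with a deliberately simple statistical sketch. It tracks one characteristic (for example, object speed in one part of the image) with a running mean and variance, and flags values that fall far outside the learned range. The source does not describe the actual models used, so the z-score test, the class name and the thresholds here are assumptions for illustration only.

```python
import math

class RunningNorm:
    """Running mean and variance of one observed characteristic (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def is_anomaly(self, value, z_limit=3.0, min_samples=30):
        # During the learning period there is not enough data to judge.
        if self.n < min_samples:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0:
            return value != self.mean
        return abs(value - self.mean) / std > z_limit

# Hypothetical learning phase: speeds observed on a street, around 10 units.
street_speed = RunningNorm()
for i in range(99):
    street_speed.update(10 + (i % 3 - 1))  # values cycle through 9, 10, 11

print(street_speed.is_anomaly(10.5))  # close to what was learned
print(street_speed.is_anomaly(40))    # far outside the learned range
```

A real system would learn many characteristics per image region and per time of day. A single running statistic is only meant to show the principle that no rule is written by hand.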

Statement of the problem

Limitations in the ability of humans to vigilantly monitor video surveillance live footage led to the demand for artificial intelligence that could better serve the task. Humans watching a single video monitor for more than twenty minutes lose 95% of their ability to maintain attention sufficient to discern significant events. With two monitors this is cut in half again. Given that many facilities have dozens or even hundreds of cameras, the task is clearly beyond human ability. In general, the camera views of empty hallways, storage facilities, parking lots or structures are exceedingly boring and thus attention is quickly attenuated. When multiple cameras are monitored, typically employing a wall monitor or bank of monitors with split screen views and rotating every several seconds between one set of cameras and the next, the visual tedium is quickly overwhelming. While video surveillance cameras proliferated with great adoption by users ranging from car dealerships and shopping plazas to schools and businesses to highly secured facilities such as nuclear plants, it was recognized in hindsight that video surveillance by human officers (also called “operators”) was impractical and ineffective. Extensive video surveillance systems were relegated to merely recording for possible forensic use to identify someone, after the fact of a theft, arson, attack or incident. Where wide angle camera views were employed, particularly for large outdoor areas, severe limitations were discovered even for this purpose due to insufficient resolution. In these cases it is impossible to identify the trespasser or perpetrator because their image is too tiny on the monitor.

Earlier attempts at solution

Motion detection cameras

In response to the shortcomings of human guards in watching surveillance monitors long-term, the first solution was to add motion detectors to cameras. It was reasoned that an intruder’s or perpetrator’s motion would send an alert to the remote monitoring officer, obviating the need for constant human vigilance. The problem was that in an outdoor environment there is constant motion, or change in the pixels that make up the total viewed image on screen. The motion of leaves on trees blowing in the wind, litter along the ground, insects, birds, dogs, shadows, headlights, sunbeams and so forth all count as motion. This caused hundreds or even thousands of false alerts per day, rendering this solution inoperable except in indoor environments during non-operating hours.
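Why simple motion detection produces so many false alerts becomes clear from a sketch of frame differencing, one common way to detect pixel change. Frames are represented here as lists of rows of grayscale values. The function names and both thresholds are hypothetical, and actual camera firmware may work differently.

```python
def changed_fraction(prev, curr, pixel_threshold=25):
    """Fraction of pixels whose brightness changed by more than pixel_threshold."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0

def motion_detected(prev, curr, pixel_threshold=25, area_threshold=0.01):
    """Alert when more than area_threshold of the image has changed."""
    return changed_fraction(prev, curr, pixel_threshold) > area_threshold

# A static 10 x 10 scene, then the same scene with four brighter pixels.
still = [[100] * 10 for _ in range(10)]
moved = [row[:] for row in still]
for r, c in ((2, 2), (2, 3), (3, 2), (3, 3)):
    moved[r][c] = 200

print(motion_detected(still, still))
print(motion_detected(still, moved))
```

The detector only counts changed pixels. It has no way of telling whether the four changed pixels belong to an intruder, a bird or a moving shadow, which is the weakness described above.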

Advanced video motion detection

The next evolution reduced false alerts to a degree but at the cost of complicated and time-consuming manual calibration. Here, changes of a target such as a person or vehicle relative to a fixed background are detected. Where the background changes seasonally or due to other changes, the reliability deteriorates over time. The economics of responding to too many false alerts again proved to be an obstacle and this solution was insufficient.
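The dependence on a fixed background can be illustrated with a short sketch. It compares each frame with a stored reference image and accepts a target only if the number of differing pixels falls within a manually calibrated size range. This is a simplified assumption about how such systems work, not a description of a specific product, and all names and numbers are hypothetical.

```python
def foreground_pixels(frame, background, threshold=30):
    """Count pixels that differ from the stored background by more than threshold."""
    return sum(
        1
        for row_f, row_b in zip(frame, background)
        for f, b in zip(row_f, row_b)
        if abs(f - b) > threshold
    )

def target_present(frame, background, min_pixels, max_pixels, threshold=30):
    """Accept only changes whose size matches the calibrated target size range."""
    count = foreground_pixels(frame, background, threshold)
    return min_pixels <= count <= max_pixels

# Stored reference image of the empty scene (20 x 20, uniform gray).
background = [[80] * 20 for _ in range(20)]

# Same scene with a bright 6 x 3 pixel object, roughly person-shaped.
with_person = [row[:] for row in background]
for r in range(5, 11):
    for c in range(8, 11):
        with_person[r][c] = 200

# Same empty scene after a seasonal change, e.g. snow brightening everything.
after_snow = [[130] * 20 for _ in range(20)]

print(target_present(with_person, background, min_pixels=10, max_pixels=100))
print(foreground_pixels(after_snow, background))
```

After the seasonal change every pixel differs from the stored reference, so the size filter can no longer isolate a target until the system is recalibrated.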

Advent of true video analytics

Machine learning of visual recognition relates to patterns and their classification. True video analytics can distinguish the human form, vehicles and boats or selected objects from the general movement of all other objects and visual static or changes in pixels on the monitor. It does this by recognizing patterns. When the object of interest, for example a human, violates a preset rule, for example that the number of people shall not exceed zero in a pre-defined area during a defined time interval, then an alert is sent. A red rectangle or so-called “bounding box” will typically automatically follow the detected intruder, and a short video clip of this is sent as the alert.

Practical application

Real-time preventative action

The detection of intruders using video surveillance has limitations based on economics and the nature of video cameras. Typically, cameras outdoors are set to a wide angle view and yet look out over a long distance. Frame rate, and the dynamic range needed to handle both brightly lit and dimly lit areas, further challenge the camera’s ability to see a moving human intruder. At night, even in illuminated outdoor areas, a moving subject does not gather enough light per frame and so, unless quite close to the camera, will appear as a thin wisp or barely discernible ghost, or be completely invisible. Conditions of glare, partial obscuration, rain, snow, fog, and darkness all compound the problem. Even when a human is directed to look at the actual location on a monitor of a subject in these conditions, the subject will usually not be detected. The A.I. is able to impartially look at the entire image and all cameras’ images simultaneously. Using statistical models of degrees of deviation from its learned pattern of what constitutes the human form, it will detect an intruder with high reliability and a low false alert rate even in adverse conditions. Its learning is based on approximately a quarter million images of humans in various positions, angles, postures, and so forth.

A one megapixel camera with the onboard video analytics was able to detect a human at a distance of about 350′ and an angle of view of about 30 degrees in non-ideal conditions. Rules could be set for a “virtual fence” or intrusion into a pre-defined area. Rules could be set for directional travel, object left behind, crowd formation and some other conditions.
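A rough calculation shows how small a person appears at that range. The sketch below assumes that the 30 degree angle of view is horizontal, that the one megapixel sensor is 1280 pixels wide with square pixels, and that a person is about 1.5 feet wide and 5.7 feet tall. None of these figures comes from the source.

```python
import math

def field_of_view_width(distance, angle_of_view_deg):
    """Width of the scene covered at a given distance (same unit as distance)."""
    return 2 * distance * math.tan(math.radians(angle_of_view_deg / 2))

def pixels_on_target(target_size, distance, angle_of_view_deg, sensor_pixels):
    """Number of pixels spanned by a target of the given size at that distance."""
    return target_size * sensor_pixels / field_of_view_width(distance, angle_of_view_deg)

# Assumed values: 1280-pixel-wide sensor, 30 degree horizontal view, 350 feet.
fov_ft = field_of_view_width(350, 30)
person_width_px = pixels_on_target(1.5, 350, 30, 1280)
# Square pixels are assumed, so the same pixels-per-foot figure applies vertically.
person_height_px = pixels_on_target(5.7, 350, 30, 1280)

print(round(fov_ft, 1), round(person_width_px, 1), round(person_height_px, 1))
```

Under these assumptions the scene is roughly 188 feet wide at 350 feet, and a person covers only about 10 by 39 pixels. That is enough for software to detect a human shape but far too few pixels to identify an individual.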

Talk-down

One of the most powerful features of the system is that a human officer or operator, receiving an alert from the A.I., could immediately talk down over outdoor public address loudspeakers to the intruder. This had high deterrence value as most crimes are opportunistic and the risk of capture to the intruder becomes so pronounced when a live person is talking to them that they are very likely to desist from intrusion and to retreat. The security officer would describe the actions of the intruder so that the intruder had no doubt that a real person was watching them. The officer would announce that the intruder was breaking the law and that law enforcement was being contacted and that they were being video-recorded.

Verified breach report

The police receive a tremendous number of false alarms from burglar alarms. In fact, the security industry reports that over 98% of such alarms are false. Accordingly, the police give burglar alarms very low priority and can take from twenty minutes to two hours to respond to the site. By contrast, a crime detected by video analytics is reported to the central monitoring officer, who verifies with his or her own eyes that it is a real crime in progress. He or she then contacts the police, who give such calls their highest priority.

Continuing advances

Analytics work with digital cameras or analog cameras that have analog-to-digital converters. In many cases the software samples down to standard definition or less, regardless of the camera’s resolution.

Source: Wikipedia

Translator: Sarah Karimi

Artificial Intelligence for Video Surveillance (Part I)