Sample images from VisA dataset
Introduction to Visual Anomaly Detection
When we think of deep learning applied to images or videos, our minds immediately jump to classification or object detection: given an image or video, recognizing an object or an action and where it is happening. But what about scenarios where the goal is not to spot a specific object or situation but to identify when something has changed in an object or deviates from the expected behavior? Consider, for instance, an industrial factory looking to detect defects in its products, or a traffic surveillance system seeking to detect an accident in otherwise normal traffic flow. In these cases, the defects and accidents can take numerous forms and appear in unexpected ways, which makes anomaly detection a handy solution.
We understand anomaly detection as the identification of objects, data, or events that fall outside the “normal/typical” patterns. Anomaly detection is widely used in traditional statistics to analyze structured datasets and time series, where a group of data points can be clustered together and any point outside the cluster can be called an outlier. When it comes to images and videos, the principle is the same (find the outlier images or frames that contain an anomaly), but the challenges are considerably more complex. Visual features are more difficult to identify and extract than numerical ones, and they often demand a substantial amount of data, which is where statistical and traditional machine learning approaches fall behind. With the help of deep learning techniques for automatic feature extraction, anomaly detection for images and videos is expanding, requiring less handcrafting and improving generalization capabilities.
General concept of anomaly detection: candies are normal samples, while lollipops are anomalies (outliers).
Although we can understand visual anomaly detection as a binary classification between normal and anomalous classes, anomalous samples are by their nature difficult to obtain. Datasets usually have a strong imbalance between normal and anomalous samples, making supervised approaches insufficient for real-life cases. For this reason, anomaly detection makes use of unsupervised or weakly supervised methods.
In unsupervised methods, a large set of normal samples is used during training to learn the boundary of normal patterns, and during testing any pattern that falls outside this boundary is considered an anomaly. As a consequence, events or data that humans would consider normal but that are not represented in the training dataset may be labeled as abnormal.
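The idea of learning a boundary from normal samples only can be illustrated with a deliberately simple sketch. It assumes each image has already been reduced to a small feature vector (the 2-D values below are made up for illustration), models normality with a per-dimension mean and standard deviation, and takes the largest score seen during training as the boundary. Real methods use far richer models; the function names here are hypothetical.

```python
import math

def fit_normal_model(train_features):
    """Estimate per-dimension mean and standard deviation from normal samples only."""
    n = len(train_features)
    dims = len(train_features[0])
    means = [sum(x[d] for x in train_features) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((x[d] - means[d]) ** 2 for x in train_features) / n
        stds.append(math.sqrt(var) or 1e-8)  # fall back to a tiny value if std is 0
    return means, stds

def anomaly_score(x, means, stds):
    """Distance from the normal mean, scaled per dimension by the standard deviation."""
    return math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(x, means, stds)))

# Hypothetical feature vectors extracted from normal training images.
normal_train = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.0), (0.8, 2.0)]
means, stds = fit_normal_model(normal_train)

# Assumption: the boundary of normality is the largest score among training samples.
boundary = max(anomaly_score(x, means, stds) for x in normal_train)

def is_anomaly(x):
    """A test sample is anomalous if it falls outside the learned boundary."""
    return anomaly_score(x, means, stds) > boundary

print(is_anomaly((1.0, 2.0)))  # close to the training data
print(is_anomaly((5.0, 5.0)))  # far from anything seen in training
print(is_anomaly((1.5, 2.0)))  # possibly acceptable to a human, but unseen in training
```

The last call shows the limitation described above: a sample that is only moderately different from the training data is still flagged, because the model has no evidence that such samples are normal.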
On the other hand, weakly supervised methods use a large set of normal samples alongside a small number of anomalous samples to learn the distribution of normal data, although what they learn is limited by the characteristics of the anomalous samples available. Anomalies are then identified by comparing samples against the normal behavior: anything that falls outside that normality can be considered an anomaly.
Unsupervised and weakly supervised anomaly detection are especially useful when the normal behavior is well defined but the potential anomalies are diverse and unknown in advance. Anomaly detection is also highly dependent on context: an object or activity that is considered normal in one scenario may be considered an anomaly in another.
Detecting anomalies in visual data can be achieved through either image anomaly detection or video anomaly detection. Image anomaly detection comes into play when anomalies can be identified from a single input image; for instance, it can be used to detect product defects or to identify unusual objects within an environment.
Image anomaly detection representation. Given an image of a PCB as input, the anomaly detector “stamps” normal on images of PCBs without defects and anomaly on images of PCBs with a defect.
(Sample images from VisA dataset)
However, there are situations where an image alone may not provide sufficient information to identify anomalies accurately. In such cases, temporal information becomes crucial. Video anomaly detection methods are then utilized to detect anomalous actions, activities, or temporal relationships of actions. For instance, identifying a fight in a public space or detecting an unattended bag left for an extended period at an airport requires the application of video anomaly detection techniques.
Video anomaly detection representation. Given a sequence of frames, the video anomaly detector calculates an anomaly score, resulting in a low score for normal frames and a high score for anomalous frames. (Sample frames from dataset)
Performance Evaluation
A model's performance can be evaluated quantitatively by comparing the predicted results with the ground truth labels. In anomaly detection, however, predictions are often not discrete labels but continuous scores in the range [0, 1], so a threshold must first be selected to separate the range of normality from the range of abnormality. Usually, scores below the threshold are considered normal and scores above it are considered abnormal.
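As a minimal sketch of this step, the snippet below converts continuous anomaly scores into discrete labels. The scores and the threshold values are made up for illustration, and `label_by_threshold` is a hypothetical helper name.

```python
def label_by_threshold(scores, threshold=0.5):
    """Return 1 (abnormal) for scores above the threshold and 0 (normal) otherwise."""
    return [1 if s > threshold else 0 for s in scores]

# Hypothetical anomaly scores in [0, 1] produced by a detector.
scores = [0.05, 0.20, 0.45, 0.70, 0.95]

print(label_by_threshold(scores, threshold=0.5))  # [0, 0, 0, 1, 1]
print(label_by_threshold(scores, threshold=0.3))  # [0, 0, 1, 1, 1]
```

Lowering the threshold flags more samples as abnormal, which is why the choice of threshold directly affects every metric computed from the resulting labels.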
Once normality and abnormality are well defined, a performance metric can be calculated. There is no standard metric for anomaly detection, and different methods report different metrics, such as precision, recall, F1 score, true positive rate, and false positive rate. The most common evaluation metric for image and video anomaly detection, however, is ROC-AUC.
The ROC-AUC is the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, a plot of the true positive rate (TPR) against the false positive rate (FPR). The curve is obtained by calculating the FPR and TPR at multiple thresholds. The value of ROC-AUC ranges from 0 to 1, where 0.5 corresponds to random guessing. A model with a higher ROC-AUC is considered better than a model with a lower one.
To compute the TPR and FPR at a specific threshold, you need four counts: the abnormal samples correctly detected as abnormal, called true positives (TP); the abnormal samples mistakenly detected as normal, called false negatives (FN); the normal samples mistakenly detected as abnormal, called false positives (FP); and the normal samples correctly detected as normal, called true negatives (TN). The confusion matrix below gives a better visualization of these counts.
Anomaly detection confusion matrix representation
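The four cells of the confusion matrix can be counted directly from ground truth labels and anomaly scores. The sketch below assumes label 1 means abnormal and label 0 means normal, and that a score above the threshold is predicted as abnormal; the data and the name `confusion_counts` are illustrative only.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FN, FP and TN, treating 'abnormal' (label 1) as the positive class."""
    tp = fn = fp = tn = 0
    for y, s in zip(labels, scores):
        predicted_abnormal = s > threshold
        if y == 1 and predicted_abnormal:
            tp += 1  # abnormal sample correctly detected
        elif y == 1:
            fn += 1  # abnormal sample mistakenly detected as normal
        elif predicted_abnormal:
            fp += 1  # normal sample mistakenly detected as abnormal
        else:
            tn += 1  # normal sample correctly detected as normal
    return tp, fn, fp, tn

# Hypothetical ground truth (1 = abnormal) and detector scores.
labels = [0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.8, 0.05, 0.9]

tp, fn, fp, tn = confusion_counts(labels, scores, threshold=0.5)
print(tp, fn, fp, tn)  # 2 1 1 4
```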
From these four counts you can calculate the true positive rate (TPR) and the false positive rate (FPR), which are used to build the ROC curve, as follows:
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
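To make the link between these two rates and the ROC-AUC concrete, the following sketch sweeps the threshold over every distinct score, records one (FPR, TPR) point per threshold, and integrates the curve with the trapezoidal rule. It assumes label 1 means abnormal, that a sample is predicted abnormal when its score is at or above the current threshold, and that both classes are present; the data and function names are illustrative, not a reference implementation.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the threshold from high to low."""
    positives = sum(labels)              # TP + FN
    negatives = len(labels) - positives  # FP + TN
    points = []
    # Start above every score so the curve begins at (0, 0).
    for t in [float("inf")] + sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / negatives, tp / positives))
    return points

def roc_auc(labels, scores):
    """Area under the ROC curve using the trapezoidal rule."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical ground truth (1 = abnormal) and detector scores.
labels = [0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.8, 0.05, 0.9]

print(round(roc_auc(labels, scores), 4))  # 0.9333
```

In this toy example one normal sample (score 0.6) outranks one abnormal sample (score 0.4), which is the only thing keeping the value below 1.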
Some researchers argue that average precision (AP) may be a more suitable metric for evaluating anomaly detection because of the highly imbalanced nature of the samples: TN is usually much larger than TP, which keeps the FPR low even when there are many false positives. AP is the area under the precision-recall (PR) curve, where the horizontal axis is recall and the vertical axis is precision. As with the ROC curve, each point on the PR curve corresponds to the precision and recall values at a certain threshold. Precision and recall can be calculated as follows:
Recall = TPR = TP / (TP + FN)
Precision = TP / (TP + FP)
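A short sketch of how AP can be computed from these two formulas follows. It uses one common approximation of the area under the PR curve: sweep the threshold from high to low and sum each increase in recall weighted by the precision at that threshold. The assumptions are the same as before (label 1 means abnormal, a score at or above the threshold is predicted abnormal), and the data and function name are made up for illustration.

```python
def average_precision(labels, scores):
    """Approximate the area under the PR curve as sum((R_n - R_{n-1}) * P_n)."""
    positives = sum(labels)  # TP + FN, constant across thresholds
    ap = 0.0
    prev_recall = 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is one of the scores
        recall = tp / positives
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Hypothetical ground truth (1 = abnormal) and detector scores.
labels = [0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.8, 0.05, 0.9]

print(round(average_precision(labels, scores), 4))  # 0.9167
```

Because precision never involves TN, the large number of correctly classified normal samples does not inflate this score.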
Detection Methods
In the upcoming blogs, we will explore the world of image and video anomaly detection in more depth, unraveling various approaches and methodologies. Our exploration will follow the categorization proposed by researchers, as depicted in the figure below:
Visual anomaly detection categorization
The next blog will be dedicated to image anomaly detection, where we'll describe semi-supervised methods, unsupervised feature embedding-based techniques, and unsupervised reconstruction-based methodologies. Following that, we will release another blog focusing solely on video anomaly detection. That piece will elaborate on unsupervised methods, categorized by detection logic and input usage, and delve into weakly supervised methods, further divided by input modality.
Anomaly detection is closely related to anomaly localization: detection focuses on determining whether a sample exhibits an anomaly or not, while localization focuses on finding the bounding box where the anomalies are present. These blogs will focus on anomaly detection mechanisms, even though most deep learning based methods can handle both in a single end-to-end pipeline.
Stay tuned for an in-depth exploration of the intricacies of anomaly detection in images and videos.