Abstract
Few-shot anomaly detection for video surveillance is challenging due to the diverse nature of target domains. Existing methods treat it as a one-class classification problem, training on a small sample of nominal scenes and relying on either frame reconstruction or future-frame prediction to learn a manifold against which outliers can be detected at inference. We posit that the quality of image reconstruction or future-frame prediction is critical to identifying anomalous pixels in video frames. In this paper, we improve image synthesis and mode coverage for video anomaly detection (VAD) by integrating a denoising diffusion model with a future-frame prediction model. Our novel VAD pipeline combines a Generative Adversarial Network with denoising diffusion to learn the underlying non-anomalous data distribution and generate high-fidelity future-frame samples in a single step. We further regularize the image reconstruction with perceptual quality metrics, namely the Multi-scale Structural Similarity Index Measure (MS-SSIM) and Peak Signal-to-Noise Ratio (PSNR), ensuring high-quality output under few episodic training iterations. Extensive experiments demonstrate that our method outperforms state-of-the-art techniques across multiple benchmarks, validating that high-quality image synthesis in frame prediction leads to robust anomaly detection in videos.
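As a minimal illustration of the perceptual-quality metrics mentioned above, the sketch below computes PSNR between a frame and a noisy reconstruction using only NumPy. This is not the paper's implementation: the frames are synthetic placeholders, and MS-SSIM (a multi-scale, windowed comparison) would in practice come from a dedicated library rather than a few lines of code.

```python
import numpy as np

def psnr(reference, prediction, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB; higher means the prediction
    is closer to the reference frame."""
    mse = np.mean((reference.astype(np.float64) - prediction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# Toy example: a "predicted" frame formed by adding small Gaussian
# noise to a random reference frame in [0, 1].
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
noisy = np.clip(frame + rng.normal(0.0, 0.01, frame.shape), 0.0, 1.0)
print(round(psnr(frame, noisy), 1))
```

In a loss term one would typically negate PSNR (or use 1 − MS-SSIM), since both metrics increase as reconstruction quality improves.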
Original language | English |
---|---|
Article number | 128796 |
Number of pages | 10 |
Journal | Neurocomputing |
Volume | 614 |
Early online date | 05 Nov 2024 |
DOIs | |
Publication status | E-pub ahead of print - 05 Nov 2024 |