Abstract
Human Activity Recognition (HAR) is a rapidly evolving field in computer vision and artificial intelligence that aims to automatically detect and interpret human actions from video data. Despite extensive research, significant challenges persist in designing models that effectively capture both spatial and temporal dependencies while remaining computationally efficient. Existing approaches often emphasize either spatial or temporal features and typically rely on complex architectures, which limits their applicability in real-world, resource-constrained settings. Moreover, training regularization techniques such as noise injection have not been systematically studied in the context of HAR, leaving a clear gap in the literature. This thesis addresses these aspects through a structured investigation of lightweight, noise-regularized deep learning architectures for HAR. A key contribution is the Time-Distributed AlexNet, a novel adaptation of CNNs that uses time-distributed layers to preserve temporal order across video frames while keeping computational overhead low. As described in Chapter 3, the model is rigorously benchmarked against advanced alternatives, including ConvLSTM, LRCN and Vision Transformers, on the UCF-50 dataset. The proposed architecture achieves competitive performance with superior efficiency. Chapter 4 presents a systematic study of training-time noise injection using Gaussian and chaotic noise distributions. Unlike prior work, which overlooks the role of training dynamics, this research demonstrates that noise injection enhances model generalization and robustness. Empirical evaluations on multiple UCI benchmark datasets show marked improvements, with the Car dataset reaching 95.95% accuracy under optimized Gaussian noise conditions. Chapter 5 evaluates the Time-Distributed AlexNet on EduNet, an education-focused HAR dataset.
The model achieves an accuracy of 91.40% and an F1 score of 92.77%, outperforming contemporary baselines. Importantly, the study justifies the switch from UCF-50 by demonstrating EduNet's relevance to fine-grained, realistic classroom activity classification. The results further confirm that noise injection improves model stability under subtle activity variations. By identifying the limitations of existing HAR methodologies, particularly the underexplored areas of training regularization and lightweight modelling, this thesis contributes a novel and practical approach to robust activity recognition. It lays the groundwork for future research in multimodal fusion, personalized HAR and explainable AI systems.
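To make the two mechanisms named in the abstract more concrete, here is a minimal plain-Python sketch. It is not the thesis implementation. `time_distributed` applies one shared per-frame function to every frame of a clip and keeps frame order, which is the idea behind time-distributed layers. `inject_noise` adds Gaussian or chaotic noise to the inputs during training only. All function names are hypothetical. `toy_frame_features` is a toy stand-in for AlexNet. The logistic map is an assumed choice of chaotic noise source, because the abstract does not say which chaotic system the thesis uses.

```python
import random
from typing import Callable, List, Sequence

Frame = List[float]  # a flattened frame; a toy stand-in for an image tensor
NoiseFn = Callable[[int], List[float]]  # returns n noise samples


def time_distributed(fn: Callable[[Frame], List[float]],
                     clip: Sequence[Frame]) -> List[List[float]]:
    """Apply one shared per-frame function to every frame, in frame order.

    The same "weights" (here, the same function) are reused at every time
    step, and the output keeps the temporal ordering of the input clip.
    """
    return [fn(frame) for frame in clip]


def toy_frame_features(frame: Frame) -> List[float]:
    """Hypothetical stand-in for a CNN such as AlexNet: mean and max of a frame."""
    return [sum(frame) / len(frame), max(frame)]


def make_gaussian_noise(sigma: float, seed: int = 0) -> NoiseFn:
    """Zero-mean Gaussian noise with standard deviation sigma (seeded for reproducibility)."""
    rng = random.Random(seed)

    def noise(n: int) -> List[float]:
        return [rng.gauss(0.0, sigma) for _ in range(n)]

    return noise


def make_chaotic_noise(scale: float, x0: float = 0.37, r: float = 3.99) -> NoiseFn:
    """Noise from the logistic map x <- r * x * (1 - x), rescaled to (-scale, scale).

    ASSUMPTION: the logistic map is an illustrative chaotic source. With
    0 < x0 < 1 and r < 4 the state stays in (0, 1), so every sample lies
    strictly inside (-scale, scale). The state persists between calls.
    """
    x = x0

    def noise(n: int) -> List[float]:
        nonlocal x
        out = []
        for _ in range(n):
            x = r * x * (1.0 - x)
            out.append(scale * (2.0 * x - 1.0))  # map (0, 1) onto (-scale, scale)
        return out

    return noise


def inject_noise(clip: Sequence[Frame], noise_fn: NoiseFn,
                 training: bool) -> List[Frame]:
    """Add noise to every frame during training; return a clean copy at inference."""
    if not training:
        return [list(frame) for frame in clip]
    return [[p + e for p, e in zip(frame, noise_fn(len(frame)))] for frame in clip]


if __name__ == "__main__":
    clip = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # two toy frames
    for noise_fn in (make_gaussian_noise(sigma=0.05), make_chaotic_noise(scale=0.05)):
        noisy = inject_noise(clip, noise_fn, training=True)
        print(time_distributed(toy_frame_features, noisy))
```

Noise is applied only when `training=True`, so inference always sees clean inputs. This mirrors how noise regularization is usually switched off at test time. In a real deep learning framework, `time_distributed` would correspond to a wrapper layer around the CNN, and the noise would be added to batched tensors rather than Python lists.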
| Date of Award | 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Reyer Zwiggelaar (Supervisor) & Tossapon Boongoen (Supervisor) |
Keywords
- human activity recognition
- deep learning
- computer vision