ERIC Number: ED639532
Record Type: Non-Journal
Publication Date: 2023
Pages: 117
Abstractor: As Provided
ISBN: 979-8-3806-0872-5
ISSN: N/A
EISSN: N/A
Long-Term Human Video Activity Quantification in Collaborative Learning Environments
Venkatesh Jatla
ProQuest LLC, Ph.D. Dissertation, The University of New Mexico
Research on video activity detection has primarily focused on identifying well-defined human activities in short video segments. Most of this research centers on large-parameter systems that require training on large video datasets. This dissertation develops a low-parameter, modular system with rapid inference capabilities that can be trained entirely on limited datasets without requiring transfer learning from large-parameter systems. The system can accurately detect specific activities in real-life classroom videos and associate them with the students who perform them. Additionally, the dissertation develops an interactive web-based application to visualize human activity maps over long real-life classroom videos. Long-term activity detection in real-life classroom videos presents unique challenges: multiple simultaneous activities, rapid transitions between activities, long-term occlusions, durations exceeding 15 minutes, and numerous individuals performing similar activities in the background. Moreover, subtle hand movements make it difficult to distinguish actual typing and writing from unrelated hand movements. The system processes the input videos using fast activity initializations and current object-detection methods to locate each activity and identify the person performing it. These regions are then processed through an optimal low-parameter dyadic 3D-CNN classifier to identify the activity. The proposed system processes 1 hour of video in 15 minutes for typing activities and in 50 minutes for writing activities. The system uses several methods to optimize the inference pipeline. For each activity, it selects an optimal low-parameter 3D-CNN architecture from a family of low-parameter architectures. The input video is broken into smaller video regions that are transcoded at an optimized frame rate.
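The abstract's emphasis on a separable, low-parameter 3D-CNN classifier (18.7K parameters in total) can be made concrete with a parameter-count comparison. The sketch below is illustrative only: the channel sizes and kernel shape are hypothetical choices, not the dissertation's actual architecture, and it assumes the common depthwise-plus-pointwise factorization of a 3D convolution.

```python
# Illustrative parameter counts for a standard 3D convolution versus a
# depthwise-separable one. Channel counts (16 -> 32) and the 3x3x3 kernel
# are hypothetical, chosen only to show why separable designs keep
# parameter budgets small enough for limited training data.

def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weights + biases of a standard 3D convolution layer."""
    return c_in * c_out * kt * kh * kw + c_out

def separable_conv3d_params(c_in, c_out, kt, kh, kw):
    """Depthwise 3D convolution followed by a 1x1x1 pointwise convolution."""
    depthwise = c_in * kt * kh * kw + c_in   # one spatial-temporal filter per input channel
    pointwise = c_in * c_out + c_out         # 1x1x1 convolution mixes channels
    return depthwise + pointwise

if __name__ == "__main__":
    dense = conv3d_params(16, 32, 3, 3, 3)                 # 13,856 parameters
    separable = separable_conv3d_params(16, 32, 3, 3, 3)   # 992 parameters
    print(dense, separable, round(dense / separable, 1))   # ~14x reduction
```

At these (hypothetical) sizes the separable layer needs roughly 14 times fewer parameters than its dense counterpart, which is why a stack of such layers can stay in the tens of thousands of parameters rather than millions.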
For inference, an optimal batch size is determined to process input videos faster. Overall, the low-parameter separable activity classification model uses just 18.7K parameters, requiring 136.32 MB of memory and running at 4,620 (154 x 30) frames per second. Compared to current methods, the approach uses at least 1,000 fewer parameters and 20 times less GPU memory while outperforming them in both inference speed and classification accuracy. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone (800-521-0600). Web page: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: National Science Foundation (NSF)
Authoring Institution: N/A
Grant or Contract Numbers: 1613637; 1842220; 1949230