Attention Inspiring Receptive-Fields Multi-Task Network via Self-supervised Learning for Violence Recognition

Abstract

Training a deep learning model to achieve accurate detection in computer vision generally requires a large amount of training data. However, collecting and annotating datasets is costly. In this letter, we propose a self-supervised auxiliary task that learns general video features without any additional human-annotated labels, with the aim of improving violence recognition performance. First, we propose a violence recognition method based on a convolutional neural network with a self-supervised auxiliary task, which learns visual features that benefit the downstream task of recognizing violence. Second, we establish a balance-weighting scheme to address the crucial problem of balancing the self-supervised auxiliary task against the violence recognition task. Third, we develop an attention receptive-field module and show that proper use of a spatial attention mechanism can effectively expand the module's receptive fields, further improving the semantically meaningful representations learned by the network. We evaluate the proposed method on two benchmark datasets, and the experimental results show better performance than other state-of-the-art methods.
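
The idea of balancing a main task against a self-supervised auxiliary task can be illustrated with a minimal sketch. The abstract does not specify the balance-weighting scheme or the auxiliary task, so everything below is an assumption made for illustration: a fixed scalar weight `aux_weight`, softmax cross-entropy for both tasks, and the hypothetical function names `cross_entropy` and `multitask_loss`. It is not the authors' method.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, given raw logits and a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(main_logits, main_target, aux_logits, aux_target, aux_weight=0.5):
    """Combine the main and auxiliary losses with a fixed weight.

    Assumption: a simple weighted sum. The scheme in the letter may differ
    (for example, it could adapt the weight during training).
    """
    main = cross_entropy(main_logits, main_target)  # violence vs. non-violence
    aux = cross_entropy(aux_logits, aux_target)     # self-supervised pseudo-label
    return main + aux_weight * aux


# Hypothetical example: 2-class main task, 4-class auxiliary pretext task.
loss = multitask_loss([2.0, -1.0], 0, [0.1, 0.3, 1.2, -0.5], 2, aux_weight=0.5)
print(f"combined loss: {loss:.4f}")
```

Setting `aux_weight` to 0 recovers the main-task loss alone, while larger values push the shared features toward the auxiliary objective; choosing that trade-off is the problem a balance-weighting scheme is meant to address.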