Physics Maths Engineering
Suyuan Li,
Xin Song
Abstract In computer vision, training a deep learning model to high detection accuracy generally requires a large amount of training data. However, collecting and annotating datasets is costly. In this letter, we propose a self-supervised auxiliary task that learns general video features without any additional human-annotated labels, with the aim of improving violence recognition performance. First, we propose a violence recognition method based on a convolutional neural network with a self-supervised auxiliary task, which learns visual features that benefit the downstream task (recognizing violence). Second, we establish a balance-weighting scheme to address the crucial problem of balancing the self-supervised auxiliary task against the violence recognition task. Third, we develop an attention receptive-field module and show that proper use of the spatial attention mechanism can effectively expand the module's receptive fields, further improving the network's semantically meaningful representations. We evaluate the proposed method on two benchmark datasets, and the experimental results show better performance than other state-of-the-art methods.
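The idea of balancing an auxiliary task against the main task can be illustrated with a minimal sketch. It assumes the simplest possible form, a fixed-weight sum of two cross-entropy losses. The abstract does not specify the actual balance-weighting scheme or the pretext task, so `combined_loss`, `aux_weight`, and the example label sets are hypothetical and are not the authors' method.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of one sample from raw logits (log-sum-exp for stability)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def combined_loss(main_logits, main_label, aux_logits, aux_label, aux_weight=0.5):
    """Return (total, main, aux) where total = main + aux_weight * aux.

    Assumption: aux_weight is a fixed hyperparameter chosen for illustration;
    the balance-weighting scheme in the letter may differ.
    """
    main = softmax_cross_entropy(main_logits, main_label)
    aux = softmax_cross_entropy(aux_logits, aux_label)
    return main + aux_weight * aux, main, aux


# Main task: violent vs. non-violent (2 classes, human-labeled).
# Auxiliary task: a pretext label derived from the video itself
# (4 hypothetical classes, no human annotation needed).
total, main, aux = combined_loss([2.0, -1.0], 0, [0.1, 0.3, -0.2, 0.0], 1)
print(f"main={main:.4f} aux={aux:.4f} total={total:.4f}")
```

A small `aux_weight` lets the auxiliary task shape the shared features without dominating the gradient of the violence recognition loss; setting it to zero recovers plain supervised training.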
Month | Manuscript | Video Summary |
---|---|---|
2024 November | 41 | 41 |
2024 October | 49 | 49 |
2024 September | 63 | 63 |
2024 August | 42 | 42 |
2024 July | 59 | 59 |
2024 June | 26 | 26 |
2024 May | 42 | 42 |
2024 April | 52 | 52 |
2024 March | 10 | 10 |
Total | 384 | 384 |