Rnfinity - Attention Inspiring Receptive-Fields Multi-Task Network...

Abstract

Training a deep learning model to achieve accurate detection performance in computer vision generally requires a large amount of training data. However, collecting and annotating datasets is costly. In this letter, we propose a self-supervised auxiliary task that learns general video features without any additional human-annotated labels, with the aim of improving violence recognition performance. First, we propose a violence recognition method based on a convolutional neural network with a self-supervised auxiliary task, which learns visual features that benefit the downstream task (recognizing violence). Second, we establish a balance-weighting scheme to address the crucial problem of balancing the self-supervised auxiliary task against the violence recognition task. Third, we develop an attention receptive-field module and show that proper use of the spatial attention mechanism can effectively expand the receptive fields of the module, further improving the semantically meaningful representations of the network. The proposed method is evaluated on two benchmark datasets, and the experimental results show better performance than other state-of-the-art methods.

Key Questions

What is the focus of the study?

The study focuses on enhancing image-based malware detection by applying α-cuts to binary visualizations of malicious binaries, aiming to improve color and pattern segmentation and achieve sparse image representations.

How are α-cuts utilized in this research?

In this research, the R, G, and B color values of each pixel are considered as respective fuzzy sets. α-cuts are then applied as a defuzzification method across all pixels, converting them into sparse matrices of 0s and 1s, thereby enhancing color and pattern segmentation.
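The per-channel thresholding described above can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes each 8-bit channel value is turned into a membership degree by dividing by 255, and the names `alpha_cut`, `alpha_cut_rgb`, and `max_value` are hypothetical.

```python
def alpha_cut(membership, alpha):
    """Return 1 if the membership degree reaches alpha, else 0."""
    return 1 if membership >= alpha else 0


def alpha_cut_rgb(image, alpha, max_value=255):
    """Apply an alpha-cut to every R, G, and B value of an image.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    Assumption (not from the source): membership = channel / max_value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [tuple(alpha_cut(channel / max_value, alpha) for channel in pixel)
         for pixel in row]
        for row in image
    ]


# A 1 x 2 image: one pixel with strong red/blue, one with strong green.
image = [[(255, 0, 128), (10, 200, 90)]]
print(alpha_cut_rgb(image, alpha=0.5))  # [[(1, 0, 1), (0, 1, 0)]]
```

Raising `alpha` sets more values to 0, so the output becomes sparser. The text does not state how the study chooses the threshold.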

What methodology was used to evaluate the proposed approach?

The proposed method was tested on various dataset sizes and evaluated using hyperparameterized ResNet50 models. The performance metrics included accuracy, precision, recall, and f-score to assess the effectiveness of the approach.
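For reference, the four metrics can be computed from binary confusion-matrix counts as in the sketch below. It assumes a two-class setting (malicious vs. benign) and that "f-score" means the F1 score; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Undefined ratios fall back to 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```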

What were the key findings of the study?

The study found that for larger datasets, the sparse representations of intelligently colored binary images achieved through α-cuts can surpass the performance of unprocessed images. Specifically, the method achieved 93.60% accuracy, 94.48% precision, 92.60% recall, and a 93.53% f-score.
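As a quick consistency check, and assuming the reported f-score is the F1 score (the harmonic mean of precision and recall), the stated precision and recall reproduce the stated f-score:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9448 \times 0.9260}{0.9448 + 0.9260}
    = \frac{1.74977}{1.8708}
    \approx 0.9353
```

This matches the reported 93.53%.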

What is the significance of this research in the field of image processing?

This research is significant as it is the first to apply α-cuts in image processing for malware detection. The findings suggest that α-cuts provide an important contribution to handling challenging datasets and can be integrated into image-based Intrusion Detection Systems (IDS) and other demanding real-time applications.


ARTICLE USAGE

Article usage: May-2023 to Apr-2026

Show by month	Manuscript	Video Summary
2026 April	18	18
2026 March	58	58
2026 February	61	61
2026 January	98	98
2025 December	85	85
2025 November	70	70
2025 October	74	74
2025 September	89	89
2025 August	69	69
2025 July	61	61
2025 June	124	124
2025 May	97	97
2025 April	70	70
2025 March	67	67
2025 February	44	44
2025 January	48	48
2024 December	60	60
2024 November	53	53
2024 October	49	49
2024 September	63	63
2024 August	42	42
2024 July	59	59
2024 June	26	26
2024 May	42	42
2024 April	52	52
2024 March	10	10
Total	1589	1589


Attention Inspiring Receptive-Fields Multi-Task Network via Self-supervised Learning for Violence Recognition
