Saeed Niksaz
Institution: Shahid Bahonar University of Kerman
Email: info@rnfinity.com
Abstract Automatic medical report generation is the production of grammatically correct and coherent reports from radiology images. The encoder-decoder is the most common architecture for report generation, but it has not achieved satisfactory performance because of the complexity of the task. This paper presents an approach for improving report generation that can easily be added to any encoder-decoder architecture. In this approach, in addition to the features extracted from the image, the report associated with the most similar image in the training dataset is also provided as input to the decoder. The decoder thus acquires additional knowledge for text production, which helps it generate better reports. To demonstrate the efficiency of the proposed method, the technique was added to several different models that produce text from chest images, and the evaluation showed that the performance of all models improved. Different word-embedding approaches, including BioBERT and GloVe, were also evaluated; our results show that BioBERT, a transformer-based language model, is the better approach for this task.
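As an illustration of the retrieval step described above, here is a minimal Python sketch; the function names and the cosine-similarity choice are illustrative, not taken from the paper. The report of the most similar training image is looked up and then handed to the decoder alongside the image features.

```python
import numpy as np

def retrieve_nearest_report(query_feat, train_feats, train_reports):
    """Return the report paired with the training image most similar
    to the query image, by cosine similarity of feature vectors."""
    q = query_feat / np.linalg.norm(query_feat)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sims = t @ q  # cosine similarity to every training image
    return train_reports[int(np.argmax(sims))]

# The retrieved report is then fed to the decoder together with the
# image features, e.g. as extra conditioning tokens (illustrative):
#   report = retrieve_nearest_report(f_img, train_feats, train_reports)
#   output = decoder(image_features=f_img, retrieved_text=report)
```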
Posted 1 year ago
Rong Lan,
Institution: Xi’an University of Posts and Telecommunications
Email: info@rnfinity.com
Haowen Mi,
Institution: Xi’an University of Posts and Telecommunications
Email: 1466403072@qq.com
Na Qu,
Institution: Xi’an University of Posts and Telecommunications
Email: info@rnfinity.com
Feng Zhao,
Institution: Xi’an University of Posts and Telecommunications
Email: info@rnfinity.com
Haiyan Yu,
Institution: Xi’an University of Posts and Telecommunications
Email: info@rnfinity.com
Lu Zhang
Institution: Xi’an University of Posts and Telecommunications
Email: info@rnfinity.com
Abstract Although evidential c-means clustering (ECM), based on evidence theory, overcomes the limitations of fuzzy theory to some extent and improves the capability of fuzzy c-means clustering (FCM) to express and process uncertain information, ECM does not consider the spatial information of pixels, which leaves it unable to deal effectively with noisy pixels. Applying ECM directly to image segmentation therefore cannot produce satisfactory results. This paper proposes a robust evidential c-means clustering algorithm that combines spatial information for image segmentation. Firstly, an adaptive noise distance is constructed from the local information of pixels to improve the ability to detect noise points. Secondly, each pixel's original, local and non-local information is introduced into the objective function through adaptive weights to enhance robustness to noise. Then, the entropy of the pixel membership degrees is used to design an adaptive parameter that solves the problem of distance-parameter selection in credal c-means clustering (CCM). Finally, Dempster's rule of combination is improved by introducing spatial neighbourhood information and is used to reassign pixels belonging to meta-clusters and the noise cluster to singleton clusters. Experiments on synthetic images, real images and remote sensing SAR images demonstrate that the proposed algorithm not only suppresses noise effectively but also retains the details of the image. Both the visual segmentation results and the evaluation indices confirm its effectiveness in image segmentation.
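To make the second step concrete, here is a hedged sketch of blending a pixel's original, local and non-local information before computing clustering distances. The paper derives adaptive per-pixel weights and uses genuine non-local information; this sketch uses fixed weights and a coarse box-filter stand-in for the non-local term.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def composite_pixel_information(img, w=(0.6, 0.2, 0.2)):
    """Blend each pixel's original value with local and (proxy) non-local
    means so clustering distances are less sensitive to isolated noise.

    img: 2-D float array (a grey-level image)
    w:   fixed blending weights (original, local, non-local); the paper
         computes these adaptively per pixel
    """
    local = uniform_filter(img, size=3)            # mean over a 3x3 neighbourhood
    nonlocal_proxy = uniform_filter(img, size=7)   # crude stand-in for non-local means
    w0, w1, w2 = w
    return w0 * img + w1 * local + w2 * nonlocal_proxy
```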
Posted 1 year ago
Abstract Generally, a large amount of training data is essential for training a deep learning model to obtain more accurate detection performance in the computer vision domain. However, collecting and annotating datasets is costly. In this letter, we propose a self-supervised auxiliary task that learns general video features without adding any human-annotated labels, aiming to improve the performance of violence recognition. Firstly, we propose a violence recognition method based on a convolutional neural network with a self-supervised auxiliary task, which learns visual features that improve the downstream task of recognizing violence. Secondly, we establish a balance-weighting scheme to solve the crucial problem of balancing the self-supervised auxiliary task against the violence recognition task. Thirdly, we develop an attention receptive-field module, showing that proper use of the spatial attention mechanism can effectively expand the receptive field of the module and further improve the semantically meaningful representations of the network. We evaluate the proposed method on two benchmark datasets, and the experimental results show better performance than other state-of-the-art methods.
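A minimal sketch of the loss balancing between the two tasks follows; the fixed weight alpha is purely illustrative, whereas the paper's balance-weighting scheme sets this trade-off rather than fixing it by hand.

```python
def combined_loss(violence_loss, ssl_loss, alpha=0.7):
    """Weighted sum of the supervised violence-recognition loss and the
    self-supervised auxiliary loss.

    alpha is a fixed illustrative weight; the paper's balance-weighting
    scheme determines this trade-off adaptively.
    """
    return alpha * violence_loss + (1.0 - alpha) * ssl_loss
```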
Posted 1 year ago
Betty Saridou,
Institution: Lab of Mathematics and Informatics (ISCE), Faculty of Mathematics, Programming and General Courses, Department of Civil Engineering, School of Engineering, Democritus University of Thrace
Email: dsaridou@civil.duth.gr
Stavros Shiaeles,
Institution: Centre for Cybercrime and Economic Crime, University of Portsmouth
Email: stavros.shiaeles@port.ac.uk
Basil Papadopoulos
Institution: Lab of Mathematics and Informatics (ISCE), Faculty of Mathematics, Programming and General Courses, Department of Civil Engineering, School of Engineering, Democritus University of Thrace
Email: papadob@civil.duth.gr
Image conversion of malicious binaries, or binary visualisation, is a relevant approach in the security community. Recently, it has outgrown the role of a single-file malware analysis tool and has become part of Intrusion Detection Systems (IDSs) thanks to the adoption of Convolutional Neural Networks (CNNs). However, there has been little effort toward image segmentation for the converted images. In this study, we propose a novel method that serves a dual purpose: (a) it enhances colour and pattern segmentation, and (b) it achieves a sparse representation of the images. Accordingly, we treated the R, G, and B colour values of each pixel as respective fuzzy sets. We then performed α-cuts as a defuzzification method across all pixels of the image, converting them to sparse matrices of 0s and 1s. Our method was tested on a variety of dataset sizes and evaluated according to the detection rates of hyperparameterised ResNet50 models. Our findings demonstrate that for larger datasets, sparse representations of intelligently coloured binary images can exceed the model performance of unprocessed ones, with 93.60% accuracy, 94.48% precision, 92.60% recall, and a 93.53% F-score. This is the first time α-cuts have been used in image processing, and according to our results, we believe they provide an important contribution to image processing for challenging datasets. Overall, the method shows that it can become an integrated component of image-based IDS operations and other demanding real-time practices.
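The α-cut itself is simple to state in code. Assuming channel values normalised to [0, 1] (an assumption of this sketch, not a detail from the paper), a minimal NumPy version of the defuzzification step reads:

```python
import numpy as np

def alpha_cut_rgb(img, alpha=0.5):
    """Apply an alpha-cut to each colour channel of an RGB image.

    img:   (H, W, 3) array with channel values in [0, 1]; each channel
           is treated as a fuzzy membership map
    alpha: cut level -- memberships >= alpha become 1, the rest 0
    """
    cuts = (img >= alpha).astype(np.uint8)  # crisp sets {x : mu(x) >= alpha}
    return cuts[..., 0], cuts[..., 1], cuts[..., 2]  # sparse R, G, B matrices
```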
Posted 1 year ago
Zilu Zhao,
Institution: Aerospace Information Research Institute, Chinese Academy of Sciences
Email: info@rnfinity.com
Hui Long,
Institution: Aerospace Information Research Institute, Chinese Academy of Sciences
Email: longhui@aircas.ac.cn
Hongjian You
Institution: Aerospace Information Research Institute, Chinese Academy of Sciences
Email: info@rnfinity.com
Satellite remote sensing has entered the era of big data due to the increase in the number of remote sensing satellites and imaging modes. This presents significant challenges for remote sensing processing systems and results in extremely demanding real-time data processing requirements. Effective and reliable geometric positioning of remote sensing images is the foundation of remote sensing applications. In this paper, we propose an optical remote sensing image matching method based on a simple, stable feature database. The method builds the stable feature database by extracting comparatively stable local invariant features from remote sensing images using an iterative matching strategy and storing useful information about those features. By dispensing with reference images, the feature database-based matching approach potentially saves storage space for reference data while increasing image processing speed. To evaluate its performance, we train the feature database with various local invariant feature algorithms on Gaofen-2 (GF-2) images from different time phases. Furthermore, we carry out comparative matching experiments with various satellite images to confirm the viability and stability of the feature database-based matching method. Compared with direct matching using classical feature algorithms, the feature database-based matching method improves the correct feature point matching rate by more than 30% and reduces the matching time by more than 40%. The method improves the accuracy and timeliness of image matching, potentially solves the problem of the large storage space occupied by reference data, and has great potential for fast matching of optical remote sensing images.
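As a rough illustration of the matching stage only (not the paper's exact pipeline), the sketch below matches SIFT descriptors of a query image against a pre-built descriptor database with OpenCV; the iterative strategy used to accumulate stable features over several image phases is omitted, and all names are illustrative.

```python
import cv2

def match_against_feature_db(query_img, db_descriptors, ratio=0.75):
    """Match a query image against a pre-built descriptor database.

    query_img:      grayscale uint8 image
    db_descriptors: descriptor matrix standing in for the paper's
                    stable-feature database
    """
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(query_img, None)
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(descriptors, db_descriptors, k=2)
    # Lowe's ratio test keeps only distinctive correspondences
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    return keypoints, good
```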
Posted 1 year ago
P. Keith Kelly
Institution: Agile RF Systems LLC
Email: info@rnfinity.com
This article develops the applicability of non-linear processing techniques such as Compressed Sensing (CS), Principal Component Analysis (PCA), the Iterative Adaptive Approach (IAA), and Multiple-Input Multiple-Output (MIMO) operation for enhanced UAV detection with portable radar systems. The combined scheme has many advantages and the potential for better detection and classification accuracy. Some of the benefits are discussed here with a phased-array platform in mind: the novel portable phased array Radar (PWR) by Agile RF Systems (ARS), which offers quadrant outputs. CS and IAA both show promising results when applied to micro-Doppler processing of radar returns, owing to the sparse nature of the target Doppler frequencies. This promises to reduce dwell time and increase the rate at which a volume can be interrogated. Real-time processing of target information with iterative and non-linear solutions is now possible with the advent of GPU-based graphics processing hardware. Simulations show promising results.
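To illustrate why Doppler sparsity helps, here is a minimal orthogonal matching pursuit (OMP) sketch that recovers a sparse Doppler spectrum from slow-time samples. OMP stands in for the CS recovery step (IAA would replace the greedy loop with iterative weighted least-squares), and all names and sizes are illustrative.

```python
import numpy as np

def omp_doppler(y, n_freqs=256, sparsity=5):
    """Recover a sparse Doppler spectrum from slow-time samples via
    orthogonal matching pursuit over a DFT dictionary.

    y: complex slow-time samples from one range bin
    """
    n = len(y)
    grid = np.arange(n_freqs) / n_freqs  # normalised Doppler frequency grid
    A = np.exp(2j * np.pi * np.outer(np.arange(n), grid)) / np.sqrt(n)
    residual, support = y.astype(complex), []
    for _ in range(sparsity):
        # Pick the dictionary atom most correlated with the residual
        support.append(int(np.argmax(np.abs(A.conj().T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef  # re-fit and update residual
    x = np.zeros(n_freqs, dtype=complex)
    x[support] = coef
    return x
```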
Posted 1 year ago
Youchen Fan,
Institution: School of Space Information, Space Engineering University
Email: love193777@sina.com
Mingyu Qin,
Institution: Graduate School, Space Engineering University
Email: info@rnfinity.com
Huichao Guo,
Institution: Department of Electronic and Optical Engineering, Space Engineering University
Email: info@rnfinity.com
Laixian Zhang
Institution: Department of Electronic and Optical Engineering, Space Engineering University
Email: info@rnfinity.com
The range-gated laser imaging instrument can capture face images in a dark environment, which provides a new approach to long-distance face recognition at night. However, laser images have low contrast, a low SNR and no colour information, which hinders observation and recognition. It therefore becomes important to convert laser images into visible images before identification. For image translation, we propose a laser-visible face image translation model combined with spectral normalization (SN-CycleGAN). We add spectral normalization layers to the discriminator to address the low translation quality caused by the difficulty of training the generative adversarial network. A content reconstruction loss function based on the Y channel is added to reduce erroneous mappings. On our self-built laser-visible face image dataset, the faces generated by the improved model have better visual quality, with fewer erroneous mappings, and essentially retain the structural features of the target compared with other models. The FID score is 36.845, which is 16.902, 13.781, 10.056, 57.722, 62.598 and 0.761 lower than that of the CycleGAN, Pix2Pix, UNIT, UGATIT, StarGAN and DCLGAN models, respectively. For recognition of the translated images, we propose a laser-visible face recognition model based on feature retention. Shallow feature maps carrying identity information are connected directly to the decoder to counter the loss of identity information during network transmission. A domain loss function based on the triplet loss is added to constrain the style between domains. We use a pre-trained FaceNet to recognise the generated visible face images and report Rank-1 recognition accuracy. The recognition accuracy on images generated by the improved model reaches 76.9%, a substantial improvement over the above models and 19.2% higher than that of direct laser face recognition.
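Two of the ingredients named above, spectral normalization in the discriminator and a Y-channel content loss, can be sketched in a few lines of PyTorch. The architecture shown is a generic patch discriminator under illustrative layer sizes, not the paper's exact network.

```python
import torch
import torch.nn as nn

def sn_conv(in_ch, out_ch):
    """Convolution wrapped with spectral normalization, used to
    stabilise discriminator training in GANs."""
    return nn.utils.spectral_norm(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1))

discriminator = nn.Sequential(
    sn_conv(3, 64), nn.LeakyReLU(0.2),
    sn_conv(64, 128), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, kernel_size=4, padding=1),  # patch-level real/fake map
)

def y_channel(img):
    """Luminance (Y) channel of an RGB batch, ITU-R BT.601 weights."""
    r, g, b = img[:, 0:1], img[:, 1:2], img[:, 2:3]
    return 0.299 * r + 0.587 * g + 0.114 * b

def y_content_loss(generated, source):
    """L1 content-reconstruction loss computed on the Y channel only."""
    return torch.mean(torch.abs(y_channel(generated) - y_channel(source)))
```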
Posted 1 year ago
Ilias Gialampoukidis,
Institution: Information Technologies Institute, Centre for Research and Technology Hellas
Email: heliasgj@iti.gr
Thomas Papadimos,
Institution: Information Technologies Institute, Centre for Research and Technology Hellas
Email: info@rnfinity.com
Stelios Andreadis,
Institution: Information Technologies Institute, Centre for Research and Technology Hellas
Email: info@rnfinity.com
Stefanos Vrochidis,
Institution: Information Technologies Institute, Centre for Research and Technology Hellas
Email: info@rnfinity.com
Ioannis Kompatsiaris
Institution: Information Technologies Institute, Centre for Research and Technology Hellas
Email: info@rnfinity.com
This paper discusses the importance of detecting breaking events in real time to help emergency response workers, and how social media can be used to process large amounts of data quickly. Most event detection techniques have focused on either images or text, but combining the two can improve performance. The authors present lessons learned from the Flood-related Multimedia Task in MediaEval 2020, provide a dataset for reproducibility, and propose a new multimodal fusion method that uses Graph Neural Networks to combine image, text, and time information. Their method outperforms state-of-the-art approaches and can handle settings with few labelled samples.
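A simplified sketch of the fusion idea follows: image, text and time embeddings become nodes of a small graph, and one round of message passing produces a fused, graph-level prediction. This is a generic GCN-style layer under assumed dimensions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class FusionGNN(nn.Module):
    """One round of message passing over a small multimodal graph.

    node_feats holds image, text and time embeddings projected to a
    common dimension; adj is a (normalised) adjacency matrix linking
    related posts and modalities.
    """
    def __init__(self, dim=256, n_classes=2):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.classifier = nn.Linear(dim, n_classes)  # event / non-event

    def forward(self, node_feats, adj):
        h = torch.relu(self.linear(adj @ node_feats))  # aggregate neighbour features
        return self.classifier(h.mean(dim=0))          # graph-level prediction
```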
Posted 1 year ago
Wenbo Wan,
Institution: School of Information Science and Engineering
Email: wanwenbo@sdnu.edu.cn
Jiande Sun,
Institution: School of Information Science and Engineering
Email: jiandesun@sdnu.edu.cn
Javier Del Ser,
Institution: TECNALIA, Basque Research and Technology Alliance (BRTA)
Email: info@rnfinity.com
Panchromatic and multispectral image fusion, termed pan-sharpening, merges the spatial and spectral information of the source images into a fused image that has higher spatial and spectral resolution and is more reliable for downstream tasks than any of the source images. It has been widely applied to image interpretation and to pre-processing for various applications, and a large number of methods have been proposed to achieve better fusion results by considering the spatial and spectral relationships between panchromatic and multispectral images. In recent years, the fast development of artificial intelligence (AI) and deep learning (DL) has significantly advanced pan-sharpening techniques, yet the field lacks a comprehensive overview of the recent advances brought about by their rise. This paper provides a comprehensive review of pan-sharpening methods across four paradigms: component substitution, multiresolution analysis, degradation models, and deep neural networks. As an important aspect of pan-sharpening, the evaluation of the fused image is also outlined, covering assessment methods for both reduced-resolution and full-resolution quality measurement. We then discuss the existing limitations, difficulties, and challenges of pan-sharpening techniques, datasets, and quality assessment, and summarize the development trends in these areas, which provide useful methodological practices for researchers and professionals. The aim of the survey is to serve as a referential starting point for newcomers and a common point of agreement on the research directions to be followed in this exciting area.
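The component-substitution paradigm mentioned first is easy to sketch: the intensity component of the upsampled multispectral image is replaced by a histogram-matched panchromatic band. Below is a minimal IHS-style example, a generic instance of the paradigm rather than any specific surveyed method.

```python
import numpy as np

def component_substitution(ms_up, pan):
    """Minimal component-substitution pan-sharpening (IHS-style).

    ms_up: (H, W, B) multispectral image upsampled to the PAN grid
    pan:   (H, W) panchromatic image
    The band mean serves as the intensity component; the PAN image is
    matched to its statistics and the spatial detail is injected.
    """
    intensity = ms_up.mean(axis=2)
    pan_matched = (pan - pan.mean()) * (intensity.std() / (pan.std() + 1e-12)) \
        + intensity.mean()
    detail = pan_matched - intensity
    return ms_up + detail[..., None]  # inject spatial detail into every band
```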
Posted 1 year ago
Kunhao Yuan,
Institution: Loughborough University, UK
Email: info@rnfinity.com
Gerald Schaefer,
Institution: Loughborough University, UK
Email: info@rnfinity.com
Yifan Wang,
Institution: Loughborough University, UK
Email: info@rnfinity.com
Xiyao Liu
Institution: Central South University, China
Email: info@rnfinity.com
Weakly supervised semantic segmentation (WSSS) has gained significant popularity because it relies only on weak labels, such as image-level annotations, rather than the pixel-level annotations required by supervised semantic segmentation (SSS) methods. Despite drastically reduced annotation costs, the feature representations typically learned in WSSS capture only some salient parts of objects and are less reliable than those of SSS due to the weak guidance during training. In this paper, we propose a novel Multi-Strategy Contrastive Learning (MuSCLe) framework that obtains enhanced feature representations and improves WSSS performance by exploiting the similarity and dissimilarity of contrastive sample pairs at the image, region, pixel and object-boundary levels. Extensive experiments demonstrate the effectiveness of our method and show that MuSCLe outperforms current state-of-the-art methods on the widely used PASCAL VOC 2012 dataset.
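As a generic illustration of the contrastive objective underlying such frameworks, here is an InfoNCE-style loss over one anchor-positive pair; MuSCLe applies losses of this kind at several levels, and the exact formulation below is not taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor.

    anchor, positive: (d,) embeddings of a contrastive pair
    negatives:        (n, d) embeddings of dissimilar samples
    Pulls the positive towards the anchor and pushes negatives away.
    """
    anchor = F.normalize(anchor, dim=0)
    positive = F.normalize(positive, dim=0)
    negatives = F.normalize(negatives, dim=1)
    pos = torch.exp(anchor @ positive / temperature)
    neg = torch.exp(negatives @ anchor / temperature).sum()
    return -torch.log(pos / (pos + neg))
```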
Posted 1 year ago