image features
Recently Published Documents

TOTAL DOCUMENTS: 2618 (five years: 1152)
H-INDEX: 53 (five years: 10)

Author(s):  
A. Pramod Reddy ◽  
Vijayarajan V.

Automatic emotion recognition from speech (AERS) systems based on acoustic analysis reveal that some emotional classes remain ambiguous. This study employed an alternative method aimed at providing deeper insight into the amplitude-frequency characteristics of different emotions, in order to aid the near-term development of more effective AER classification approaches. The study was undertaken by converting narrow 20 ms frames of speech into RGB or grey-scale spectrogram images. These features were used to fine-tune a feature selection system that had previously been trained to recognise emotions. Two spectral scales, linear and Mel, are used to render each spectrogram, giving an inductive approach for examining the amplitude and frequency features of the various emotional classes. We propose a two-channel deep fusion network model for the efficient categorization of images. Linear and Mel spectrograms are acquired from the speech signal, processed in the frequency domain, and fed to a deep neural network. The proposed AlexNet model, with five convolutional layers and two fully connected layers, extracts the most salient features from spectrogram images plotted on the amplitude-frequency scale. Performance is compared with the state of the art on a benchmark dataset (EMO-DB). RGB and saliency images fed to the pre-trained AlexNet, tested on both EMO-DB and a Telugu dataset, reach an accuracy of 72.18%, while fused image features require fewer computations and reach an accuracy of 75.12%. The results show that transfer learning predicts more efficiently than a fine-tuned network. When tested on the EMO-DB dataset, the proposed system adequately learns discriminant features from speech spectrograms and outperforms many state-of-the-art techniques.
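A minimal sketch of the front end the abstract describes: 20 ms speech frames rendered as a Mel spectrogram "image" and fed to a pre-trained AlexNet whose head is replaced for emotion classes. The sample rate, 7-class head (EMO-DB's class count), and normalisation are illustrative assumptions, not the paper's exact configuration.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

def speech_to_mel_image(path, sr=16000, n_mels=128):
    """Render a speech signal as a 3-channel Mel-spectrogram tensor."""
    y, sr = librosa.load(path, sr=sr)
    # 20 ms analysis frames, as in the paper: n_fft = 0.020 * sr.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=int(0.020 * sr),
                                         hop_length=int(0.010 * sr), n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Normalise to [0, 1] and replicate to RGB for an ImageNet-style backbone.
    img = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
    img = torch.tensor(img, dtype=torch.float32).unsqueeze(0).repeat(3, 1, 1)
    return torch.nn.functional.interpolate(img.unsqueeze(0), size=(224, 224)).squeeze(0)

# AlexNet: five conv layers in `features`; replace the head for emotion classes.
model = models.alexnet(weights="DEFAULT")
num_emotions = 7                                   # assumption: EMO-DB's 7 classes
model.classifier[6] = nn.Linear(4096, num_emotions)
```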


Entropy ◽  
2022 ◽  
Vol 24 (1) ◽  
pp. 132
Author(s):  
Eyad Alsaghir ◽  
Xiyu Shi ◽  
Varuna De Silva ◽  
Ahmet Kondoz

Deep learning, in general, is built on input data transformation and presentation, model training with parameter tuning, and recognition of new observations using the trained model. However, this comes with a high computational cost due to the extensive input databases and the long training times required. Although a model learns its parameters from the transformed input data, no direct research has been conducted to investigate the mathematical relationship between the transformed information (i.e., features, excitation) and the model's learnt parameters (i.e., weights). This research aims to explore a mathematical relationship between the input excitations and the weights of a trained convolutional neural network. The objective is to investigate three aspects of this assumed feature-weight relationship: (1) the mathematical relationship between the features of the training images and the model's learnt parameters, (2) the mathematical relationship between the image features of a separate test dataset and a trained model's learnt parameters, and (3) the mathematical relationship between the difference of the training and testing images' features and the model's learnt parameters on a separate test dataset. The paper empirically demonstrates the existence of this mathematical relationship between the test image features and the model's learnt weights through ANOVA analysis.
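A rough sketch of the kind of test implied: compare the distribution of a trained convolutional layer's weights with the distribution of the features it produces for test images, via one-way ANOVA. The grouping choice, the use of the first layer, and the random stand-in images are assumptions; the paper's actual experimental design may differ substantially.

```python
import numpy as np
import torch
from scipy import stats
from torchvision import models

model = models.alexnet(weights="DEFAULT").eval()
first_conv = model.features[0]                  # Conv2d(3, 64, kernel_size=11, stride=4)
weights = first_conv.weight.detach().numpy().ravel()

# Stand-in test images; in practice these come from the held-out test set.
test_images = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    features = first_conv(test_images).numpy().ravel()

# One-way ANOVA: do the two populations share a common mean?
f_stat, p_value = stats.f_oneway(weights, features)
print(f"F = {f_stat:.3f}, p = {p_value:.3g}")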


Diagnostics ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 204
Author(s):  
Gergely Csány ◽  
László Hunor Gergely ◽  
Norbert Kiss ◽  
Klára Szalai ◽  
Kende Lőrincz ◽  
...  

A compact handheld skin ultrasound imaging device has been developed that uses co-registered optical and ultrasound imaging to provide diagnostic information about the full skin depth. The aim of the current work is to present the preliminary clinical results of this device. Using additional photographic, dermoscopic and ultrasonic images as reference, the images from the device were assessed in terms of the detectability of the main skin layer boundaries and characteristic image features. Combined optical-ultrasonic recordings of various types of skin lesions (melanoma, basal cell carcinoma, seborrheic keratosis, dermatofibroma, naevus, dermatitis and psoriasis) were taken with the device (N = 53) and compared with images captured with a reference portable skin ultrasound imager. The investigator and two additional independent experts performed the evaluation. The detectability of skin structures was over 90% for the epidermis, the dermis and the lesions. The morphological and echogenicity information observed for the different skin lesions was consistent with that of the reference ultrasound device and with relevant ultrasound images in the literature. The presented device was able to obtain simultaneous in vivo optical and ultrasound images of various skin lesions, showing potential for further investigations, including the preoperative planning of skin cancer treatment.


2022 ◽  
Vol 12 ◽  
Author(s):  
Chunshan Wang ◽  
Ji Zhou ◽  
Yan Zhang ◽  
Huarui Wu ◽  
Chunjiang Zhao ◽  
...  

Disease image recognition models based on deep learning have achieved relative success under limited and restricted conditions, but such models generally suffer from weak robustness: their accuracy decreases noticeably when recognizing disease images with complex backgrounds under field conditions. Moreover, most deep learning models only perform representation learning on visual information in image form, while information expressed in modalities other than images is often ignored. The present study targeted the main invasive diseases of tomato and cucumber. First, in response to the problem of weak robustness, a feature decomposition and recombination method was proposed to allow the model to learn image features at different granularities, so as to accurately recognize different test images. Second, disease feature words were extracted from textual disease descriptions composed of continuous vectors and recombined into a graph-structured text representation of the disease, to which a graph convolutional neural network (GCN) was then applied for feature learning. Finally, a vegetable disease recognition model based on the fusion of images and graph-structured text was constructed. The results show that the recognition accuracy, precision, sensitivity, and specificity of the proposed model were 97.62%, 92.81%, 98.54%, and 93.57%, respectively. This study improved model robustness to a certain extent, and provides ideas and references for research on fusing image information with graph-structured information in disease recognition.
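An illustrative sketch of the image/graph-text fusion idea: a CNN branch for disease images and a single graph-convolution layer over a word graph of disease feature terms, fused by concatenation. The ResNet-18 backbone, dimensions, adjacency handling, and classifier head are all assumptions standing in for the paper's architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class GraphConv(nn.Module):
    """One GCN layer: H' = ReLU(A_hat @ H @ W), A_hat a normalised adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        return torch.relu(a_hat @ self.lin(h))

class ImageGraphFusion(nn.Module):
    def __init__(self, num_classes, word_dim=300, gcn_dim=256):
        super().__init__()
        backbone = models.resnet18(weights="DEFAULT")     # stand-in image encoder
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        self.gcn = GraphConv(word_dim, gcn_dim)
        self.head = nn.Linear(512 + gcn_dim, num_classes)

    def forward(self, image, word_vecs, a_hat):
        img_feat = self.cnn(image).flatten(1)             # (B, 512)
        # Mean-pool node embeddings of the disease word graph.
        txt_feat = self.gcn(word_vecs, a_hat).mean(dim=0) # (gcn_dim,)
        txt_feat = txt_feat.expand(img_feat.size(0), -1)
        return self.head(torch.cat([img_feat, txt_feat], dim=1))
```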


PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0261659
Author(s):  
Friska Natalia ◽  
Julio Christian Young ◽  
Nunik Afriliana ◽  
Hira Meidia ◽  
Reyhan Eddy Yunus ◽  
...  

Abnormalities and defects that can cause lumbar spinal stenosis often occur in the intervertebral discs (IVDs) of the patient's lumbar spine. Their automatic detection and classification require an image analysis algorithm applied to suitable input images, such as mid-sagittal images or transverse mid-height intervertebral disc slices. Hence, the process of selecting and separating these images from the other medical images in a patient's set of scans is necessary. However, technological progress in automating this process still lags behind other areas of medical image classification research. In this paper, we report the results of our investigation into the suitability and performance of different machine learning approaches for automatically selecting the transverse plane that cuts closest to the half-height of an IVD from a database of lumbar spine MRI images. This study considers image features extracted using eleven different pre-trained deep convolutional neural network (DCNN) models. We investigate the effectiveness of three dimensionality-reduction techniques and three feature-selection techniques on the classification performance. We also investigate the performance of five different machine learning (ML) algorithms and three fully connected (FC) neural network learning optimizers, which are used to train an image classifier with hyperparameter optimization over a wide range of hyperparameter options and values. The different combinations of methods are tested on a publicly available lumbar spine MRI dataset consisting of MRI studies of 515 patients with symptomatic back pain. Our experiments show that applying a Support Vector Machine with a short-length-scale Gaussian kernel to full-length image features extracted using a pre-trained DenseNet201 model is the best approach. This approach gives a minimum per-class classification performance of around 0.88 when measured using the precision and recall metrics. The median performance measured using the precision metric ranges from 0.95 to 0.99, whereas that using the recall metric ranges from 0.93 to 1.0. When only considering the L3/L4, L4/L5, and L5/S1 classes, the minimum F1-scores range between 0.93 and 0.95, whereas the median F1-scores range between 0.97 and 0.99.
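A minimal sketch of the best-performing pipeline the abstract reports: pooled features from a pre-trained DenseNet201, then an RBF ("Gaussian") kernel SVM. The specific SVC hyperparameters here are placeholders; the paper searched a wide range rather than fixing these values.

```python
import torch
from torchvision import models
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

backbone = models.densenet201(weights="DEFAULT").eval()
feature_extractor = torch.nn.Sequential(
    backbone.features,
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
)                                              # -> 1920-dim feature vectors

def extract(batch):                            # batch: (N, 3, 224, 224) MRI slices
    with torch.no_grad():
        return feature_extractor(batch).numpy()

# X_train / y_train would hold features and IVD-level labels (L1/L2 ... L5/S1).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale", C=1.0))
# clf.fit(extract(train_images), y_train)
# preds = clf.predict(extract(test_images))
```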


2022 ◽  
Author(s):  
Yujia Peng ◽  
Joseph M Burling ◽  
Greta K Todorova ◽  
Catherine Neary ◽  
Frank E Pollick ◽  
...  

When viewing the actions of others, we not only see patterns of body movements, but we also "see" the intentions and social relations of people, enabling us to understand the surrounding social environment. Previous research has shown that experienced forensic examiners, Closed Circuit Television (CCTV) operators, demonstrate superior performance to novices in identifying and predicting hostile intentions from surveillance footage. However, it remains largely unknown what visual content CCTV operators actively attend to when viewing surveillance footage, and whether CCTV operators develop different strategies for active information seeking from those of novices. In this study, we conducted computational analyses of the gaze-centered stimuli derived from the eye movements of experienced CCTV operators and novices as they viewed the same surveillance footage. These analyses examined how low-level visual features and object-level semantic features contribute to the attentive gaze patterns of the two groups of participants. Low-level image features were extracted by a visual saliency model, whereas object-level semantic features were extracted from gaze-centered regions by a deep convolutional neural network (DCNN), AlexNet. We found that the visual regions attended to by CCTV operators versus novices can be reliably classified from patterns of saliency features and DCNN features. Additionally, CCTV operators showed greater inter-subject correlation than novices in attending to saliency features and DCNN features. These results suggest that the looking behavior of CCTV operators differs from that of novices by actively attending to different patterns of saliency and semantic features in both low-level and high-level visual processing. Expertise in selectively attending to informative features at different levels of the visual hierarchy may play an important role in facilitating the efficient detection of social relationships between agents and the prediction of harmful intentions.
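A sketch of the semantic-feature half of this analysis: extract deep (fc7-level) AlexNet activations from gaze-centered image patches, then test whether operator versus novice gaze can be decoded from them. The patch size, the logistic-regression decoder, and the cross-validation setup are illustrative assumptions.

```python
import torch
from torchvision import models
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

alexnet = models.alexnet(weights="DEFAULT").eval()

def dcnn_features(patches):
    """Penultimate-layer (fc7) AlexNet activations for gaze-centered patches."""
    with torch.no_grad():
        x = alexnet.features(patches)
        x = alexnet.avgpool(x).flatten(1)
        # classifier[:-1] stops before the final 1000-way ImageNet layer.
        return alexnet.classifier[:-1](x).numpy()

# patches: (N, 3, 224, 224) crops around fixation points; labels: 1 = operator.
# feats = dcnn_features(patches)
# scores = cross_val_score(LogisticRegression(max_iter=1000), feats, labels, cv=5)
# print("operator-vs-novice decoding accuracy:", scores.mean())
```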


2022 ◽  
Vol 12 (2) ◽  
pp. 680
Author(s):  
Yanchi Li ◽  
Guanyu Chen ◽  
Xiang Li

The automated recognition of optical chemical structures, with the help of machine learning, could speed up research and development efforts. However, historical sources often contain some level of image corruption, which reduces recognition performance to near zero. To address this shortcoming, a dependable algorithm is needed to help chemists further expand their research. This paper reports the results of research conducted for the Bristol-Myers Squibb Molecular Translation competition, which was held on Kaggle and invited participants to convert old chemical images into their underlying chemical structures, annotated as InChI text; we define this task as molecular translation. We propose a transformer-based model that can be utilized for molecular translation. To better capture the details of a chemical structure, the extracted image features need to be accurate at the pixel level. TNT is an existing transformer model that can meet this requirement; however, it was originally designed for image classification and is essentially a transformer encoder, so it cannot be used for generation tasks by itself. We also believe that TNT does not integrate the local information of images well, so we improve its core module, the TNT block, and propose a novel module, the Deep TNT block. Stacking this module forms an encoder structure, which we pair with a vanilla transformer decoder to obtain a chemical formula generation model based on the encoder-decoder structure. Since molecular translation is an image-captioning task, we name the model the Image Captioning Model based on Deep TNT (ICMDT). A comparison with different models shows that our model has advantages in both convergence speed and final description accuracy. We also designed a complete model inference and fusion pipeline to further enhance the final results.
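A hedged sketch of the ICMDT-style layout: an image encoder producing patch tokens (a plain patch embedding stands in for the paper's Deep TNT blocks) followed by a vanilla transformer decoder that emits InChI tokens. Vocabulary size, model dimensions, and the patch embedding are assumptions.

```python
import torch
import torch.nn as nn

class ChemImageCaptioner(nn.Module):
    def __init__(self, vocab_size=300, d_model=512, patch=16):
        super().__init__()
        # Simple patch embedding; the paper uses stacked Deep TNT blocks here.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=6)
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=6)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tgt_tokens):
        # images: (B, 3, H, W) -> (B, num_patches, d_model) memory tokens.
        mem = self.patch_embed(images).flatten(2).transpose(1, 2)
        mem = self.encoder(mem)
        tgt = self.tok_embed(tgt_tokens)                   # (B, T, d_model)
        T = tgt_tokens.size(1)
        # Causal mask so each InChI token attends only to earlier tokens.
        mask = torch.triu(torch.ones(T, T, device=tgt.device), diagonal=1).bool()
        h = self.decoder(tgt, mem, tgt_mask=mask)
        return self.out(h)                                 # next-token logits
```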


2022 ◽  
Vol 12 ◽  
Author(s):  
Jing Zhou ◽  
Eduardo Beche ◽  
Caio Canella Vieira ◽  
Dennis Yungbluth ◽  
Jianfeng Zhou ◽  
...  

The efficiency of crop breeding programs is evaluated by the genetic gain of a primary trait of interest, e.g., yield, achieved in one year through artificial selection of advanced breeding materials. Conventional breeding programs select superior genotypes on the primary trait (yield) measured with combine harvesters, which is labor-intensive and often unfeasible for single-row progeny trials (PTs) due to their large populations, complex genetic behavior, and high genotype-environment interaction. The goal of this study was to investigate the performance of selecting superior soybean breeding lines using image-based secondary traits, comparing these selections with those of breeders. A total of 11,473 progeny rows (PT) were planted in 2018, of which 1,773 genotypes were selected for the preliminary yield trial (PYT) in 2019, and 238 genotypes advanced to the advanced yield trial (AYT) in 2020. Six agronomic traits were manually measured in both the PYT and AYT trials. A UAV-based multispectral imaging system was used to collect aerial images at 30 m above ground every two weeks over the growing seasons, and a group of image features was extracted to develop the secondary crop traits for selection. Results show that the soybean seed yield of the genotypes selected by breeders was significantly higher than that of the non-selected ones in both yield trials, indicating the value of the breeders' selections for advancing soybean yield. A least absolute shrinkage and selection operator (LASSO) model built on the image features recovered 71% and 76% of the breeders' selections for the PT and PYT, respectively. The model-based selections had a significantly higher average yield than the breeders' selections: the yield of the lines selected by the model in the PT and PYT was 4% and 5% higher, respectively, than that of the lines selected by breeders. These results indicate that UAV-based high-throughput phenotyping is promising for selecting high-yield soybean genotypes.
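A sketch of LASSO-based line selection as described: regress seed yield on UAV image features, then advance the top-ranked lines. The synthetic data, feature meanings, and the 15% advancement fraction are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# X: (n_lines, n_image_features), e.g. vegetation indices, canopy cover,
# and texture features per flight date. y: (n_lines,) measured seed yield.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))                 # stand-in multispectral features
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=500)

X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(X_std, y)            # sparse feature weights

predicted_yield = lasso.predict(X_std)
top_fraction = 0.15                            # assumed PT -> PYT advancement rate
cutoff = np.quantile(predicted_yield, 1 - top_fraction)
selected = np.where(predicted_yield >= cutoff)[0]
print(f"{len(selected)} lines advanced; {np.sum(lasso.coef_ != 0)} features kept")
```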


2022 ◽  
Vol 2022 ◽  
pp. 1-9
Author(s):  
Hui Li

Multilevel image edge repair results directly affect subsequent image quality evaluation and recognition, and current edge detection algorithms suffer from unclear edge detection. In order to detect more accurate edge contour information, a multilevel image edge detection algorithm based on visual perception is proposed. First, the digital image is processed by double filtering and fuzzy threshold segmentation. Through analysis of the contour features of the moving image, thresholds on the moving-image features are set and an updated membership function is obtained to complete the multithreshold optimization. Adaptive smoothing is then applied to the object contour in the moving image, and the geometric centers of pairs of adjacent contour points within the contour range are calculated. From these, the curvature angle is computed and its sign obtained, and the contour features of the moving image are detected according to the curvature sign. The experimental results show that the proposed algorithm can effectively and accurately detect the edge contours of an image, shorten the reconstruction time, and produce detection images of high resolution.
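A speculative sketch of the curvature step just described: for each contour point, take the geometric centers of the two adjacent point pairs, measure the signed turning angle between the resulting vectors, and keep its sign (convex vs. concave). OpenCV filtering, Otsu thresholding, the file name, and the neighbour step are stand-ins for the paper's full pipeline.

```python
import cv2
import numpy as np

def curvature_signs(contour, step=2):
    """Signed turning angle at each contour point (positive = one turn direction)."""
    pts = contour.reshape(-1, 2).astype(np.float64)
    n = len(pts)
    angles = np.zeros(n)
    for i in range(n):
        prev_c = (pts[(i - step) % n] + pts[i]) / 2.0   # geometric centers of the
        next_c = (pts[(i + step) % n] + pts[i]) / 2.0   # two adjacent point pairs
        v1, v2 = pts[i] - prev_c, next_c - pts[i]
        cross = v1[0] * v2[1] - v1[1] * v2[0]           # 2D cross product (z part)
        angles[i] = np.arctan2(cross, np.dot(v1, v2))   # signed curvature angle
    return angles, np.sign(angles)

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)     # illustrative input path
blurred = cv2.bilateralFilter(img, 9, 75, 75)           # one of the two filters
_, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
angles, signs = curvature_signs(max(contours, key=cv2.contourArea))
```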


2022 ◽  
Vol 2022 ◽  
pp. 1-12
Author(s):  
Xiuye Yin ◽  
Liyong Chen

In view of the complexity of multimodal environments and the inability of existing shallow network structures to achieve high-precision image and text retrieval, a cross-modal image-text retrieval method combining efficient feature extraction with an interactive-learning convolutional autoencoder (CAE) is proposed. First, the residual network's convolution kernel is improved by incorporating two-dimensional principal component analysis (2DPCA) to extract image features, while text features are extracted through long short-term memory (LSTM) networks and word vectors, so that both modalities are represented efficiently. Then, cross-modal retrieval of images and text is realized with the interactive-learning CAE: the image and text features are fed to the two input terminals of the dual-modal CAE, and the image-text relationship model is obtained through interactive learning in the middle layer. Finally, the proposed method is demonstrated experimentally on the Flickr30K, MSCOCO, and Pascal VOC 2007 datasets. The results show that the proposed method achieves accurate image retrieval and text retrieval, with a mean average precision (MAP) above 0.3 and areas under the precision-recall (PR) curves better than those of the comparison methods, demonstrating its applicability.
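A minimal sketch of the 2DPCA step mentioned for image features: build the image covariance (scatter) matrix from a training set and project each image onto its leading eigenvectors. The component count and the random stand-in data are assumptions; the paper embeds this idea into the residual network's convolution kernels rather than applying it standalone.

```python
import numpy as np

def two_d_pca(images, n_components=8):
    """images: (M, H, W) array. Returns a (W, n_components) projection matrix."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Image covariance matrix: average of A^T A over the image set.
    G = np.einsum("mhw,mhv->wv", centered, centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(G)
    # eigh returns ascending eigenvalues; take the trailing (largest) columns.
    return eigvecs[:, -n_components:]

train = np.random.rand(100, 64, 64)            # stand-in image set
X_proj = two_d_pca(train)
features = train[0] @ X_proj                   # feature matrix Y = A X, (H, n_components)
```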

