Rapid bacterial identification through volatile organic compound analysis and deep learning | BMC Bioinformatics | Full Text

Rapid bacterial identification through volatile organic compound analysis and deep learning

Abstract

Background

The increasing antimicrobial resistance caused by the improper use of antibiotics poses a significant challenge to humanity. Rapid and accurate identification of microbial species in clinical settings is crucial for precise medication and reducing the development of antimicrobial resistance. This study aimed to explore a method for automatic identification of bacteria using Volatile Organic Compounds (VOCs) analysis and deep learning algorithms.

Results

AlexNet with data augmentation produced the best results. The average accuracy for single bacterial culture classification reached 99.24% under cross-validation, and the accuracies for identifying the three bacteria in randomly mixed cultures were 98.6% (SA), 98.58% (EC), and 98.99% (PA).

Conclusion

This work provides a new approach for the rapid identification of bacterial microorganisms. The method automatically identifies bacteria from GC-IMS detection results, helping clinicians quickly determine bacterial species and prescribe accurately, thereby controlling epidemics and minimizing the negative impact of bacterial resistance on society.


Introduction

Bacterial infections can lead to various illnesses, such as common skin infections, respiratory tract infections, and wound infections, which may result in severe conditions such as chronic obstructive pulmonary disease and sepsis [1, 2]. Bacterial infections may also contribute to the development of cancer [3], and COVID-19 patients are at risk of bacterial co-infections [4]. In clinical practice, the difficulty of classifying bacterial species often leads to the misuse of antibiotics, which increases the risk of bacterial resistance. According to the study by CJL Murray et al., approximately 4.95 million deaths were associated with bacterial Antimicrobial Resistance (AMR) in 2019 [5]. Bacterial classification is relevant not only in clinical settings but also in fields such as biodiversity conservation, biotechnology and bioengineering, and basic biological research.

In clinical practice, traditional laboratory phenotype analysis remains the main method for bacterial detection, which usually takes more than 48 h [6]. These methods involve culturing bacterial samples on agar plates followed by biochemical tests to identify specific metabolic characteristics of the bacteria. While effective, traditional techniques suffer from prolonged turnaround times, delaying crucial treatment decisions.

In recent years, significant advances have been made in bacterial identification methodologies, primarily molecular biology methods, spectroscopic techniques, and mass spectrometry.

Molecular biology methods mainly use bacterial genomic information, such as the 16S rRNA gene, to classify bacteria through gene sequencing. This approach involves Polymerase Chain Reaction (PCR) amplification of the 16S rRNA gene, followed by sequencing and comparison against a known database for identification. PCR-based methods are not only faster than traditional culture-based methods but also helpful in identifying bacteria that are difficult to grow under laboratory conditions [7,8,9].

Spectroscopic techniques have also been widely used in bacterial classification, including infrared spectroscopy [10], Raman spectroscopy [11], and nuclear magnetic resonance [12]. Spectroscopy is a simple, accurate, and non-destructive method: the spectrum changes with the molecular composition of the sample, enabling microbial detection.

Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF-MS) has become a widely used clinical technology in recent years, with many laboratories adopting it as their main microbial detection method. It relies on mass spectrometry to rapidly and efficiently analyze the biomolecular signatures generated by bacterial strains [13, 14]. Although the above methods have to some extent solved the problem of long detection times in traditional methods, they still suffer from complex operation and high cost.

With the development of deep learning algorithms, new opportunities have emerged for bacterial classification. In recent years, deep learning algorithms have achieved great success in image recognition, natural language processing, and other fields [15], and have begun to be applied in bioinformatics [16,17,18,19,20,21,22]. Deep learning algorithms can extract features from bacterial images, whole-genome sequencing data, or 16S rRNA gene sequences for bacterial classification [23]. They offer high accuracy and speed, and can automatically learn bacterial features from large amounts of data and use them for classification.

Ho et al. [24] pioneered a method fusing Raman spectroscopy with deep learning for swift pathogenic bacteria identification. They applied a convolutional neural network (CNN) to an extensive dataset of bacterial Raman spectra, achieving remarkable accuracy even in low signal-to-noise scenarios. Their model distinguished 30 common bacterial pathogens with over 82% accuracy and determined the correct antibiotic treatment in around 97% of cases. Notably, it differentiated between methicillin-resistant and methicillin-susceptible Staphylococcus aureus with approximately 89% accuracy. Clinical validation on isolates from 50 patients demonstrated treatment identification accuracies of 99.7%, offering a promising avenue for culture-free pathogen detection and antibiotic susceptibility testing, potentially enhancing diagnostic speed and patient care.

Fiannaca et al. (2018) presented a deep learning approach for the taxonomic classification of bacteria from 16 S rRNA gene sequences, leveraging both Convolutional Neural Networks (CNN) and Deep Belief Networks (DBN). The researchers developed a pipeline that includes the simulation of 16 S rRNA gene short-reads, a k-mer representation for numerical vector mapping of sequences, and the training of deep learning models for each bacterial taxon. This approach aims to improve the classification of bacteria in metagenomic data, potentially benefiting both shotgun and amplicon sequencing technologies [23].

Christmann et al. (2024) utilized Escherichia coli, Saccharomyces cerevisiae, Levilactobacillus brevis, and Pseudomonas fluorescens as model organisms for cultivation. The headspace from these cultures was then analyzed using GC-IMS. The study also included an examination of mixed cultures, incorporating every possible pairing of the microorganisms. Through multivariate data analysis of the GC-IMS data, it was demonstrated that the microorganisms could be differentiated using Partial Least Squares Discriminant Analysis (PLS-DA), achieving a notable prediction accuracy of 92% [25].

Ahmed et al. (2019) developed an AI system to classify bacterial images by combining a Deep Convolutional Neural Network (DCNN) with a Support Vector Machine (SVM), achieving an accuracy of around 96% [26].

Zhu et al. (2023) reported a hyperspectral transmission microscopic imaging system for rapid detection of pathogenic bacteria, which achieves a classification accuracy of 93.6% using a simple PCA-SVM method [27].

Table 1 Summary of related work

In recent years, there has been a notable increase in the application of deep learning models for the classification of bacterial species. However, these studies often face limitations due to the scarcity of available training data and the lack of high-quality bacterial image datasets. Additionally, many previous methodologies have overlooked essential techniques like data augmentation and cross-validation, which could significantly improve the models’ performance. Table 1 provides a comprehensive summary of the methodologies discussed in the existing literature.

The gap in previous research that motivated our study is the need for more efficient bacterial detection methods and for robust, accurate classification models. To address this gap, we propose a deep learning convolutional neural network (CNN) model for classifying bacteria from the chromatographic peaks generated by GC-IMS. This approach has the potential to improve microbial diagnostics by coupling the high sensitivity and specificity of GC-IMS for detecting volatile organic compounds with the powerful pattern recognition and learning abilities of CNNs, aiming to enhance the accuracy, speed, and efficiency of bacterial identification. This could profoundly impact clinical diagnostics, environmental monitoring, and food safety, where rapid and reliable bacterial classification is essential.

Method

The methodology presented in this paper is articulated across three distinct phases: data preparation, image processing, and the training of deep learning models, as illustrated in Fig. 1.

Fig. 1

The flowchart of the process for bacterial classification using GC-IMS. The steps include obtaining standard bacterial strains, preparing culture medium, detecting samples with GC-IMS to obtain gas chromatography spectra, and finally classifying bacteria with a deep learning convolutional neural network-based model. A shows the sample preparation process, B shows the data preprocessing steps, and C illustrates the model architecture of AlexNet

Data sources

In this study, three standard bacterial strains were used: Escherichia coli (ATCC 25922, EC), Staphylococcus aureus (ATCC 25923, SA), and Pseudomonas aeruginosa (ATCC 27853, PA). The standard strains were obtained from the laboratory at Chongqing Daping Hospital. After the standard strains were obtained on culture dishes, they were cultured in Thioglycolate (TH) broth. The TH medium, produced by Chongqing Pangtong Medical Instrument Co., Ltd, contains L-cystine, sodium chloride, glucose, yeast extract, casein, trypsin digest, and sodium thioglycolate.

The original bacterial culture solution was properly diluted, applied to the blood agar plate, and cultured for 12−18 h at 37 \(^\circ\)C, and then one bacterial colony of each bacterium was picked and cultured in Thioglycolate (TH) medium for 12–15 h at 37 \(^\circ\)C without agitation before sampling. Two methods were used for sample preparation in this study: one was to culture the bacteria in a 5 mL tube first, and then take 1 mL for detection; the other was to directly culture the bacteria in 1.5 mL of TH broth and then detect them. No distinction was made between the two types of samples prepared in this work.

Each sample was measured once, and the gas chromatogram of each sample was obtained. The samples were divided into single-strain and mixed-strain samples, and the number of samples is shown in Table  2.

Table 2 The number of bacteria strains

GC-IMS detection


After the bacterial samples were prepared, they were analyzed with GC-IMS. GC-IMS detection technology has the following advantages:

  • High Resolution Separation: GC-IMS is capable of separating complex mixtures of VOCs based on their chemical properties. The gas chromatography (GC) part separates the compounds based on their affinity for the stationary phase in the chromatography column, while the ion mobility spectrometry (IMS) part further separates them based on the drift time of ions in an electric field.

  • Sensitivity and Specificity: GC-IMS offers high sensitivity for detecting low concentrations of VOCs, which is crucial for identifying bacteria that emit these compounds at trace levels. It also provides a high degree of specificity, meaning it can distinguish between very similar compounds.

  • Rapid Analysis: GC-IMS systems can provide rapid analysis times, which is beneficial for applications requiring quick results, such as clinical diagnostics or food safety testing.

  • Non-destructive Sampling: The technique allows for the analysis of samples without destroying them, which is important when dealing with valuable or limited sample materials.

The VOCs that GC-IMS can detect are listed in Table 3.

In GC-IMS, samples are first separated by a gas chromatography column to isolate volatile organic compounds. These compounds are then ionized by an ionization source and migrate in an electric field, producing a series of ion clouds. The arrival time and intensity of these ion clouds are recorded and converted into a two-dimensional image called an ion mobility spectrum. In the ion mobility spectrum, color represents ion intensity, the x-axis represents the ion migration (drift) time, and the y-axis represents the gas chromatography retention time. Ion migration time is the time required for ions to traverse the drift tube after entering from the ion source. Different ion species migrate at different rates in the drift tube and therefore arrive at the detector at different times; measuring this arrival time yields the ion migration time.
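To illustrate how such a spectrum can be rendered, the following minimal Python sketch maps an intensity matrix to a two-dimensional image with drift time on the x-axis, retention time on the y-axis, and color encoding ion intensity. The array names and axis ranges are assumptions for illustration; real data would come from the instrument's export.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_gcims_spectrum(intensity, drift_time, retention_time, out_path):
    """Render a GC-IMS intensity matrix as a 2-D ion mobility spectrum.

    intensity      : 2-D array, rows indexed by retention time, columns by drift time
    drift_time     : 1-D array of ion drift times (x-axis)
    retention_time : 1-D array of GC retention times (y-axis)
    """
    fig, ax = plt.subplots(figsize=(4, 4))
    # Color encodes ion intensity, as described in the text.
    im = ax.imshow(intensity, aspect="auto", origin="lower", cmap="jet",
                   extent=[drift_time[0], drift_time[-1],
                           retention_time[0], retention_time[-1]])
    ax.set_xlabel("Drift time (ms)")
    ax.set_ylabel("Retention time (s)")
    fig.colorbar(im, ax=ax, label="Ion intensity")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)

# Synthetic example; real matrices come from the GC-IMS software export.
demo = np.random.rand(500, 300)
plot_gcims_spectrum(demo, np.linspace(5, 15, 300), np.linspace(0, 600, 500), "spectrum.png")
```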

Fig. 2

Typical GC-IMS spectra of 4 kinds of samples. A shows the SA spectrum; B shows the PA spectrum; C shows the EC spectrum; D shows the TH spectrum

Table 3 VOCs detectable by GC-IMS

Figure 2 illustrates typical spectra of the four sample types. All four samples exhibit identifiable visual differences in their characteristic points. Previous work in this project demonstrated that characteristic points corresponding to the differential compounds of different samples can be selected manually [30]. This study aims to classify the spectra automatically, without manual selection of characteristic points, using deep learning methods.

The GC-IMS system used in this paper included a core component (G.A.S, Dortmund, Germany) equipped with a wide-bore GC column (MXT-5, 15 m \(\times\) 0.53 mm \(\times\) 1 µm, RESTEK, USA), an automatic sampler (G.A.S, Dortmund, Germany) that integrated incubating, shaking, and heating functions for easier mVOC sampling, and a nitrogen generator (G.A.S, Dortmund, Germany) to provide carrier gas. The experimental parameters are shown in Table 4.

Table 4 Experimental parameters

Previous research has shown that the color range of the main feature points is concentrated between 0.15 and 0.51. Therefore, the chromatogram is preprocessed by limiting the color range of the image to between 0.15 and 0.51, resulting in a black-background image. This makes the background of the chromatogram cleaner and the feature points more prominent, improving the signal-to-noise ratio. The processed spectrum is then exported as a PNG image. Finally, because the input of the AlexNet model must be a 227 × 227 × 3 pixel image, the exported image is cropped and resampled to uniformly adjust the chromatogram to 227 × 227 × 3 in PNG format.
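The following sketch illustrates this preprocessing under the assumption that each spectrum is available as a normalized intensity matrix; the clipping bounds (0.15 and 0.51) follow the text, while the grayscale-to-RGB mapping and function name are illustrative and may differ from the colour scale of the GC-IMS software.

```python
import numpy as np
from PIL import Image

def preprocess_spectrum(intensity, lo=0.15, hi=0.51, size=(227, 227)):
    """Clip the color (intensity) range to [lo, hi], map it onto a
    black-background image, and resample to the AlexNet input size."""
    clipped = np.clip(intensity, lo, hi)
    # Values at or below `lo` become black background; `hi` maps to full brightness.
    scaled = (clipped - lo) / (hi - lo)
    img = Image.fromarray((scaled * 255).astype(np.uint8)).convert("RGB")
    return img.resize(size, Image.BILINEAR)

# Example (hypothetical file name): preprocess_spectrum(raw_matrix).save("sample_0001.png")
```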

Dataset Split

This work involves two parts. The first part is the classification of single strains: the data are divided into four categories by species, thioglycolate broth (TH), Escherichia coli, Staphylococcus aureus, and Pseudomonas aeruginosa. The second part involves mixed bacterial strains, where the goal is to detect whether a given species is present in the culture medium. The data are divided into three groups by target species, and each group is split into two categories, with and without that bacterium; a small label-construction sketch follows the group listing below. The detailed datasets are shown in Fig. 3.

Fig. 3

Datasets for the two classification tasks

  • Escherichia coli (EC) group:

Categories with E. coli: EC, EC+PA, EC+SA, EC+PA+SA (original sample size: 446). Categories without E. coli: TH, PA, SA, PA+SA (original sample size: 555).

  • Pseudomonas aeruginosa (PA) group:

Categories with P. aeruginosa: PA, EC+PA, PA+SA, EC+PA+SA (original sample size: 429). Categories without P. aeruginosa: TH, EC, SA, EC+SA (original sample size: 573).

  • Staphylococcus aureus (SA) group:

Categories with S. aureus: SA, EC+SA, PA+SA, EC+PA+SA (original sample size: 436). Categories without S. aureus: TH, EC, PA, EC+PA (original sample size: 566).
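As referenced above, the sketch below shows one way to derive the binary labels for these three detection tasks from the culture category names; the category strings are hypothetical identifiers chosen to match the listing above.

```python
# Hypothetical category names matching the grouping above.
CATEGORIES = ["TH", "EC", "PA", "SA", "EC+PA", "EC+SA", "PA+SA", "EC+PA+SA"]

def binary_label(category: str, target: str) -> int:
    """Return 1 if the target strain (e.g. 'EC') is present in the culture, else 0."""
    return int(target in category.split("+"))

# Labels for the E. coli detection task:
ec_labels = {c: binary_label(c, "EC") for c in CATEGORIES}
# {'TH': 0, 'EC': 1, 'PA': 0, 'SA': 0, 'EC+PA': 1, 'EC+SA': 1, 'PA+SA': 0, 'EC+PA+SA': 1}
```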

Data augmentation

To address the overfitting that can arise from a limited dataset and to improve the model's generalization, this study applied data augmentation via random cropping. The center region of each original image was preserved and cropped to 450 × 450 pixels, and 10 random 224 × 224 crops were then generated from each image, increasing the sample size of each dataset by a factor of 10. Data augmentation produced better results than training on the original data alone. The specific sample counts are provided in Table 5.
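A minimal sketch of this augmentation is given below, assuming it is applied to the exported spectra before the final resize; the image path and helper name are illustrative.

```python
import random
from PIL import Image

def augment(image_path, n_crops=10, center=450, crop=224):
    """Center-crop the original spectrum to `center` x `center` pixels,
    then draw `n_crops` random `crop` x `crop` patches (a 10-fold increase)."""
    img = Image.open(image_path)
    w, h = img.size
    left, top = (w - center) // 2, (h - center) // 2
    centered = img.crop((left, top, left + center, top + center))
    patches = []
    for _ in range(n_crops):
        x = random.randint(0, center - crop)
        y = random.randint(0, center - crop)
        patches.append(centered.crop((x, y, x + crop, y + crop)))
    return patches

# patches = augment("sample_0001.png"); each patch is saved as a new training image.
```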

Table 5 The numbers of data in each dataset

Deep learning algorithm

The choice of deep learning algorithms for bacterial classification stems from their ability to learn feature representations from complex data and classify them without explicit rules or guidance. Traditional classification methods may face challenges with the complex spectra data generated by GC-IMS. Deep learning algorithms can automatically extract features that help distinguish different bacterial categories by training on large amounts of data, thereby achieving efficient classification. Moreover, deep learning algorithms have advantages such as strong adaptability and ability to handle non-linear relationships, making them a suitable choice for bacterial classification using GC-IMS data.

Therefore, the combination of GC-IMS technology and deep learning algorithms can effectively address the challenges faced in bacterial classification and provide a feasible approach for rapid and accurate identification of different bacterial species.

Due to the small size of the dataset, this study uses a pre-trained AlexNet for transfer learning. First, the pre-trained AlexNet model is loaded, and the structure of the model is fine-tuned. The early convolutional layers identify prominent features of the images, such as spots, edges, and colors, while the later layers focus on more specific features that differentiate between categories.
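The paper does not state which framework was used; the sketch below shows an equivalent transfer-learning setup in PyTorch/torchvision as an assumption: an ImageNet-pretrained AlexNet is loaded and its final fully connected layer is replaced to match the number of target classes.

```python
import torch.nn as nn
from torchvision import models

def build_alexnet(num_classes: int) -> nn.Module:
    """Load an ImageNet-pretrained AlexNet and replace the last fully
    connected layer so it outputs `num_classes` scores."""
    model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
    return model

single_strain_model = build_alexnet(num_classes=4)  # TH / EC / PA / SA
mixed_strain_model = build_alexnet(num_classes=2)   # target strain present / absent
```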

AlexNet is a convolutional neural network (CNN) that significantly advanced the field of deep learning. Designed for high-performance image recognition tasks, AlexNet comprises five convolutional layers followed by three fully connected layers. The first convolutional layer employs 96 kernels (filters), capturing basic image features; subsequent convolutional layers progressively increase the number of filters, peaking at 256, to extract increasingly complex features. To reduce the computational load while preserving essential features, max-pooling layers are placed between convolutional layers, reducing the spatial dimensions. The network introduces non-linearity through the Rectified Linear Unit (ReLU) activation function, enabling faster and more efficient training than networks using traditional tanh or sigmoid functions. To combat overfitting, dropout layers are incorporated before the fully connected layers, randomly deactivating neurons during training. The final layer of AlexNet uses a softmax activation function, classifying inputs into one of 1000 categories, making it well suited to large-scale visual recognition challenges. AlexNet's effectiveness was demonstrated in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where it substantially lowered the error rate compared to prior models and approaches. The AlexNet architecture is shown in Fig. 1C.

Besides the AlexNet model, we also used other CNN models for comparison, namely GoogLeNet, ResNet-50, MobileNet-V2, and Inception-V3. GoogLeNet is a convolutional neural network developed by Google and introduced in 2014, when it won the ILSVRC (ImageNet Large Scale Visual Recognition Challenge). Its key feature is the inception module, which allows the network to perform convolutions of different sizes (1x1, 3x3, and 5x5) simultaneously within the same layer, helping the network capture features at different scales and improving its performance.

ResNet-50 is a Convolutional Neural Network (CNN) architecture from the ResNet (Residual Networks) family, a collection of models designed to overcome the difficulties typically encountered when training very deep neural networks. Developed by researchers from Microsoft Research Asia, ResNet-50 is particularly recognized for its depth and effectiveness in image classification tasks. ResNet architectures are available in several depths, including ResNet-18 and ResNet-34, with ResNet-50 being an intermediate-sized version. Despite its introduction in 2015, ResNet-50 remains a significant model for image classification.

MobileNet-V2 is a lightweight deep neural network architecture designed for mobile and embedded vision applications, introduced by Sandler et al. in the paper "MobileNetV2: Inverted Residuals and Linear Bottlenecks" [31]. It introduces inverted residual blocks, in which a narrow input is first expanded by a pointwise convolution, filtered by a depthwise convolution, and then projected back to a narrow representation through a linear bottleneck, the reverse of the traditional residual design. Due to its lightweight nature, MobileNet-V2 is well suited to applications where computational resources are limited, such as mobile devices, embedded systems, and IoT devices.

Inception-V3 is a deep convolutional neural network architecture developed by Google researchers and introduced in the paper "Rethinking the Inception Architecture for Computer Vision" by Szegedy et al. [32]. It is the successor to the Inception-V1 and Inception-V2 models and incorporates several enhancements over them. The core building block of Inception-V3 is the Inception module, which performs convolutions of different sizes on the same input, enabling the model to learn more complex features and capture information at different scales. A comparison of the parameters and layers of these models is shown in Table 6.

Table 6 Parameters and layers of the various CNN models

In this study, we use the principle of knowledge transfer to retrain the network for bacterial classification, replacing the final fully connected layer with a new one that outputs 4 categories or 2 categories, respectively, to suit the two classification problems in this project. Finally, hyperparameters are set, with the same values used for both parts of the work: a learning rate of 0.0001, 15 training epochs, a mini-batch size of 100, and a validation frequency of once every 60 iterations. This study uses 10-fold cross-validation, randomly dividing the dataset into 10 parts and using 9 parts for training and 1 part for validation; this process is repeated 10 times, and the average of the 10 training results is taken as the final result.
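A compact sketch of this 10-fold cross-validation procedure, using the hyperparameters listed above, is shown below. The optimizer choice (Adam) and the `build_alexnet` helper from the previous sketch are assumptions, since the paper does not specify them.

```python
import numpy as np
import torch
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, Subset

def cross_validate(dataset, num_classes, folds=10, epochs=15, batch=100, lr=1e-4):
    """10-fold cross-validation; returns the mean validation accuracy."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    accs = []
    for train_idx, val_idx in KFold(folds, shuffle=True, random_state=0).split(np.arange(len(dataset))):
        model = build_alexnet(num_classes).to(device)        # see previous sketch
        opt = torch.optim.Adam(model.parameters(), lr=lr)    # optimizer is an assumption
        loss_fn = torch.nn.CrossEntropyLoss()
        train_dl = DataLoader(Subset(dataset, train_idx), batch_size=batch, shuffle=True)
        val_dl = DataLoader(Subset(dataset, val_idx), batch_size=batch)
        for _ in range(epochs):
            model.train()
            for x, y in train_dl:
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        # Evaluate on the held-out fold.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in val_dl:
                correct += (model(x.to(device)).argmax(1).cpu() == y).sum().item()
                total += len(y)
        accs.append(correct / total)
    return float(np.mean(accs))
```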

Evaluating indicator

This study uses four indicators to evaluate model performance: accuracy, precision, recall, and \(\mathrm {F1_{score}}\).

$$\begin{aligned} Accuracy=\frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$
(1)
$$\begin{aligned} Precision=\frac{TP}{TP+FP} \end{aligned}$$
(2)
$$\begin{aligned} Recall=\frac{TP}{TP+FN} \end{aligned}$$
(3)
$$\begin{aligned} F1_{score}=2\times \frac{Precision\times Recall}{Precision+Recall} \end{aligned}$$
(4)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Accuracy: Accuracy is a fundamental metric used to evaluate the performance of a CNN. It represents the proportion of correctly classified samples over the total number of samples in the dataset. A higher accuracy indicates that the CNN is making more accurate predictions.

Precision and Recall: Precision and recall are indicators used in binary classification problems or multi-class problems with imbalanced classes. Precision measures the proportion of correctly predicted positive instances out of all instances predicted as positive. Recall, on the other hand, measures the proportion of correctly predicted positive instances out of all actual positive instances. These indicators provide insights into the CNN’s ability to correctly identify positive instances and avoid false positives.

\(\mathrm {F1_{score}}\): The \(\mathrm {F1_{score}}\) is a metric that combines precision and recall into a single value. It is particularly useful when dealing with imbalanced datasets, where accuracy alone might be misleading. The \(\mathrm {F1_{score}}\) is the harmonic mean of precision and recall, providing a balanced measure of a CNN’s performance.
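For reference, the sketch below computes these four indicators from raw binary predictions, following Eqs. (1)-(4); the function name and example values are illustrative.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for a binary task, per Eqs. (1)-(4)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return acc, precision, recall, f1

# binary_metrics([1, 0, 1, 1], [1, 0, 0, 1])  ->  (0.75, 1.0, 0.666..., 0.8)
```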

Results

This study used a GC-IMS system to collect 3650 gas chromatography-ion mobility spectrometry images of bacterial solutions and sterile culture media, and constructed automatic classification and recognition systems using the AlexNet deep learning model with transfer learning.

Single-strains classification

In the task of classifying single-strain bacteria, after 10-fold cross-validation, the average accuracy using the original dataset was 97.8142%, which increased to 99.24% after data augmentation using AlexNet. The training accuracies for all cases are listed in Table 7. Figure 4A presents a typical training run in the single-strain classification task after data augmentation. The training process converges rapidly: after 2 epochs the accuracy already exceeded 90%, and after 15 epochs an accuracy of 99.07% was achieved, with no apparent overfitting. Further analysis revealed high precision, recall, and F1 scores for each bacterial solution class.

Precision measures the proportion of true positive predictions among all positive predictions made by the model. In the classification task with 10-fold cross-validation, the precision was stable overall, with the lowest value being 0.98. Recall, also known as sensitivity or the true positive rate, measures the proportion of true positive predictions among all actual positive instances in the dataset. In this work, the recall values were almost all above 0.97, with only one value falling slightly below that, reaching a minimum of 0.96. The F1 score is the harmonic mean of precision and recall, providing a balanced measure of a model's performance; it reaches its best value at 1 (perfect precision and recall) and its worst at 0. A high F1 score indicates that the model achieves both high precision and high recall. Details of the evaluation indicators are shown in Fig. 4C.

Our model demonstrates strong adaptability after training. On our single-strain dataset, our model achieved a mean accuracy of 99.24% in 10-fold validation. Through cross-validation on unseen data, we observed the model’s excellent generalization ability and stable performance (Table 7).

Mixed-strains identification

In the task of identifying mixed-strain bacteria, the recognition rates for SA, EC, and PA trained on the original dataset were 96.12%, 95.7%, and 97.08%, respectively. These rates increased to 98.6%, 98.58%, and 98.99% with the augmented dataset. The training process converged quickly with minimal overfitting, achieving a final recognition rate of 97.80% and demonstrating good model adaptability. Detailed training results are shown in Table 7. The results for precision, recall, F1 score, and the ROC curve are consistent with those of the single-strain classification task.

Fig. 4

Single-strain classification results. A shows one of the training runs. B shows an ROC curve for one of the training runs in the EC group. C shows the precision, recall, and F1 scores obtained in each of the 10 cross-validation iterations

Table 7 Accuracy of all training runs

Figure 4C shows the four evaluation indicators of the AlexNet model in the four solution classification tasks, with an overall score exceeding 0.96. From the results above, we can conclude that training on the dataset collected by GC-IMS gas chromatography on bacteria using a transfer learning algorithm based on AlexNet can achieve ideal models not only for the single-strain classification task of individually cultured bacteria but also for the recognition of mixed-strain bacteria.

Moreover, ROC curve and AUC analysis were conducted to evaluate the classification system's performance across various classification thresholds. The area under the ROC curve (AUC) was calculated and is shown in Fig. 4B, indicating that the classifier exhibits excellent classification performance. The ROC curve's proximity to the upper-left corner demonstrates the classifier's excellent effectiveness and overall performance.
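A minimal sketch of how such an ROC curve and AUC can be computed from held-out predictions is shown below, using scikit-learn; the variable names are placeholders.

```python
from sklearn.metrics import roc_curve, roc_auc_score

def roc_summary(y_true, scores):
    """ROC curve points and AUC from per-sample positive-class scores."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    return fpr, tpr, roc_auc_score(y_true, scores)

# fpr, tpr, auc = roc_summary(val_labels, val_probabilities[:, 1])
```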

Comparison of different models

To validate the performance of different models on our dataset, we selected five deep learning models and trained them on the same dataset: AlexNet, GoogleNet, ResNet-50, MobileNet-v2, and Inception-v3. We included all images, comprising three types of bacteria and blank control broth tubes, ensuring that the number of bacterial species was consistent across all models and thereby eliminating potential biases caused by variations in class distribution. To ensure a fair evaluation, we divided the dataset into a 70:20:10 split, allocating 70% of the data for training, 20% for validation, and the remaining 10% for testing. Additionally, we fine-tuned the comparison models with 15 epochs, a batch size of 100, and a learning rate of 0.0001, introducing only minor modifications to the fully connected or dropout layers as required by each model. Following this approach, our goal was to provide a comprehensive and fair comparison between our proposed model and the other classifiers, enabling a more accurate assessment of their respective performances.
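The 70:20:10 partition can be reproduced with a simple index split such as the sketch below; the random seed and helper name are illustrative.

```python
import numpy as np

def split_indices(n_samples, seed=0):
    """Random 70/20/10 split into train, validation, and test indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train, n_val = int(0.7 * n_samples), int(0.2 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(3650)  # 3650 spectra were collected in total
```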

Table 8 Performance Comparison of Different Models on Bacterial Classification

From Table 8, the following conclusions can be drawn regarding the performance of different deep learning models (AlexNet, GoogleNet, ResNet-50, MobileNet-v2, Inception-v3) on bacterial species classification tasks. Here are some key observations:

  1. (a)

    AlexNet is the most consistently high-performing model across all classification tasks, with the highest test and validation accuracies and the lowest validation losses. Inception-v3 is a strong performer but is slightly outperformed by AlexNet in most tasks. GoogleNet and ResNet-50 show lower accuracies and higher validation losses, indicating they may not be as effective for bacterial species classification. MobileNet-v2 performs reasonably well but does not lead in any category, indicating it is a balanced but not top-performing model.

  2. (b)

Data augmentation is also crucial in improving the performance of this method. Table 7 shows the accuracy of the AlexNet model after 10-fold cross-validation on this dataset. Comparing performance before and after data augmentation, the accuracy increased by approximately 2% to 3%.

  3. (c)

    All models achieved an accuracy of over 96% on the augmented single bacteria classification dataset. In the task of single bacteria recognition, except for the recognition of EC strains, all models also achieved an accuracy of over 96%. Even for the EC recognition task, which had the lowest accuracy, the models still achieved over 94%. This demonstrates the high recognizability of the bacterial dataset collected using GC-IMS.

Furthermore, we investigated the performance of the proposed model on the test dataset. Based on the confusion matrices shown in Fig. 5, we can draw the following conclusions:

  1. (a)

    AlexNet demonstrates exceptional performance across different classification tasks with overall accuracies of 99.07% for single bacteria classification, 97.50% for EC classification, 97.31% for PA classification, and 99.00% for SA classification.

  2. (b)

The confusion matrix in panel A shows the performance of the model in classifying four classes: BrothTube, EC, PA, and SA, with an overall accuracy of 99.07%. BrothTube was classified perfectly (100% accuracy), EC almost perfectly with a single minor error (99.4% accuracy), PA with high accuracy (97.5%) apart from a few misclassifications as SA, and SA almost perfectly with minor errors (99.4% accuracy). The model performs exceptionally well in classifying single bacteria, with very high accuracy across all classes.

  3. (c)

    AlexNet shows consistent high performance and reliability in classifying various bacteria types, with minimal errors and high accuracy, making it a highly effective model for bacterial species classification.

Fig. 5

Confusion matrices of deep learning model for 10% test data. A Overall Performance on Single Bacteria Classification; B Performance on EC recognition; C Performance on PA Recognition; D Performance on SA Recognition

Discussion

According to a report published in Lancet in 2022, antibiotic resistance resulted in the deaths of 1.27 million patients in 2019, with infections caused by antibiotic-resistant bacteria leading to about 4.95 million deaths [5, 33]. Bacterial resistance is one of the three primary causes of human mortality in modern society, making accurate testing and the proper use of antibiotics crucial for reducing resistance [34]. Inaccurate bacterial identification can have significant implications in various fields, including healthcare, food safety, and environmental monitoring. In healthcare settings, misidentification of bacterial pathogens can lead to incorrect treatment decisions, delayed treatment initiation, or the administration of ineffective antibiotics, which may exacerbate patient conditions and contribute to the development of antimicrobial resistance [1, 33]. In food safety, misidentification of pathogenic bacteria can result in contaminated food products reaching consumers, leading to foodborne illnesses and potential outbreaks. In environmental monitoring, misidentification of bacteria can skew assessments of ecosystem health, water quality, and bioremediation efforts [35]. Overall, inaccurate bacterial identification can compromise public health, food safety, and environmental integrity, highlighting the importance of reliable and precise identification methods.

Gas chromatography coupled with mass spectrometry (GC-MS) has long been the gold standard for VOC detection [36]. Its large databases for substance identification and its capability to separate and accurately identify compounds are noteworthy. Although GC-MS offers unparalleled separation capability and structural identification for complex sample analysis, GC-IMS is a strong alternative when portability, cost-effectiveness, ease of operation, and environmental friendliness are considered [37]. This is particularly true in scenarios requiring rapid analysis and on-site detection, where GC-IMS demonstrates unique advantages. However, the GC-IMS database is not yet as complete as that of GC-MS, limiting its ability to accurately identify all compounds [30]. This limitation makes it difficult to identify differential compounds among bacteria, so manual identification of different types of bacteria is challenging. In such cases, deep learning models can accurately identify bacteria without focusing on a specific compound.

With the advancement of computer technology and the generation of large volumes of data in medical research, the application of deep learning in smart healthcare is becoming increasingly sophisticated. Convolutional neural networks (CNNs) are one of the typical deep learning network models [38]. Compared to traditional machine learning methods like support vector machines (SVMs) [39], CNNs can automatically extract relevant features by layering. In traditional machine learning methods, researchers spend significant time and effort selecting and extracting specific image features, particularly in areas where features are not very clear [40]. However, the high accuracy and strong generalization ability of a CNN classification model rely on iterating and adjusting parameters with large amounts of image data during training [41]. In practice, clinical image data are often limited, and batch image annotation and screening require substantial manpower, resources, and time. Hence, the key challenge is how to train on small amounts of labeled data and establish a reliable model for target task classification and prediction. Transfer learning can address this problem [42], which is a machine learning method that transfers knowledge learned in other fields to different but related target fields to solve new problems. Transfer learning has produced promising results in computer vision [43] and drug discovery [44]. The basis for achieving transfer is that the two fields must share common elements. Zhou successfully applied a transfer learning model based on VGG-16 to diagnose thyroid nodule ultrasounds with high accuracy [45]. In a study on intelligent recognition of glaucoma optic neuropathy [46], the network model that used transfer learning showed better initial performance, converged faster, and had better predictive performance than the original model. However, it is worth noting that not all transfers can produce ideal results; the fewer common elements between two fields, the more challenging the transfer, and a reverse transfer effect may occur [47].

In this study, we employed GC-IMS images of standard bacterial strains and developed a deep learning method to construct a bacterial identification model. The model’s AUC for detecting the three types of bacteria was 0.99, indicating that the proposed method is effective in identifying bacteria and demonstrates the feasibility of using GC-IMS for rapid bacterial identification.

Comparison with previous work

We compared our work with previous studies related to bacterial strain classification, as shown in Table 9. The comparison covers the types of bacteria used, data collection methods, classification model methods, image numbers, and classification accuracies. Based on Tables 9 and 1, we can draw the following conclusions:

  1. 1.

    Most methods utilize deep learning algorithms to classify bacterial data, while a few choose traditional machine learning algorithms and statistical analysis.

  2. 2.

    Besides our work, there is another study that uses GC-IMS as the data collection method for bacterial classification, but it does not employ automatic algorithms for classification.

  3. 3.

    Only a small number of studies use data augmentation and cross-validation to optimize their models.

  4. 4.

    The proposed method achieves a higher classification accuracy (99.22%) compared to other studies.

Table 9 Comparison of different approaches based on bacterial type, data sources, method, sample numbers, and accuracy

Limitations

This study has several limitations. Firstly, the samples used were laboratory-prepared standard bacterial strains, which contained volatile organic compound (VOC) substances that were purer than those typically found in clinical samples. These clinical samples often contain a mixture of VOC substances. Secondly, the study only included three common bacterial strains, a number that is fewer than in other similar works. For instance, in the work of CS Ho et al., Raman spectroscopy was used to classify 30 bacterial strains with an overall recognition rate of 89.1±0.1% [24]. Lastly, while GC-IMS offers advantages such as portability and low cost, its database is not yet comprehensive, limiting its ability to accurately identify all compounds. In such cases, deep learning algorithm models can be employed to accurately identify the bacteria of interest without the need to focus on specific compounds. However, the interpretability limitations of deep learning algorithms mean that it is not possible to precisely locate differential compounds between different bacteria. Consequently, there is a scarcity of publicly available datasets for validating the performance of classification models. This highlights the need to establish a clinical sample and GC-IMS image data database in future work.

Conclusion

This study explored a method for the automatic identification of bacteria using a GC-IMS device and deep learning algorithms, marking several key contributions to the field:

Innovative Integration: The study demonstrates an innovative integration of GC-IMS technology with deep learning algorithms for microbial diagnostics. This approach surpasses traditional methods by offering faster and more accurate identification processes.

High Accuracy and Reliability: The method can quickly and accurately identify multiple bacteria in both single and mixed cultures, achieving high accuracy and reliability. This level of precision is crucial for applications in clinical diagnosis, where accurate identification can significantly impact patient outcomes.

Significance for Clinical Practice: The significance of this method for clinical diagnosis, antibiotic treatment, and the control of bacterial resistance cannot be overstated. By providing rapid and precise identification, it enables more targeted antibiotic therapy, contributing to the fight against antibiotic resistance.

Foundation for Future Research: The study sets a solid foundation for future research, suggesting pathways for optimizing the algorithm and model to improve identification speed and accuracy further. It also opens up possibilities for exploring broader application areas, extending the method’s benefits beyond its current scope.

These key contributions underscore the study’s potential to revolutionize bacterial identification practices, offering a more efficient, accurate, and reliable method that could significantly enhance clinical diagnostics and the management of bacterial infections. Future research directions, as suggested, will aim to build on these achievements, optimizing and expanding the method’s applicability to meet the evolving needs of healthcare and research.

Building on the foundation laid by this study, future research aims to explore and expand the database and model generalization, develop the explainability and interpretability of the algorithms, and push the boundaries of accuracy, efficiency, and applicability in both medical diagnostics and broader scientific inquiry.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

VOCs:

Volatile organic compounds

GC-IMS:

Gas chromatography-Ion mobility spectrometry

EC:

Escherichia coli

PA:

Pseudomonas aeruginosa

SA:

Staphylococcus aureus

AMR:

Antimicrobial resistance

PCR:

Polymerase chain reaction

MALDI-TOF-MS:

Matrix-assisted laser desorption/Ionization time-of-flight mass spectrometry

CNN:

Convolutional neural network

DBN:

Deep belief networks

PLS-DA:

Partial least squares discriminant analysis

DCNN:

Deep convolutional neural network

SVM:

Support vector machine

ILSVRC:

ImageNet large scale visual recognition challenge

References

  1. Hay RJ, Morris-Jones R. Bacterial Infections. In: Griffiths CEM, Barker J, Bleiker T, Chalmers R, Creamer D, editors. Rook’s textbook of dermatology, Ninth Edition. 1st ed. Wiley;. p. 1–100. Available from: https://onlinelibrary.wiley.com/doi/10.1002/9781118441213.rtd0026.

  2. Fisher RA, Gollan B, Helaine S. Persistent bacterial infections and persister cells. Nat Rev Microbiol. 2017;15(8):453–64. https://doi.org/10.1038/nrmicro.2017.42.


  3. Van Elsland D, Neefjes J. Bacterial infections and cancer. EMBO Reports. 2018;19(11): e46632. https://doi.org/10.15252/embr.201846632.


  4. Langford BJ, So M, Raybardhan S, Leung V, Westwood D, MacFadden DR, et al. Bacterial co-infection and secondary infection in patients with COVID-19: a living rapid review and meta-analysis. Clin Microbiol Infect. 2020;26(12):1622–9. https://doi.org/10.1016/j.cmi.2020.07.016.


5. Murray CJL, Ikuta KS, Sharara F, Swetschinski L, Robles Aguilar G, Gray A, et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet. 2022;399(10325):629–55. https://doi.org/10.1016/S0140-6736(21)02724-0.


  6. Bochner BR. Global phenotypic characterization of bacteria. FEMS Microbiol Rev. 2009;33(1):191–205. https://doi.org/10.1111/j.1574-6976.2008.00149.x.


  7. Srinivasan R, Karaoz U, Volegova M, MacKichan J, Kato-Maeda M, Miller S, et al. Use of 16S rRNA gene for identification of a broad range of clinically relevant bacterial pathogens. PLOS ONE. 2015;10(2): e0117617. https://doi.org/10.1371/journal.pone.0117617.


  8. Janda JM, Abbott SL. 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J Clin Microbiol. 2007;45(9):2761–4. https://doi.org/10.1128/JCM.01228-07.


  9. Kai S, Matsuo Y, Nakagawa S, Kryukov K, Matsukawa S, Tanaka H, et al. Rapid bacterial identification by direct PCR amplification of 16S rRNA genes using the \({\rm MinION}^{{\rm TM}}\) nanopore sequencer. FEBS Open Bio. 2019;9(3):548–57. https://doi.org/10.1002/2211-5463.12590.


  10. Wenning M, Scherer S. Identification of microorganisms by FTIR spectroscopy: perspectives and limitations of the method. Appl Microbiol Biotechnol. 2013;97(16):7111–20. https://doi.org/10.1007/s00253-013-5087-3.


  11. Stöckel S, Kirchhoff J, Neugebauer U, Rösch P, Popp J. The application of Raman spectroscopy for the detection and identification of microorganisms. J Raman Spectrosc. 2016;47(1):89–109. https://doi.org/10.1002/jrs.4844.


  12. Pan Z, Raftery D. Comparing and combining NMR spectroscopy and mass spectrometry in metabolomics. Anal Bioanal Chem. 2007;387(2):525–7. https://doi.org/10.1007/s00216-006-0687-8.


  13. Hsieh SY, Tseng CL, Lee YS, Kuo AJ, Sun CF, Lin YH, et al. Highly efficient classification and identification of human pathogenic bacteria by MALDI-TOF MS. Mol Cell Proteom. 2008;7(2):448–56. https://doi.org/10.1074/mcp.M700339-MCP200.


  14. Jang KS, Kim YH. Rapid and robust MALDI-TOF MS techniques for microbial identification: a brief overview of their diverse applications. J Microbiol. 2018;56(4):209–16. https://doi.org/10.1007/s12275-018-7457-0.


  15. Deng L. Deep learning: methods and applications. Found Trends Signal Process. 2014;7(3):197–387. https://doi.org/10.1561/2000000039.


  16. Ahmad F, Farooq A, Ghani Khan MU, Shabbir MZ, Rabbani M, Hussain I. Identification of most relevant features for classification of Francisella Tularensis using machine learning. Curr Bioinform. 2021;15(10):1197–212. https://doi.org/10.2174/1574893615666200219113900.


  17. Ahmad F, Farooq A, Khan MUG. Deep learning model for pathogen classification using feature fusion and data augmentation. Curr Bioinform. 2021;16(3):466–83. https://doi.org/10.2174/1574893615999200707143535.


  18. Alsabban WH, Ahmad F, Al-Laith A, Kabrah SM, Boghdadi MA, Masud F. Deep dense model for classification of Covid-19 in X-ray images. Int J Comput Sci Netw Secur. 2022;22(1):429–42.


  19. Ahmad F, Khan MUG, Javed K. Deep learning model for distinguishing novel coronavirus from other chest related infections in X-ray images. Comput Biol Med. 2021;134: 104401.


  20. Ahmad F, Ghani Khan MU, Tahir A, Tipu MY, Rabbani M, Shabbir MZ. Two phase feature-ranking for new soil dataset for Coxiella burnetii persistence and classification using machine learning models. Sci Reports. 2023;13(1):29.


  21. Ahmad F, Khan MUG, Tahir A, Masud F. Deep ensemble approach for pathogen classification in large-scale images using patch-based training and hyper-parameter optimization. BMC Bioinform. 2023;24(1):273.


  22. Ahmad F, Javed K, Tahir A, Khan MUG, Abbas M, Rabbani M, et al. Identifying key soil characteristics for Francisella tularensis classification with optimized Machine learning models. Sci Reports. 2024;14(1):1743.


  23. Fiannaca A, La Paglia L, La Rosa M, Lo Bosco G, Renda G, Rizzo R, et al. Deep learning models for bacteria taxonomic classification of metagenomic data. BMC Bioinform. 2018;19:198. https://doi.org/10.1186/s12859-018-2182-6.


  24. Ho CS, Jean N, Hogan CA, Blackmon L, Jeffrey SS, Holodniy M, et al. Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning. Nat Commun. 2019;10(1):4927. https://doi.org/10.1038/s41467-019-12898-9.


  25. Christmann J, Weber M, Rohn S, Weller P. Nontargeted volatile metabolite screening and microbial contamination detection in fermentation processes by headspace GC-IMS. Anal Chem. 2024;96(9):3794–801. https://doi.org/10.1021/acs.analchem.3c04857.


  26. Ahmed T, Wahid MF, Hasan MJ. Combining deep convolutional neural network with support vector machine to classify microscopic bacteria images. In: 2019 International conference on electrical, computer and communication engineering (ECCE). Cox’sBazar, Bangladesh: IEEE; 2019. p. 1–5. Available from: https://ieeexplore.ieee.org/document/8679397/.

  27. Zhu H, Luo J, Liao J, He S. High-accuracy rapid identification and classification of mixed bacteria using hyperspectral transmission microscopic imaging and machine learning. Progr Electromagn Res. 2023;178:49–62. https://doi.org/10.2528/PIER23082303.


  28. Arora M, Zambrzycki SC, Levy JM, Esper A, Frediani JK, Quave CL, et al. Machine learning approaches to identify discriminative signatures of volatile organic compounds (VOCs) from bacteria and fungi using SPME-DART-MS. Metabolites. 2022;12(3):232. https://doi.org/10.3390/metabo12030232.


  29. Beccaria M, Franchina FA, Nasir M, Mellors T, Hill JE, Purcaro G. Investigating bacterial volatilome for the classification and identification of mycobacterial species by HS-SPME-GC-MS and machine learning. Molecules. 2021;26(15):4600. https://doi.org/10.3390/molecules26154600.


  30. Lu Y, Zeng L, Li M, Yan B, Gao D, Zhou B, et al. Use of GC-IMS for detection of volatile organic compounds to identify mixed bacterial culture medium. AMB Express. 2022;12(1):31. https://doi.org/10.1186/s13568-022-01367-0.


  31. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC. Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018. p. 4510–4520.

  32. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 2818–2826.

33. World Health Organization. The evolving threat of antimicrobial resistance: options for action. Geneva: World Health Organization. Available from: https://iris.who.int/handle/10665/44812.

  34. Maugeri G, Lychko I, Sobral R, Roque ACA. Identification and antibiotic-susceptibility profiling of infectious bacterial agents: a review of current and future trends. Biotechnol J. 2019;14(1):1700750. https://doi.org/10.1002/biot.201700750.


  35. Wang S, Chen H, Sun B. Recent progress in food flavor analysis using gas chromatography-ion mobility spectrometry (GC-IMS). Food Chem. 2020;315: 126158. https://doi.org/10.1016/j.foodchem.2019.126158.


  36. Tait E, Perry JD, Stanforth SP, Dean JR. Identification of Volatile Organic Compounds Produced by Bacteria Using HS-SPME-GC-MS. Journal of Chromatographic Science. 2014;52(4):363–73. https://doi.org/10.1093/chromsci/bmt042.


  37. Faridha Begum I, Mohankumar R, Jeevan M, Ramani K. GC-MS analysis of bio-active molecules derived from Paracoccus pantotrophus FMR19 and the antimicrobial activity against bacterial pathogens and MDROs. Indian J Microbiol. 2016;56(4):426–32. https://doi.org/10.1007/s12088-016-0609-1.


  38. Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018;77:354–77. https://doi.org/10.1016/j.patcog.2017.10.013.


  39. Sarker IH. Machine learning: algorithms, real-world applications and research directions. SN Comput Sci. 2021;2(3):160. https://doi.org/10.1007/s42979-021-00592-x.


  40. Lin Y, Lv F, Zhu S, Yang M, Cour T, Yu K, et al. Large-scale image classification: Fast feature extraction and SVM training. In: CVPR 2011. IEEE;. p. 1689–1696. Available from: http://ieeexplore.ieee.org/document/5995477/.

  41. Albawi S, Mohammed TA, Al-Zawi S. Understanding of a convolutional neural network. In: 2017 International Conference on Engineering and Technology (ICET). IEEE;. p. 1–6. Available from: https://ieeexplore.ieee.org/document/8308186/.

  42. Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. Journal of Big Data. 2016;3(1):9. https://doi.org/10.1186/s40537-016-0043-6.


  43. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E. Deep learning for computer vision: a brief review. Comput Intell Neurosci. 2018. https://doi.org/10.1155/2018/7068349.


  44. Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T. The rise of deep learning in drug discovery. Drug Discov Today. 2018;23(6):1241–50. https://doi.org/10.1016/j.drudis.2018.01.039.


  45. Zhou H, Wang K, Tian J. Online transfer learning for differential diagnosis of benign and malignant thyroid nodules with ultrasound images. IEEE Trans Biomed Eng. 2020;67(10):2773–80. https://doi.org/10.1109/TBME.2020.2971065.


  46. Gómez-Valverde JJ, Antón A, Fatti G, Liefers B, Herranz A, Santos A, et al. Automatic glaucoma classification using color fundus images based on convolutional neural networks and transfer learning. Biomed Opt Express. 2019;10(2):892. https://doi.org/10.1364/BOE.10.000892.


  47. Zhuang F, Qi Z, Duan K, Xi D, Zhu Y, Zhu H, et al. A comprehensive survey on transfer learning. Proc IEEE. 2021;109(1):43–76. https://doi.org/10.1109/JPROC.2020.3004555.



Acknowledgements

Not applicable.

Funding

This work was supported by National International Science and Technology Cooperation Project of China [grant number: 2014DFA31560] and China-Canada Medical Intelligent E-nose Center of Chongqing [grant number:cstc2013gjhz10003] from Chongqing Municipal Science and Technology Bureau, and by Chongqing Municipal Health Commission(CN)[grant number: 2024QNXM021].

Author information


Contributions

BY wrote the initial draft of the manuscript. YL reviewed and edited the manuscript for intellectual content. ML and WL were responsible for collecting, organizing, and maintaining the integrity of the data used in the study. LZ and BZ participated in conducting the experiments and data collection. QH supervised the research process and provided guidance. All authors reviewed the manuscript.

Corresponding author

Correspondence to Qinghua He.

Ethics declarations

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Yan, B., Zeng, L., Lu, Y. et al. Rapid bacterial identification through volatile organic compound analysis and deep learning. BMC Bioinformatics 25, 347 (2024). https://doi.org/10.1186/s12859-024-05967-4
