A Weighted Deep Representation Learning Model for Imbalanced Fault Diagnosis in Cyber-Physical Systems
1. Introduction
- A novel deep learning framework is proposed to learn both internal and external features of high-level patterns in an end-to-end way.
- Data-level sampling policies and a weighted loss function are integrated into the deep learning model to optimize imbalanced fault classification.
- The model is evaluated on a real-life dataset, which demonstrates its feasibility and effectiveness.
2. Problem Statement
2.1. Deep Learning and Class-Imbalance Learning
2.1.1. CNN, LSTM and DeepConvLSTM
2.1.2. Class-Imbalance Learning
2.2. Problem Formulation
- Deep learning models are well suited to automatically extracting features from raw multi-channel sensor data, which carry both spatial and temporal features. Subsequences of the original time series represent high-level patterns of the observed object, while the whole time series represents its temporal evolution. The deep learning model proposed in this paper should therefore capture the spatial and temporal features inside a subsequence, as well as the temporal features between subsequences. In our model, a sliding window of fixed length l with a given step is used to segment the original signal into subsequences;
- A data-level method is necessary to balance the faulty and normal subsequences through under-sampling and over-sampling policies. Sampling is the most straightforward approach: it makes the imbalanced samples relatively balanced before a classifier is trained. Because the faulty samples have a high imbalance ratio, an under-sampling preprocessing step can be used in this paper to decrease the imbalance ratio before training the classifier;
- An algorithm-level method is necessary to optimize the baseline classifier so that it better adapts to the distributions of the imbalanced faulty classes. Sampling methods sometimes distort the feature distribution of both the majority and minority classes. Thus, a weighted cost-sensitive method can be used in this paper. Let a weight matrix denote the misclassification cost of classifying an instance belonging to class i into a different class j. Given an input instance x and the weight matrix, the classifier seeks to minimize the expected loss function shown in Equation (7), where i is the class prediction made by the classifier.
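The sliding-window segmentation described in the first item above can be illustrated with a minimal sketch. The function name `segment`, the parameter name `step`, and the toy signal are assumptions made for illustration; the source only specifies a window of fixed length l moved with a step.

```python
from typing import List, Sequence


def segment(signal: Sequence[Sequence[float]], length: int, step: int) -> List[list]:
    """Split a multi-channel time series into fixed-length subsequences.

    signal: one row per time step, one column per sensor channel.
    length: window length (l in the text).
    step:   stride between consecutive window starts (name assumed here).
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    windows = []
    # Only keep windows that fit entirely inside the signal.
    for start in range(0, len(signal) - length + 1, step):
        windows.append(list(signal[start:start + length]))
    return windows


# Toy signal (hypothetical): 10 time steps, 2 channels.
toy = [[t, 10 * t] for t in range(10)]
subsequences = segment(toy, length=4, step=2)
print(len(subsequences))    # number of subsequences
print(subsequences[1][0])   # first time step of the second subsequence
```

With a step smaller than the window length, consecutive subsequences overlap, so each fault interval contributes several training samples.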
3. System Model
3.1. Pipeline Overview
3.2. Weighted Long-Term Recurrent Convolutional LSTM Network
4. Experiments and Results
4.1. Data Preparation and Experiment Settings
- time series of sensor measurements and control reference signals for each of a number of control components of the plant (e.g., six components);
- time series data representing additional measurements of a fixed number of plant zones over the same period of time (e.g., three zones), where a zone may cover one or more plant components;
- plant fault events, each characterized by a start time, an end time, and a failure code.
- XGBoost (abbreviated as XGB): It uses the entire dataset to train an ensemble classifier. The number of iterations is 5000.
- EasyEnsemble+SMOTE+XGBoost (abbreviated as Easy-SMT): The data are divided into a number of subsets. For each subset, SMOTE generates a set of synthetic minority-class examples, and XGBoost is then used to train a classifier on the resulting data. The number of iterations is 5000.
- CNN-D: four CNN layers with random under-sampling.
- DeepConvLSTM-D: four CNN layers stacked with two LSTM layers, with a random under-sampling policy.
- LRCL-O: two CNN layers and two inner LSTM layers stacked with two outer LSTM layers, with a SMOTE-based over-sampling policy.
- LRCL-W (abbreviated as wLRCL): two CNN layers and two inner LSTM layers stacked with two outer LSTM layers, with a weight-based cost-sensitive policy.
- LRCL-D: two CNN layers and two inner LSTM layers stacked with two outer LSTM layers, with a random under-sampling policy.
- LRCL-D-W (abbreviated as wLRCL-D): two CNN layers and two inner LSTM layers stacked with two outer LSTM layers, with both random under-sampling and weight-based cost-sensitive policies.
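The weight-based cost-sensitive policy used by the -W variants can be sketched as a class-weighted cross-entropy. The source does not give the weight values, so the inverse-frequency scheme below is an assumption, as are the function names; the class ratios are those reported in the event-type table of this paper.

```python
import math
from typing import Dict, List

# Class ratios as reported in the event-type table of the source.
RATIOS: Dict[str, float] = {
    "PF1": 0.0483, "PF2": 0.0375, "PF3": 0.0339, "PF4": 0.0006,
    "PF5": 0.0065, "PF6": 0.1926, "PN": 0.6806,
}


def inverse_frequency_weights(ratios: Dict[str, float]) -> Dict[str, float]:
    """Weight each class by 1/ratio, rescaled so the weights average to 1.

    This weighting scheme is an assumption, not the authors' stated choice.
    """
    raw = {c: 1.0 / r for c, r in ratios.items()}
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}


def weighted_cross_entropy(probs: List[Dict[str, float]],
                           labels: List[str],
                           weights: Dict[str, float]) -> float:
    """Mean of -w[y] * log p(y | x) over a batch of predictions."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(max(p[y], 1e-12))
    return total / len(labels)


weights = inverse_frequency_weights(RATIOS)
# Same predicted probability for the true class, different classes.
loss_rare = weighted_cross_entropy([{"PF4": 0.5}], ["PF4"], weights)
loss_common = weighted_cross_entropy([{"PN": 0.5}], ["PN"], weights)
print(loss_rare > loss_common)
```

An equally confident mistake on a rare fault class costs more than one on the normal class, which pushes the classifier away from always predicting the majority class.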
4.2. Results and Evaluations
5. Conclusions
Author Contributions
Conflicts of Interest
Event Type | PF1 | PF2 | PF3 | PF4 | PF5 | PF6 | PN |
Ratio | 4.83% | 3.75% | 3.39% | 0.06% | 0.65% | 19.26% | 68.06% |
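To make the degree of imbalance in the table above concrete, the following sketch computes an imbalance ratio for each fault class relative to the normal class PN. Defining the imbalance ratio as the majority share divided by the minority share is a common convention assumed here; the source does not state its exact definition.

```python
# Class shares (in percent) taken from the event-type table above.
SHARES = {
    "PF1": 4.83, "PF2": 3.75, "PF3": 3.39, "PF4": 0.06,
    "PF5": 0.65, "PF6": 19.26, "PN": 68.06,
}


def imbalance_ratios(shares: dict, majority: str = "PN") -> dict:
    """Return majority_share / class_share for every non-majority class."""
    return {c: shares[majority] / s for c, s in shares.items() if c != majority}


ir = imbalance_ratios(SHARES)
for name, value in sorted(ir.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.1f}")
```

Under this definition PF4 is by far the most imbalanced class, with more than a thousand normal samples for each PF4 sample.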
Suffix | Description | Parameters |
-W | Cost-sensitive Weight Method | None |
-O | SMOTE-based Method | All faulty classes are over-sampled to 5000 |
-D | Random Under-sampling Method | Normal class is under-sampled to 10,000 |
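The -D policy in the table above, random under-sampling of the normal class to a fixed size, can be sketched as follows. The function name, the seed handling, and the toy data are assumptions for illustration; only the idea of reducing the normal class to a target count comes from the source.

```python
import random
from typing import List, Tuple


def undersample_class(samples: List, labels: List[str], target_label: str,
                      keep: int, seed: int = 0) -> Tuple[List, List[str]]:
    """Randomly keep at most `keep` samples of `target_label`; keep all others."""
    rng = random.Random(seed)
    target_idx = [i for i, y in enumerate(labels) if y == target_label]
    if len(target_idx) <= keep:
        return list(samples), list(labels)
    kept = set(rng.sample(target_idx, keep))
    pairs = [(s, y) for i, (s, y) in enumerate(zip(samples, labels))
             if y != target_label or i in kept]
    return [s for s, _ in pairs], [y for _, y in pairs]


# Toy data (hypothetical): 50 normal windows and 5 faulty windows.
toy_labels = ["PN"] * 50 + ["PF1"] * 5
toy_samples = list(range(len(toy_labels)))
new_samples, new_labels = undersample_class(toy_samples, toy_labels, "PN", keep=10)
print(new_labels.count("PN"), new_labels.count("PF1"))
```

In the setting of the table, `keep` would be 10,000 and the faulty classes would be left untouched.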
Method | Precision | Recall | F1 |
XGBoost | 53.57% | 59.81% | 56.02% |
LRCL | 51.92% | 64.31% | 55.36% |
wLRCL | 66.80% | 75.04% | 69.87% |
CNN-D | 80.86% | 80.88% | 80.81% |
Easy-SMT | 84.19% | 84.38% | 84.0% |
LRCL-O | 84.94% | 89.23% | 86.95% |
DeepConvLSTM-D | 88.48% | 88.37% | 88.40% |
LRCL-D | 95.51% | 97.30% | 97.29% |
wLRCL-D | 98.42% | 98.46% | 98.46% |
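The precision, recall, and F1 columns above can be related to per-class counts with a short sketch. The source does not state how the per-class scores are averaged, so the macro average used here is an assumption, and the labels are toy data rather than results from the paper.

```python
from typing import Dict, List, Tuple


def per_class_prf(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return (precision, recall, F1) for every class seen in the labels."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores: Dict[str, Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Unweighted mean of each metric over the classes (assumed averaging)."""
    n = len(scores)
    return tuple(sum(s[k] for s in scores.values()) / n for k in range(3))


# Toy labels (hypothetical).
y_true = ["PN", "PN", "PN", "PF1", "PF1", "PF6"]
y_pred = ["PN", "PN", "PF1", "PF1", "PN", "PF6"]
scores = per_class_prf(y_true, y_pred)
macro = macro_average(scores)
print(scores)
print(macro)
```

Under macro averaging, the averaged F1 is the mean of the per-class F1 values and is not necessarily the harmonic mean of the averaged precision and recall.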
Window_length | 24 | 48 | 100 |
F1(wLRCL) | 56.66% | 99.51% | 54.42% |
Recall(wLRCL) | 64.23% | 99.51% | 61.38% |
F1(wLRCL-D) | 98.24% | 98.40% | 98.75% |
Recall(wLRCL-D) | 98.24% | 98.40% | 98.75% |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Wu, Z.; Guo, Y.; Lin, W.; Yu, S.; Ji, Y. A Weighted Deep Representation Learning Model for Imbalanced Fault Diagnosis in Cyber-Physical Systems. Sensors 2018, 18, 1096. https://doi.org/10.3390/s18041096