Fault Detection for Wind Turbine Blade Bolts Based on GSG Combined with CS-LightGBM
Abstract
1. Introduction
2. Materials and Methods
2.1. Wind Turbine Blade Bolts Failure Problem Description
2.2. GSG Oversampling Method
2.2.1. GMM
2.2.2. SMOTE
- Randomly select an initial sample from the minority class feature space as the nearest-neighbor sample center Xcentre;
- According to the principle of K nearest neighbors, take Xcentre as the center and obtain k sample points, denoted as Xi (i = 1, 2, ..., k);
- Set a sampling strategy based on the total number of samples to be expanded;
- According to Formula (9), synthesize new samples Xinew.
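The interpolation in Formula (9) is conventionally written as Xinew = Xcentre + rand(0, 1) × (Xi − Xcentre). The sketch below illustrates these steps under that standard assumption; the function and parameter names are ours, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def smote_synthesize(X_min, n_new, k=5, rng=rng):
    """Synthesize n_new samples from minority set X_min using the
    SMOTE rule x_new = x_c + rand(0, 1) * (x_i - x_c)."""
    X_min = np.asarray(X_min, dtype=float)
    new_samples = []
    for _ in range(n_new):
        # Randomly pick a center sample Xcentre from the minority class.
        c = rng.integers(len(X_min))
        x_c = X_min[c]
        # Find its k nearest neighbors (excluding itself).
        d = np.linalg.norm(X_min - x_c, axis=1)
        neighbors = np.argsort(d)[1:k + 1]
        # Interpolate between Xcentre and one randomly chosen neighbor.
        x_i = X_min[rng.choice(neighbors)]
        new_samples.append(x_c + rng.random() * (x_i - x_c))
    return np.array(new_samples)
```

Because every synthetic point lies on a segment between two existing minority samples, the output never leaves the minority class's bounding box.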
2.2.3. GSG Sampling Method
- Perform data preprocessing. First, remove the missing values and abnormal values from the wind turbine blade bolts dataset, then calculate feature importance with XGBoost and select the important features. Finally, divide the processed dataset into a training set and a validation set. To ensure the validity of the experiment, only the training set samples are oversampled. The training dataset of wind turbine blade bolts is denoted as Set(D).
- Set(D) contains two types of data samples: the normal data samples, which are numerous, denoted as Set(max) = {X′1, X′2, …, X′M}; and the fault data samples, which are few, denoted as Set(min) = {X1, X2, …, XN}.
- Based on the original dataset, calculate the class imbalance rate r (r = N/M) and generate a random floating-point number b in the interval (r, 1) as the sampling strategy. Then, obtain the number of samples to be expanded in Set(min), denoted as a (a = int(M*b − N), where M is the number of samples in Set(max) and N is the number of samples in Set(min)).
- Fit GMMs to Set(min) and select an optimal number of components C according to BIC. Use the GMM to cluster Set(min) into C clusters, denoting each cluster as Gi (i = 1, 2, ..., C).
- For each cluster Gi (i = 1, 2, ..., C), count the number of samples, denoted as Num(Gi), and use it to determine the number of samples to synthesize in that cluster, int(a*Num(Gi)/N). This keeps the sample density distribution of the original Set(min) approximately unchanged.
- Use SMOTE to synthesize new samples from the Gi cluster samples, obtaining a new sample set Set(Xnew_k) (k = 1, 2, ..., a*Num(Gi)/N). Use a GMM with the same initial cluster centers to cluster the Set(Xnew_k) samples and the Gi cluster samples into C clusters again, denoting each cluster as G′i (i = 1, 2, ..., C).
- Create an empty dataset Set(G″i). If a sample Xnew_k is clustered into the G′i cluster, add it to Set(G″i) and count the number of Set(G″i) samples, denoted as Num(G″i); otherwise, discard Xnew_k. This preserves the structural distribution characteristics of the original dataset Set(min) to the greatest extent when synthesizing samples.
- If Num(G″i) < a*Num(Gi)/N, repeat steps (6) and (7); otherwise, repeat steps (5), (6) and (7) until each cluster has been expanded.
- Concatenate the datasets Set(G″i) (i = 1, 2, ..., C) into Set(N′).
- Concatenate Set(max), Set(min) and Set(N′). Then, return a training Set(D′) with more minority class samples than the original training Set(D).
Algorithm 1 GSG Pseudo-code
Input:
Training Set(D) = {Set(min), Set(max)}
# N is the number of fault class samples.
# M is the number of normal class samples.
Output:
A training Set(D′) with more fault class samples than the original training Set(D).
1: r = N/M # Calculate the class imbalance rate of the original dataset.
2: b = random(r, 1)
3: a = int(M*b − N) # Calculate the number of samples to be expanded in Set(min).
4: C = BIC(Set(min)) # Determine an optimal number of components.
5: Gi = GMM(Set(min), C) # Use GMM to cluster Set(min) into C clusters.
6: for i ← 1 to C do
7:   Set(G″i) = {} # Create C empty sample sets.
8:   while Num(G″i) < int(a*Num(Gi)/N) do
9:     Set(xnew_k) = SMOTE(Gi) # Use SMOTE to synthesize new samples based on the Gi cluster samples.
10:    G′i = GMM(Set(min) + Set(xnew_k), C) # Use GMM again to cluster Set(min) and Set(xnew_k) into C clusters.
11:    for k ← 1 to int(a*Num(Gi)/N) do
12:      if xnew_k in G′i then # xnew_k is the k-th sample in Set(xnew).
13:        Add xnew_k to Set(G″i)
14:      else
15:        Remove xnew_k
16:      end if
17:    end for
18:  end while
19:  Set(N′) = concatenate(Set(G″i))
20: end for
21: Set(D′) = concatenate(Set(max), Set(min), Set(N′))
22: return Set(D′)
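The pseudocode above can be sketched in Python. The version below is our simplified reading, not the authors' implementation: it uses scikit-learn's GaussianMixture, selects C by minimum BIC, and reuses the fitted GMM's predict as a stand-in for re-clustering with the same initial centers; the function names and the bounded retry cap are our own choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def smote_in_cluster(Xc, n_new, rng, k=3):
    """SMOTE-style interpolation restricted to one cluster's samples."""
    out = []
    for _ in range(n_new):
        c = rng.integers(len(Xc))
        d = np.linalg.norm(Xc - Xc[c], axis=1)
        nb = np.argsort(d)[1:k + 1]
        out.append(Xc[c] + rng.random() * (Xc[rng.choice(nb)] - Xc[c]))
    return np.array(out)

def gsg_oversample(X_maj, X_min, max_components=4, seed=0):
    """Sketch of GSG: BIC-selected GMM clustering of the minority class,
    per-cluster SMOTE synthesis, and GMM-based filtering of candidates."""
    X_maj, X_min = np.asarray(X_maj, float), np.asarray(X_min, float)
    rng = np.random.default_rng(seed)
    M, N = len(X_maj), len(X_min)
    r = N / M                        # step 1: class imbalance rate
    b = rng.uniform(r, 1.0)          # step 2: sampling strategy
    a = int(M * b - N)               # step 3: samples to expand in Set(min)
    # Step 4: choose the number of components C by minimum BIC.
    bics = [GaussianMixture(c, random_state=seed).fit(X_min).bic(X_min)
            for c in range(1, max_components + 1)]
    C = int(np.argmin(bics)) + 1
    # Step 5: cluster Set(min) into C clusters with a GMM.
    gmm = GaussianMixture(C, random_state=seed).fit(X_min)
    labels = gmm.predict(X_min)
    kept = []
    for i in range(C):
        Gi = X_min[labels == i]
        target = int(a * len(Gi) / N)  # per-cluster quota preserves density
        if target == 0 or len(Gi) < 2:
            continue
        Gpp = []                       # Set(G″i): accepted synthetic samples
        for _ in range(20):            # bounded retries instead of while-true
            cand = smote_in_cluster(Gi, target, rng)
            # Keep only candidates the GMM still assigns to cluster i.
            cand = cand[gmm.predict(cand) == i]
            Gpp.extend(cand)
            if len(Gpp) >= target:
                break
        kept.extend(Gpp[:target])
    X_new = np.array(kept) if kept else np.empty((0, X_min.shape[1]))
    return np.vstack([X_maj, X_min, X_new])
```

The filtering step is what distinguishes GSG from plain per-cluster SMOTE: candidates that drift into another cluster's region are discarded, so the expanded minority set keeps the original cluster structure.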
3. Wind Turbine Blade Bolt Fault Detection Model
3.1. Data Preprocessing
3.2. GSG Oversampling
3.3. CS-LightGBM Training and Evaluation Model
3.3.1. CS-LightGBM
3.3.2. Bayesian Optimization Algorithm
3.3.3. Model Evaluation Index
4. Results
4.1. Sampling Effect Verification Experiment
4.2. Wind Turbine Blade Bolt Fault Detection Experiment
4.2.1. Wind Turbine Blade Bolts Data Description
4.2.2. Experimental Results
5. Conclusions
- A new oversampling method, GSG, was proposed. GSG builds on the basic principles of GMM and SMOTE, retaining the distribution characteristics of the original data to the greatest extent when the sample set is expanded and avoiding the blindness of traditional SMOTE oversampling. In addition, the method effectively alleviates the influence of noise samples during oversampling and reduces the generation of overlapping samples.
- We combined GSG and CS-LightGBM for the fault detection of wind turbine blade bolts. The model addresses class imbalance in wind turbine blade bolt data by expanding the fault class sample set and introducing a cost-sensitive learning method. Specifically, we used the proposed GSG to expand the fault class samples in the wind turbine blade bolt training dataset, obtaining a new training dataset. We then fed this new training dataset to a LightGBM classifier with a cost-sensitive function for training, with the operational status of the wind turbine blade bolt as the output.
- Both the proposed sampling method and the fault diagnosis model were experimentally verified. Analysis of GSG sampling on simulated datasets showed that GSG outperformed traditional SMOTE, verifying its effectiveness. In addition, compared with other models, the proposed fault diagnosis model achieved a lower missing alarm rate and false alarm rate and a higher F1-score, verifying its feasibility and superiority.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Feature | Importance | Feature | Importance | Feature | Importance |
|---|---|---|---|---|---|
| C-bolt-98.1° | 0.254579 | A-bolt-32.7° | 0.060742 | C-bolt-229.2° | 0.013685 |
| C-blade-180° | 0.120046 | A-blade-180° | 0.023917 | A-bolt-294.6° | 0.009154 |
| A-blade-90° | 0.113605 | A-bolt-196.5° | 0.020874 | B-bolt-294.6° | 0.004120 |
| ... | ... | ... | ... | ... | ... |
| B-bolt-32.7° | 0.084388 | A-bolt-0° | 0.017649 | C-blade-temp | 0.0038916 |
| B-bolt-98.1° | 0.060742 | C-bolt-98.1° | 0.014236 | C-blade-90° | 0.000112 |
| Parameter | Range | Description |
|---|---|---|
| num_leaves | {1, 2, ···, n} | Number of leaves |
| max_depth | [3, 10] | Maximum tree depth |
| min_data_in_leaf | {1, 2, ···, n} | Minimum number of data in a leaf |
| learning_rate | (0, 1] | Learning rate |
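A search over this space can be sketched with plain random sampling as a simple stand-in for the Bayesian optimizer of Section 3.3.2. The upper bounds for num_leaves and min_data_in_leaf below are illustrative assumptions (the table leaves them open as n), and `evaluate` is a hypothetical scoring function supplied by the caller:

```python
import random

random.seed(7)

# Search space mirroring the table: discrete counts, an integer depth
# range [3, 10], and a continuous learning rate on (0, 1].
SPACE = {
    "num_leaves":       lambda: random.randint(2, 256),    # upper bound assumed
    "max_depth":        lambda: random.randint(3, 10),
    "min_data_in_leaf": lambda: random.randint(1, 100),    # upper bound assumed
    "learning_rate":    lambda: random.uniform(1e-3, 1.0),
}

def sample_params():
    return {name: draw() for name, draw in SPACE.items()}

def random_search(evaluate, n_trials=30):
    """Stand-in for Bayesian optimization: draw n_trials configurations
    and keep the one with the best score returned by `evaluate`."""
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params()
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A Bayesian optimizer differs only in how the next configuration is proposed: it fits a surrogate model to past (params, score) pairs instead of sampling uniformly.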
| Actual Samples | Predicted to Be a Failure Class Sample | Predicted to Be a Normal Class Sample |
|---|---|---|
| Failure class samples | TP | FN |
| Normal class samples | FP | TN |
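From this matrix, the evaluation indices reported later (FAR, MAR, F1-score) follow directly. The formulas below are the standard definitions with the failure class as the positive class; they are our reading, not quoted from the paper:

```python
def evaluation_indices(tp, fn, fp, tn):
    """Fault-detection indices derived from the confusion matrix.
    FAR: fraction of normal samples wrongly flagged as faults.
    MAR: fraction of fault samples that were missed."""
    far = fp / (fp + tn)                  # false alarm rate
    mar = fn / (tp + fn)                  # missing alarm rate
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # equals 1 - MAR
    f1 = 2 * precision * recall / (precision + recall)
    return far, mar, f1
```

For example, with 90 detected faults, 10 missed faults, 5 false alarms and 95 true normals, FAR = 5/100 = 0.05 and MAR = 10/100 = 0.10.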
| Data Type | Dataset Name | Center Value | Variance Value | Total Number of Samples |
|---|---|---|---|---|
| Majority class dataset | 1 | [17, −4] | [2, 5] | 226 |
| Minority class dataset | 2 | [14, −10] | [3, 6] | 20 |
| Minority class dataset | 3 | [13, −14] | [5, 0.4] | 20 |
| Feature Parameters | 03:27 | 03:28 | 03:29 | 03:30 | 03:31 | 03:32 | … | 07:24 | 07:25 |
|---|---|---|---|---|---|---|---|---|---|
| A-bolt-0° | 5.270 | 7.688 | 9.362 | 10.416 | 10.851 | 10.633 | … | 7.527 | 6.854 |
| A-bolt-65.4° | 8.835 | 9.207 | 9.393 | 9.362 | 9.176 | 8.990 | … | 7.967 | 7.471 |
| A-blade-65.4° | −776 | −772 | −768 | −752 | −732 | −720 | … | −736 | −752 |
| … | … | … | … | … | … | … | … | … | … |
| C-blade-Temp | 19.39 | 19.33 | 19.36 | 19.35 | 19.36 | 19.36 | … | 19.39 | 19.42 |
| Dataset Name | Total Number of Samples | Number of Normal Class Samples | Number of Fault Class Samples | Class Ratio |
|---|---|---|---|---|
| Dataset 1 | 4639 | 4282 | 357 | 12.994:1 |
| Dataset 2 | 4203 | 3857 | 346 | 12.147:1 |
| Dataset Name | Model | FAR (%) | MAR (%) | F1-Score |
|---|---|---|---|---|
| Data 1 | CS-LightGBM | 6.04 ± 0.024 | 0.467 ± 0.0029 | 0.959 ± 0.0025 |
| | SMOTE-LightGBM | 7.66 ± 0.028 | 0.682 ± 0.0041 | 0.945 ± 0.0040 |
| | K-means-SMOTE-LightGBM | 7.08 ± 0.022 | 0.605 ± 0.0038 | 0.950 ± 0.0029 |
| | Borderline-SMOTE-LightGBM | 6.84 ± 0.031 | 0.525 ± 0.0036 | 0.952 ± 0.0032 |
| | SMOTE-CS-LightGBM | 6.14 ± 0.019 | 0.465 ± 0.0022 | 0.956 ± 0.0027 |
| | GSG-LightGBM | 6.53 ± 0.023 | 0.528 ± 0.0046 | 0.953 ± 0.0036 |
| | GSG-CS-LightGBM | 5.27 ± 0.016 | 0.374 ± 0.0023 | 0.978 ± 0.0012 |
| Data 2 | CS-LightGBM | 6.55 ± 0.030 | 0.471 ± 0.0024 | 0.957 ± 0.0028 |
| | SMOTE-LightGBM | 8.12 ± 0.026 | 1.002 ± 0.0025 | 0.938 ± 0.0024 |
| | K-means-SMOTE-LightGBM | 7.34 ± 0.029 | 0.773 ± 0.0029 | 0.946 ± 0.0027 |
| | Borderline-SMOTE-LightGBM | 6.96 ± 0.031 | 0.552 ± 0.0034 | 0.952 ± 0.0032 |
| | SMOTE-CS-LightGBM | 6.39 ± 0.024 | 0.507 ± 0.0021 | 0.958 ± 0.0019 |
| | GSG-LightGBM | 6.92 ± 0.028 | 0.499 ± 0.0027 | 0.954 ± 0.0026 |
| | GSG-CS-LightGBM | 6.32 ± 0.021 | 0.426 ± 0.0018 | 0.963 ± 0.0021 |
Share and Cite
Tang, M.; Meng, C.; Wu, H.; Zhu, H.; Yi, J.; Tang, J.; Wang, Y. Fault Detection for Wind Turbine Blade Bolts Based on GSG Combined with CS-LightGBM. Sensors 2022, 22, 6763. https://doi.org/10.3390/s22186763