Abstract

Direct inverse analysis of faults in machinery systems such as gears using first principles is intrinsically difficult, owing to the multiple time- and length-scales involved in vibration modeling. As such, data-driven approaches have become the mainstream, and among them supervised training is deemed effective. Nevertheless, existing techniques often fall short in their ability to generalize from discrete data labels to the continuous spectrum of possible faults, a difficulty further compounded by various uncertainties. This research proposes an interpretability-enhanced deep learning framework that incorporates Bayesian principles, effectively transforming convolutional neural networks (CNNs) into dynamic predictive models and significantly amplifying their generalizability while offering more accessible insights into the model's reasoning processes. Our approach is distinguished by a novel implementation of Bayesian inference, enabling the navigation of the probabilistic nuances of gear fault severities. By integrating variational inference into the deep learning architecture, we present a methodology that excels in leveraging limited data labels to reveal insights into both observed and unobserved fault conditions. This approach improves the model's capacity for uncertainty estimation and probabilistic generalization. Experimental validation on a lab-scale gear setup demonstrated the framework's superior performance, achieving nearly 100% accuracy in classifying known fault conditions, even in the presence of significant noise, and maintaining 96.15% accuracy when dealing with unseen fault severities. These results underscore the method's capability to discover implicit relations between known and unseen faults, facilitate extended fault diagnosis, and effectively manage large measurement uncertainties.

1 Introduction

Rotating machinery systems, integral to various sectors including energy, manufacturing, and propulsion, require rigorous condition monitoring and fault diagnosis to ensure their reliability and safety [1]. Among the diverse signals available for fault detection, vibration is particularly valued for its rich informational content and low instrumentation costs. A quintessential example of such machinery is the gear transmission system, where the complexity of gear vibrations, spanning multiple time and length scales, makes direct inverse analysis of faults from response anomalies impractical. Consequently, the field has predominantly adopted data-driven approaches, comparing measurements from healthy baseline systems against those from systems in operation to identify deviations indicative of faults. Traditional methodologies often incorporate signal processing techniques, such as wavelet transforms, to analyze vibration responses and extract features reflecting gear health status [2–5]. These techniques, particularly effective in elucidating variations in underlying features following fault occurrences, have been extensively documented in the literature [6–8]. The integration of signal processing with machine learning has further augmented their potential, enhancing feature classification capabilities as computational power has grown [9–12]. However, this early work relied on manual feature extraction, which proved ineffective for several reasons. The process is inherently subjective and heavily dependent on the expertise and judgment of the analyst, which can result in inconsistencies and the oversight of critical features. Additionally, manual feature extraction often fails to capture the complex and subtle nuances of gear faults, especially when these faults develop gradually or occur under varying operational conditions. Moreover, such manual intervention is neither scalable nor efficient for continuous monitoring in large-scale industrial applications.

In recent years, machine learning, and particularly deep learning, has established itself as a powerful tool for fault diagnosis. Deep neural networks, especially convolutional neural networks (CNNs) [13], leverage powerful computing facilities to extract features directly from raw data. These networks can process vast amounts of data, extracting significant features while maintaining a relatively small number of trainable parameters, which alleviates the computational burden during model training [14–16]. The ability to automatically extract and process these features makes deep learning particularly effective and popular for complex diagnostic tasks. Within deep learning, supervised learning has shown considerable effectiveness. It relies on labeled data to train models, allowing for precise model tuning and validation. Examples include the work of Kim and Choi, who utilized a CNN for fault diagnosis based on signal segmentation [17], and of Chen et al., who developed an adaptive neural network tailored to variable rotating speeds [18]. Shi et al. further extended these capabilities by introducing a Bidirectional Convolutional Long Short-Term Memory (BiConvLSTM) method that integrates vibration and rotational speed signals for enhanced fault classification [19]. Despite its strengths, supervised learning requires extensive labeled datasets, which are difficult and costly to acquire for gearboxes, where the fault conditions form a continuous spectrum. This limitation can restrict the model's ability to generalize across diverse and unseen fault scenarios. On the other hand, unsupervised learning, which operates without labeled data, is often explored for situations characterized by small datasets. This approach aims to discover patterns within the data autonomously, as demonstrated by Li et al. with an autoencoder for gear state classification [20] and by the use of a stacked sparse autoencoder [21]. Zhou and Tang extended this approach to the semisupervised setting, using a deep convolutional generative adversarial network to cope with limited fault labels [22]. While these methods make the most of small data volumes, they face significant challenges in ensuring model accuracy and maintaining reliability across varying operational conditions. Moreover, the challenge in our gearbox fault diagnosis scenario is not merely small data size but specifically the scarcity of labeled data, which is crucial for training supervised learning models.

In the case of gearboxes, the diverse and often subtle nature of faults requires extensive labeled datasets to ensure accurate model training and validation. However, obtaining such comprehensive labeling is impractical and costly, particularly given the continuous spectrum of potential fault conditions in gearbox systems. This limitation significantly impacts the performance of data-driven diagnostic approaches, especially deep learning techniques, which rely heavily on the quality and breadth of the training data. The complex dynamics of gearbox operation add further difficulty: nonlinear vibrations and various operational anomalies often cause unfamiliar fault types to be misclassified as types within the limited scope of the training set. Such errors are exacerbated by inherent uncertainties in machinery operation, including variability in assembly and environmental noise, which can obscure diagnostic insights. Despite various advances in learning from limited data, such as transfer learning and semisupervised learning techniques [23–25], limited progress has been made in mitigating the issues of scarce labels and persistent noise and uncertainty. Fuzzy neural networks (FNNs) [26,27] have been explored to extend classification to unknown fault scenarios using fuzzy logic, but these too fall short in integrating the uncertainties of real-world operational conditions into the classification process. The core of these challenges lies in the deterministic nature of most existing deep learning models, which are ill-equipped to handle the spectrum of uncertainty inherent in complex engineering systems such as gearboxes. To truly advance the reliability and applicability of diagnostics in such contexts, a shift toward uncertainty-aware methodologies is necessary.

Therefore, it is imperative to integrate probabilistic analysis and account for prediction uncertainties within the diagnostic process. Given the inherent uncertainties and measurement noise in the training data, gear fault diagnosis must be approached as a probabilistic problem, akin to diagnosis in medical practice. This probabilistic perspective adds a crucial dimension of analysis that enables unknown or unseen fault scenarios to be handled effectively. Because gear vibration is inherently complex, relying solely on deterministic models for fault classification is infeasible. Instead, probabilistic methods allow the similarity levels and implicit relationships among fault conditions to be assessed in a probabilistic sense. An unseen fault can then be classified by comparing it probabilistically with known scenarios, which enables confidence-based decision-making. This is particularly useful when direct deterministic extrapolation cannot accurately classify an unseen fault owing to the variability and complexity of the signals involved. Bayesian approaches, with their capacity to provide rich probabilistic interpretations of outcomes, are particularly suited to this task. These methods model uncertainty directly through posterior distributions, offering a robust framework for incorporating both the known and the unknown variables affecting fault diagnosis [26]. Examples of Bayesian-based methods in machine learning include Gaussian processes, naïve Bayes classifiers, and Gaussian mixture models, all of which emphasize probabilistic prediction [28–31]. However, while these models are powerful, they often face scalability limitations [32–34] in high-dimensional spaces or require extensive datasets, which can be a significant constraint in practical applications. For instance, the computational complexity of Gaussian processes scales cubically with the number of data points [30], posing a challenge for their use in large-scale industrial environments. By adopting uncertainty-aware methodologies, we can significantly enhance the accuracy and reliability of fault diagnosis systems. Such approaches not only provide a deeper understanding of the data and their inherent uncertainties but also improve the generalization capability of diagnostic models, making them robust against the diverse and unpredictable nature of gearbox faults.

Building on these considerations of uncertainties and unseen fault scenarios in gearbox fault diagnosis, this research proposes a method to bridge the significant gap between theoretical models and their practical application. Our central hypothesis is that adopting probabilistic analysis through advanced Bayesian methodologies can effectively meet these challenges. Specifically, our approach employs a deep learning framework that leverages modern Bayesian analytics, a strategy that has gained traction across various research domains for its robust feature learning capabilities and its proficiency in probabilistic prediction [35–39]. The framework incorporates state-of-the-art Bayesian inference techniques in the training process, enabling estimation of the network's unknown parameters and their posterior distributions. Such methods enhance our understanding of data correlations under uncertainty, are particularly effective at minimizing overfitting with limited datasets, and reduce the need for extensive hyperparameter tuning. However, customizing and implementing this advanced Bayesian framework for gear fault diagnosis, especially under limited labels and significant uncertainties, presents unique challenges. For instance, the computational complexity of this framework exceeds that of traditional CNNs, necessitating more sophisticated computational strategies to extract comprehensive information about the network's parameters. Furthermore, the success of the posterior distribution calculations within this framework critically depends on the selection of appropriate priors, which are inherently based on specific hypotheses. Addressing these challenges to extend the capabilities of fault diagnosis in the face of limited data and significant uncertainties forms the core motivation of our research.
By harnessing advanced Bayesian analytics, our approach not only confronts uncertainties but also transforms them into actionable insights. With these techniques, we aim to calculate the posterior distributions of the unknown parameters, providing a probabilistic understanding of gear faults that goes well beyond binary or deterministic outputs. Unlike typical Bayesian convolutional neural network (BCNN) applications, our approach tailors the BCNN framework to the intricate task of gear fault diagnosis, focusing on the continuous spectrum of fault severities. By integrating BCNN with customized methods for classification and uncertainty quantification, our model goes beyond standard fault classification to also quantify prediction confidence. This capability is crucial for reliable diagnostics, particularly when dealing with unseen fault severities. Additionally, our approach introduces techniques for analyzing and interpreting probabilistic outputs, offering deeper insights into the relationships between different fault types. This method recognizes the continuous nature of fault progression rather than confining faults to discrete categories, thereby promising to elevate the reliability and applicability of diagnostics in complex engineering systems.

The remainder of this paper is organized as follows. In Sec. 2, the concept and formulation of the Bayesian-enriched deep learning framework are presented, and the probabilistic optimization strategy used for model training is detailed mathematically. Section 3 outlines the data acquisition and case setup of the laboratory gear testbed employed in this research, as well as the details of the BCNN model. In Sec. 4, comprehensive case studies showcase the unique aspects of the proposed Bayesian-enriched deep learning framework in dealing with uncertainties and unseen faults. Concluding remarks are summarized in Sec. 5.

2 Bayesian-Enriched Deep Learning Framework for Fault Diagnosis

In this section, we present the formulation of the enhanced fault diagnosis framework based on Bayesian-enriched deep learning. Our particular focus is on the Bayesian approach to neural network training and on the variational inference approach used to reduce its computational cost.

2.1 Bayesian Inference-Based Convolutional Neural Network Training Architecture.

Our framework enhances traditional convolutional neural networks by integrating Bayesian principles, significantly improving interpretability and robustness in gear fault diagnosis. Unlike conventional models that rely on fixed weights and biases, our architecture employs probabilistic weights and biases, enabling the network to naturally account for the uncertainties inherent in the diagnostic process. The primary innovation in our architecture lies in its ability to leverage Bayesian inference to transform convolutional layers into probabilistic entities. This approach captures the inherent uncertainties in feature extraction, leading to a more robust and comprehensive representation of the input data. By treating each weight and bias as a random variable drawn from a probability distribution, our model can better handle the variability and complexity of gear fault scenarios.
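The idea of treating each weight and bias as a random variable can be illustrated with a single fully connected layer. The following is a minimal sketch in plain Python, not the TensorFlow implementation used in this research: it assumes independent Gaussian parameters, each stored as a mean and an unconstrained scale `rho` that softplus maps to a positive standard deviation. The class name `GaussianDense`, the softplus parameterization, and the initial values are illustrative assumptions.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + exp(z)); maps an unconstrained rho to a positive std.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class GaussianDense:
    """Fully connected layer with independent Gaussian weights and biases (illustrative)."""

    def __init__(self, n_in, n_out, init_rho=-5.0, seed=0):
        self.rng = random.Random(seed)
        # Variational parameters: a mean and an unconstrained scale per weight and bias.
        self.w_mu = [[self.rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.w_rho = [[init_rho] * n_in for _ in range(n_out)]
        self.b_mu = [0.0] * n_out
        self.b_rho = [init_rho] * n_out

    def _draw(self, mu, rho):
        # Reparameterized draw: mu + softplus(rho) * eps with eps ~ N(0, 1).
        return mu + softplus(rho) * self.rng.gauss(0.0, 1.0)

    def forward(self, x):
        # Each call samples a fresh set of weights and biases.
        return [
            self._draw(self.b_mu[j], self.b_rho[j])
            + sum(self._draw(mu, rho) * xi
                  for mu, rho, xi in zip(self.w_mu[j], self.w_rho[j], x))
            for j in range(len(self.b_mu))
        ]

    def mean_forward(self, x):
        # Deterministic pass with the means only, as a conventional layer would compute.
        return [
            self.b_mu[j] + sum(mu * xi for mu, xi in zip(self.w_mu[j], x))
            for j in range(len(self.b_mu))
        ]


# Hypothetical usage: the same input gives a different output on every pass.
layer = GaussianDense(n_in=4, n_out=3, init_rho=-1.0)
x = [0.5, -1.0, 2.0, 0.1]
for _ in range(3):
    print([round(v, 3) for v in layer.forward(x)])
```

Because `forward` draws new parameters on each call, repeated passes over one input trace out a distribution of outputs; when every `rho` is very negative, the layer collapses to the deterministic `mean_forward`, i.e., a conventional layer.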

Incorporating Bayesian inference into the convolutional layers allows our model to maintain uncertainty information throughout the network. This probabilistic treatment extends to the fully connected layers, which aggregate the features extracted by the convolutional layers while preserving the uncertainty information. Consequently, the model provides predictions in the form of probability distributions, offering not only the predicted class but also the associated uncertainty quantification. This probabilistic output is crucial for informed decision-making, particularly in scenarios with limited data and significant measurement noise. The training process of our architecture involves optimizing these probabilistic weights and biases using Bayesian inference techniques [40]. Specifically, we employ variational inference to approximate the posterior distribution of the network parameters. The purpose of model training is to find the optimal approximation of the posterior distribution [41,42] of these parameters, which inherently incorporates the uncertainty in the model. This approach enables efficient and scalable training, even in high-dimensional spaces, by converting the complex posterior inference problem into a more manageable optimization task.

Our framework's ability to quantify uncertainty offers a significant advantage over traditional deterministic models. By providing a measure of confidence in its predictions, our model enhances the reliability of fault diagnosis, especially in unseen and uncertain scenarios. This probabilistic approach allows for a nuanced interpretation of fault severities, acknowledging the continuous nature of fault progression rather than constraining it to discrete categories. In summary, our Bayesian-enriched architecture transforms the way convolutional neural networks handle gear fault diagnosis by embedding probabilistic reasoning at every stage. This methodology significantly improves the model's ability to generalize from limited data and enhances its performance in predicting and understanding complex fault scenarios. The specific construction of this architecture, including the detailed layer configurations and the data conversion process, is elaborated in Sec. 3.

2.2 Variational Inference for Probabilistic Parameter Optimization.

Because the weights and biases are probabilistic, the backpropagation optimization also becomes probabilistic, aiming to identify the probability distributions of the unknowns θ = [w, b]. Assuming that the input–output relation of a single sample can be implicitly expressed as y = f(x, w, b), Bayes' rule gives
$$p(w,b \mid X,y) = \frac{p(y \mid w,b,X)\,p(w,b)}{p(y \mid X)} \tag{1}$$

This classical expression, Eq. (1), combines the likelihood p(y|w,b,X), the prior p(w,b), and the marginal likelihood p(y|X) of the data, which are the essential components of Bayesian inference. Since the posterior probability density function (PDF) has no analytical solution, numerical approximations are commonly employed, and these are often computationally intensive. The issue is further exacerbated by the high dimension of θ, owing to the large scale of the BCNN model used in gear diagnosis. While Markov chain Monte Carlo (MCMC) methods, such as the Gibbs sampler, can approximate the true posterior well through expedited sampling, directly sampling a high-dimensional posterior is still costly [43–46]. Even when the posterior PDF can be numerically approximated, it is generally a nonstandard distribution that cannot be well characterized by a parametric representation. With such a nonstandard posterior PDF, the maximum a posteriori (MAP) estimate is usually employed for point (deterministic) estimation, i.e., it yields the single solution corresponding to the largest probability value [47,48]. Fundamentally, this cannot serve the probabilistic parameter optimization required for BCNN training.

To tackle these issues, an alternative approach is to approximate the true posterior PDF with a variational distribution q(w,b|ϕ) of known analytical form, which is referred to as the variational inference approach [49,50]. For example, if q(w,b|ϕ) is a normal distribution, then ϕ represents the mean and variance of the unknowns θ = [w, b]. As θ is multivariate, its mean and variance take the form of a vector and a covariance matrix, respectively. Mathematically, such a variational approximation of the posterior can be obtained by minimizing the Kullback–Leibler (KL) divergence (also known as relative entropy) between q(w,b|ϕ) and p(w,b|X,y) [51]. The KL divergence is defined as
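$$\mathrm{KL}\big(q(w,b\mid\phi)\,\|\,p(w,b\mid X,y)\big) = \mathbb{E}_{q(w,b\mid\phi)}\left[\log\frac{q(w,b\mid\phi)}{p(w,b\mid X,y)}\right] \tag{2}$$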
where $\mathbb{E}_{q(w,b\mid\phi)}[\cdot]$ is the expectation operator with respect to the variational distribution. Substituting Eq. (1) into Eq. (2) yields
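$$\mathrm{KL}\big(q(w,b\mid\phi)\,\|\,p(w,b\mid X,y)\big) = \mathbb{E}_{q(w,b\mid\phi)}\big[\log q(w,b\mid\phi) - \log p(y\mid w,b,X) - \log p(w,b)\big] + \log p(y\mid X) \tag{3}$$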
Note that log p(y|X) is a constant because it does not depend on [w, b]. Therefore, minimizing the KL divergence is mathematically equivalent to maximizing $\mathbb{E}_{q(w,b\mid\phi)}\big[\log p(y\mid w,b,X) + \log p(w,b) - \log q(w,b\mid\phi)\big]$ in Eq. (3). This quantity is the evidence lower bound (ELBO), also known as the negative variational free energy [52]. The relation between the KL divergence and the ELBO can be written as
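$$\log p(y\mid X) = \underbrace{\mathbb{E}_{q(w,b\mid\phi)}\big[\log p(y\mid w,b,X) + \log p(w,b) - \log q(w,b\mid\phi)\big]}_{\mathrm{ELBO}(\phi)} + \mathrm{KL}\big(q(w,b\mid\phi)\,\|\,p(w,b\mid X,y)\big) \tag{4}$$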
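Eq. (4) can be checked numerically on a one-parameter model in which every term has a closed form. The sketch below assumes a standard normal prior θ ~ N(0, 1), one observation y ~ N(θ, 1), and a Gaussian variational distribution q = N(m, s²); this conjugate toy model is an illustrative assumption, chosen only because its exact posterior N(y/2, 1/2) and evidence N(y; 0, 2) are known.

```python
import math


def gauss_logpdf(x, mean, var):
    # Log density of N(mean, var) evaluated at x.
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def elbo(y, m, s2):
    """Closed-form ELBO for prior N(0, 1), likelihood N(y; theta, 1), and q = N(m, s2)."""
    e_loglik = -0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) ** 2 + s2)  # E_q[log p(y|theta)]
    e_logprior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)     # E_q[log p(theta)]
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)                  # -E_q[log q(theta)]
    return e_loglik + e_logprior + entropy


def kl_gauss(m1, v1, m2, v2):
    # KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians.
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


y, m, s2 = 1.3, 0.2, 0.3                         # observation and an arbitrary q
log_evidence = gauss_logpdf(y, 0.0, 2.0)         # log p(y): marginally y ~ N(0, 2)
kl_to_posterior = kl_gauss(m, s2, y / 2, 0.5)    # exact posterior is N(y/2, 1/2)
print(log_evidence, elbo(y, m, s2) + kl_to_posterior)  # the two numbers agree
```

For any choice of m and s², the ELBO plus the KL divergence reproduces log p(y); the ELBO reaches log p(y) only when q equals the exact posterior, which is why maximizing the ELBO drives q toward the posterior.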
At this point, probabilistic training can be formulated as an optimization problem in which the KL divergence serves as the cost function (optimization objective), and the optimal ϕ* is identified through
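$$\phi^{*} = \arg\min_{\phi}\,\mathrm{KL}\big(q(w,b\mid\phi)\,\|\,p(w,b\mid X,y)\big) = \arg\max_{\phi}\,\mathrm{ELBO}(\phi) \tag{5}$$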
As indicated, the ELBO is an expectation, so it can be estimated by Monte Carlo sampling, which converges by the law of large numbers [53]:
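$$\mathrm{ELBO}(\phi) \approx \frac{1}{M}\sum_{k=1}^{M}\Big[\log p\big(y\mid w^{(k)},b^{(k)},X\big) + \log p\big(w^{(k)},b^{(k)}\big) - \log q\big(w^{(k)},b^{(k)}\mid\phi\big)\Big], \qquad \big[w^{(k)},b^{(k)}\big] \sim q(w,b\mid\phi) \tag{6}$$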

where M is the number of Monte Carlo samples, chosen large enough to ensure convergence of the approximation.
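The Monte Carlo estimate in Eq. (6) can be sketched on a toy model where the exact ELBO is also available for comparison. The example assumes a single parameter θ with prior N(0, 1), likelihood y ~ N(θ, 1), and variational distribution q = N(m, s²); the model, the values y = 1.3, m = 0.2, s² = 0.3, and the function names are hypothetical choices for illustration.

```python
import math
import random


def gauss_logpdf(x, mean, var):
    # Log density of N(mean, var) evaluated at x.
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def elbo_monte_carlo(y, m, s2, M, seed=0):
    """Average log p(y|theta) + log p(theta) - log q(theta) over M draws from q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(M):
        theta = rng.gauss(m, math.sqrt(s2))        # theta^(k) ~ q(theta | phi)
        total += (gauss_logpdf(y, theta, 1.0)       # log-likelihood
                  + gauss_logpdf(theta, 0.0, 1.0)   # log-prior
                  - gauss_logpdf(theta, m, s2))     # log variational density
    return total / M


def elbo_exact(y, m, s2):
    # Closed-form value of the same expectation for this Gaussian toy model.
    return (-math.log(2 * math.pi) - 0.5 * ((y - m) ** 2 + m ** 2 + 2 * s2)
            + 0.5 * math.log(2 * math.pi * math.e * s2))


exact = elbo_exact(1.3, 0.2, 0.3)
estimates = {M: elbo_monte_carlo(1.3, 0.2, 0.3, M) for M in (10, 1_000, 100_000)}
for M, est in estimates.items():
    print(f"M={M:>6}: estimate={est:.4f}, exact={exact:.4f}")
```

The estimation error shrinks roughly as 1/√M, which is why M must be large for a converged estimate.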

In this procedure, each term of the sum is evaluated at a sample $[w^{(k)}, b^{(k)}]$ randomly drawn from the variational distribution q(w,b|ϕ). This process can be facilitated by MCMC, which can reduce the computational cost by expediting sampling [43,46]. Note that the training process can also include the standard cross-entropy loss, which represents the maximum-likelihood classification loss. Therefore, a more generic optimization problem based on Eq. (5) can be formulated as
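$$\phi^{*} = \arg\min_{\phi}\Big[\mathrm{KL}\big(q(w,b\mid\phi)\,\|\,p(w,b\mid X,y)\big) + L_e\Big] \tag{7}$$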

where $L_e$ is the cost contributed by the cross-entropy loss. Combining variational inference with the stochastic gradient descent (SGD) algorithm makes probabilistic backpropagation readily available for model training. Because training involves a stochastic sampling step, the reparameterization trick is adopted so that gradients can be backpropagated through the sampled parameters [54]. As is common in neural network training, batch training is used, in which a single batch of training samples enters the optimization of Eq. (7) at each iteration.
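The combination of SGD and the reparameterization trick can be sketched on a one-parameter toy problem with hand-written gradients. Under the assumptions of a prior θ ~ N(0, 1), likelihood y ~ N(θ, 1), and q = N(m, s²) with s = exp(ρ), each draw is written as θ = m + s·ε with ε ~ N(0, 1), so the ELBO gradient with respect to (m, ρ) flows through the sample. This is an illustrative sketch, not the BCNN training loop, which in practice would rely on automatic differentiation; the learning rate, step count, and iterate averaging are hypothetical settings.

```python
import math
import random


def fit_gaussian_vi(y, steps=5000, lr=0.01, samples_per_step=10, seed=0):
    """Reparameterized stochastic gradient ascent on the ELBO of a toy model.

    Assumed model: prior theta ~ N(0, 1), likelihood y ~ N(theta, 1), and
    variational q = N(m, s^2) with s = exp(rho). Returns (m, s^2) computed from
    parameters averaged over the second half of the iterations.
    """
    rng = random.Random(seed)
    m, rho = 0.0, 0.0
    m_sum = rho_sum = 0.0
    n_avg = 0
    for step in range(steps):
        s = math.exp(rho)
        g_m = g_rho = 0.0
        for _ in range(samples_per_step):
            eps = rng.gauss(0.0, 1.0)
            theta = m + s * eps                # reparameterized sample from q
            dlogjoint = (y - theta) - theta    # d/dtheta [log p(y|theta) + log p(theta)]
            g_m += dlogjoint                   # chain rule: dtheta/dm = 1
            g_rho += dlogjoint * eps * s       # chain rule: dtheta/drho = eps * s
        g_m /= samples_per_step
        g_rho = g_rho / samples_per_step + 1.0  # entropy of q is rho + const
        m += lr * g_m                          # gradient ascent on the ELBO
        rho += lr * g_rho
        if step >= steps // 2:
            m_sum += m
            rho_sum += rho
            n_avg += 1
    return m_sum / n_avg, math.exp(2.0 * rho_sum / n_avg)


m_hat, s2_hat = fit_gaussian_vi(y=1.3)
print(f"m = {m_hat:.3f} (exact 0.65), s^2 = {s2_hat:.3f} (exact 0.5)")
```

For y = 1.3 the returned values should lie close to the exact posterior mean 0.65 and variance 0.5. Averaging the iterates over the second half of training damps the noise that stochastic single-batch gradients introduce.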

To seamlessly integrate variational inference into neural network training, Gal and Ghahramani developed Monte Carlo dropout, in which dropout is applied to the layers whose weights and biases are optimized [55]. This design is equivalent to applying variational inference with a specific variational distribution during neural network training. Gal further showed that the variational predictive distribution can be obtained by randomly drawing the Bernoulli variables of dropout, i.e., by keeping dropout active, during inference [56]. This method provides a practical solution for BCNN model training and lays the foundation for the implementation in this research.
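Monte Carlo dropout at inference time can be sketched as follows: dropout stays active, each forward pass draws a fresh Bernoulli mask, and the softmax outputs are summarized over T passes. The single linear layer, its weights, the dropout rate, and T are hypothetical values for illustration; the sketch shows the mechanism only and is not the network used in this research.

```python
import math
import random


def softmax(z):
    z_max = max(z)                      # shift for numerical stability
    e = [math.exp(v - z_max) for v in z]
    total = sum(e)
    return [v / total for v in e]


def dropout_forward(x, W, b, drop_rate, rng):
    """One stochastic pass: inverted dropout on the inputs, then linear layer + softmax."""
    keep = 1.0 - drop_rate
    if keep < 1.0:
        # Bernoulli mask: keep each input with probability `keep`, rescale survivors.
        x = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [bj + sum(w * xi for w, xi in zip(row, x)) for row, bj in zip(W, b)]
    return softmax(logits)


def mc_dropout_predict(x, W, b, drop_rate=0.5, T=200, seed=0):
    """Keep dropout active at inference and summarize T stochastic passes."""
    rng = random.Random(seed)
    runs = [dropout_forward(x, W, b, drop_rate, rng) for _ in range(T)]
    mean = [sum(r[k] for r in runs) / T for k in range(len(b))]
    std = [math.sqrt(sum((r[k] - mean[k]) ** 2 for r in runs) / T) for k in range(len(b))]
    return mean, std


W = [[1.0, -0.5], [0.2, 0.8], [-0.3, 0.1]]  # hypothetical trained weights, 3 classes
b = [0.0, 0.0, 0.0]
x = [1.0, 2.0]
mean, std = mc_dropout_predict(x, W, b)
print([round(v, 3) for v in mean], [round(v, 3) for v in std])
```

With dropout disabled the spread `std` is zero; once dropout is active at inference, the passes disagree and the averaged output becomes a predictive distribution rather than a point estimate.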

2.3 Probabilistic Prediction Upon Harnessing Bayesian Convolutional Neural Network.

Once the BCNN model is established, it can be applied to prediction. The posterior predictive distribution of y* for a set of unobserved time series X*, given the observed input–output relations D = [X, y], can be derived according to Bayes' rule as
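$$p(y^{*}\mid X^{*},X,y) = \int p(y^{*}\mid w,b,X,y,X^{*})\,p(w,b\mid X,y)\,\mathrm{d}w\,\mathrm{d}b \tag{8}$$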

where p(y*|w,b,X,y,X*) is the predictive PDF of y* conditioned on a particular realization of the weights and biases, and p(w,b|X,y) is the posterior PDF of the network weights and biases (Eq. (1)); integrating over the weights and biases yields the marginal, posterior predictive distribution. For each deterministic input $x_i^*$, the corresponding output y* is probabilistic because the weights and biases [w, b] are randomly sampled from the optimized variational distribution q(w,b|ϕ*). While the weights and biases of the network follow standard statistical distributions, the output of the final layer, i.e., the class probabilities, may not have an analytical form because of the nonlinear activation functions applied. Therefore, Monte Carlo analysis is usually employed to identify this probability distribution numerically for a given input $x_i^*$. More generally, the outputs of all BCNN layers other than the input layer are probabilistic and can be extracted in a similar way. This probabilistic nature allows the BCNN to account for the effect of uncertainty in prediction, which is its distinctive strength.
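The Monte Carlo characterization of class probabilities described above can be sketched by sampling logits from Gaussian weights, pushing them through softmax, and summarizing each class empirically. The three-class toy model, its weight means, the common weight standard deviation of 0.5, and the 95% interval are illustrative assumptions rather than settings from this research.

```python
import math
import random


def softmax(z):
    z_max = max(z)                      # shift for numerical stability
    e = [math.exp(v - z_max) for v in z]
    total = sum(e)
    return [v / total for v in e]


def summarize_class_probs(prob_samples, lower=0.025, upper=0.975):
    """Per-class Monte Carlo mean and empirical interval of sampled class probabilities."""
    T = len(prob_samples)
    summary = []
    for k in range(len(prob_samples[0])):
        vals = sorted(p[k] for p in prob_samples)
        summary.append((sum(vals) / T, vals[int(lower * (T - 1))], vals[int(upper * (T - 1))]))
    return summary


# Hypothetical three-class example: Gaussian weights give Gaussian logits for a
# fixed input, but the softmax outputs have no simple closed-form distribution.
rng = random.Random(0)
x = [0.8, -0.4]
w_mean = [[2.0, 0.5], [0.5, 1.0], [-1.0, 0.0]]
w_std = 0.5
samples = []
for _ in range(2000):
    logits = [sum(rng.gauss(mu, w_std) * xi for mu, xi in zip(row, x)) for row in w_mean]
    samples.append(softmax(logits))

summary = summarize_class_probs(samples)
for k, (mean, lo, hi) in enumerate(summary):
    print(f"class {k}: mean = {mean:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

Because the softmax output distribution has no analytical form, the summary relies on sorted samples rather than a formula; the same recipe applies to the output of any probabilistic layer.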

The probabilistic predictions generated by our model enable the detection of unseen fault scenarios by assessing the associated uncertainty. When the model encounters an input that does not closely match any known fault patterns, the increased uncertainty in the prediction can indicate an unseen fault scenario. This is particularly valuable for distinguishing between very similar fault conditions, such as different severities of the same fault type. Traditional deterministic models often struggle with these nuances, especially when the severity of the fault varies along a continuous spectrum.
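One way to turn increased predictive uncertainty into a flag for a possibly unseen condition is to threshold the entropy of the Monte Carlo mean prediction. This section does not specify which uncertainty measure is used, so the normalized predictive entropy below and its threshold of 0.5 are hypothetical choices; in practice the threshold would need calibration on validation data.

```python
import math


def predictive_entropy(prob_samples):
    """Entropy (nats) of the Monte Carlo mean of sampled class-probability vectors."""
    T = len(prob_samples)
    mean = [sum(p[k] for p in prob_samples) / T for k in range(len(prob_samples[0]))]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def flag_unseen(prob_samples, threshold=0.5):
    """Return (flag, normalized entropy in [0, 1]); the threshold is hypothetical."""
    h_norm = predictive_entropy(prob_samples) / math.log(len(prob_samples[0]))
    return h_norm > threshold, h_norm


confident = [[0.97, 0.01, 0.01, 0.01]] * 20                        # samples agree on one class
ambiguous = [[0.6, 0.3, 0.05, 0.05], [0.3, 0.6, 0.05, 0.05]] * 10  # samples split between two
print(flag_unseen(confident))  # not flagged: normalized entropy well below 0.5
print(flag_unseen(ambiguous))  # flagged: normalized entropy above 0.5
```

Normalizing by log K keeps the score comparable across output layers of different sizes, e.g., when the number of trained fault labels changes between cases.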

While traditional CNNs can categorize samples into predefined classes, they lack the capability to quantify uncertainty in their predictions. This limitation can lead to overconfident classifications, even when encountering unseen fault severities. In contrast, our Bayesian-inference enhanced framework can discern subtle differences between fault severities by modeling the probabilistic nuances of each condition. This approach provides a more detailed understanding of the similarities and discrepancies between similar conditions, making the model's reasoning more transparent and interpretable to humans. The ability to capture these nuances enhances the model's reliability in practical applications, where distinguishing between varying fault severities is crucial. The practical advantages of our approach, including its superior handling of unseen and uncertain scenarios, will be demonstrated in the subsequent case analysis. The workflow of the proposed framework is shown in Fig. 1.

Fig. 1
Workflow of Bayesian inference predictive framework

3 Fault Diagnosis Implementation on Lab-Scale Gear Testbed

In this section, we present the lab-scale gear testbed used to collect the experimental data under various fault scenarios. We further report the setup and configuration of the BCNN constructed in this research.

3.1 Experimental Testbed and Data Collected.

Gear vibration responses contain rich and arguably implicit features that reflect the health condition of the underlying system. In this research, we measure vibration directly on a lab-scale gear testbed (Fig. 2) and use the signals for the subsequent gear fault diagnosis. In this testbed, a 32-tooth pinion and an 80-tooth gear are installed on the first-stage input shaft, whereas a 48-tooth pinion and a 64-tooth gear are installed on the second-stage output shaft. A motor controls the gear speed following a trapezoidal profile consisting of speed-increase, constant-speed, and speed-decrease phases. The gear speed is related to the input shaft speed, which is measured by a tachometer. An accelerometer placed in the vicinity of the gears measures the gear vibration signals, which are acquired with a dSPACE system at a 20 kHz sampling frequency. The time synchronous averaging (TSA) approach is employed to reduce measurement uncertainty and to convert the signals from the time-even to the angle-even domain, yielding response signals with reduced noncoherent components [2]. This facilitates the subsequent gear fault diagnosis for methodology validation. The ability to handle variable speeds in our experimental setup has significant implications for real-world applications, such as wind turbines, that operate under varying speed conditions. The probabilistic approach of our framework effectively accounts for the changes in operational dynamics, improving fault diagnosis reliability in such complex and variable environments.
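The time-even to angle-even conversion and averaging can be sketched in two steps: interpolate the samples at the instants the shaft reaches equally spaced angles, then average corresponding points across revolutions. The sketch assumes one tachometer pulse per revolution and constant speed within each revolution; the function names and the synthetic test signal are hypothetical, and this is not necessarily the exact procedure of [2].

```python
import bisect
import math


def angle_even_resample(t, x, pulse_times, points_per_rev):
    """Resample time-even samples x(t) onto a uniform shaft-angle grid.

    Assumes one tachometer pulse per revolution and constant speed within each
    revolution, so the shaft angle varies linearly between consecutive pulses.
    """
    out = []
    for r in range(len(pulse_times) - 1):
        t0, t1 = pulse_times[r], pulse_times[r + 1]
        for j in range(points_per_rev):
            tj = t0 + (t1 - t0) * j / points_per_rev   # time at angle 2*pi*(r + j/N)
            i = min(max(bisect.bisect_right(t, tj) - 1, 0), len(t) - 2)
            w = (tj - t[i]) / (t[i + 1] - t[i])       # linear interpolation weight
            out.append((1 - w) * x[i] + w * x[i + 1])
    return out


def time_synchronous_average(angle_signal, points_per_rev):
    """Average the angle-even signal revolution by revolution."""
    n_rev = len(angle_signal) // points_per_rev
    return [
        sum(angle_signal[r * points_per_rev + j] for r in range(n_rev)) / n_rev
        for j in range(points_per_rev)
    ]


# Hypothetical usage: 20 kHz sampling, constant 0.05 s revolutions, 3 events per revolution.
fs, period, n_rev, n_pts = 20_000.0, 0.05, 4, 900
t = [k / fs for k in range(int(n_rev * period * fs) + 1)]
x = [math.sin(2 * math.pi * 3 * tk / period) for tk in t]
pulses = [r * period for r in range(n_rev + 1)]
angle_signal = angle_even_resample(t, x, pulses, n_pts)   # 4 x 900 = 3600 points
tsa = time_synchronous_average(angle_signal, n_pts)
```

With 900 points per revolution, four revolutions give 3600 angle-even points, matching the per-sample length reported below. Averaging more revolutions further suppresses components that are not synchronous with the shaft.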

Fig. 2
Experimental testbed

In this research, we purposely introduce the fault conditions shown in Fig. 3 into the pinion on the input shaft, giving nine gear conditions in total, including the healthy one. Chipping tips are created by removing a certain amount of material from the pinion. Specifically, pinion material losses of 0.15 mm, 0.24 mm, 0.38 mm, 0.48 mm, and 0.68 mm are introduced, leading to five damaged teeth with increasing severity levels. For each condition, 104 signals are collected, yielding 936 (104 × 9) samples in total. The details of the dataset are given in Table 1, where each condition is assigned a fault type identification (ID) number. Unless otherwise specified, these fault-type IDs are used throughout this research to indicate the associated conditions. The appearance of the pinions under different fault conditions is shown in Fig. 3. In each sample, 3600 angle-even data points are recorded over 4 gear revolutions, resulting in 3600 features per sample. These features are represented by the signals associated with the different fault conditions, as illustrated in Fig. 4. The signals can be stored either as numeric arrays or as images. In this research, we use the image format to establish the deep learning model, since it can potentially be extended to other computer vision-based fault diagnosis. All the image data are shared and can be found in the public link.
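Since the model consumes 256 × 256 × 1 signal images, each 3600-point signal must be rendered as a picture. The plotting settings used in this research are not described here, so the rasterizer below is a hypothetical stand-in: it min-max scales the signal, maps samples to pixel coordinates with no axes or margins, and connects consecutive points so the trace stays continuous.

```python
import math


def signal_to_image(signal, size=256):
    """Rasterize a 1D signal into a size x size binary image (row 0 is the top)."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0                  # guard against a constant signal
    n = len(signal)
    img = [[0] * size for _ in range(size)]
    # Min-max scale each sample to (column, row) pixel coordinates.
    pts = [(round(i * (size - 1) / (n - 1)), round((hi - v) / span * (size - 1)))
           for i, v in enumerate(signal)]
    # Connect consecutive points so steep transitions do not leave gaps.
    for (c0, r0), (c1, r1) in zip(pts, pts[1:]):
        steps = max(abs(c1 - c0), abs(r1 - r0), 1)
        for s in range(steps + 1):
            img[round(r0 + (r1 - r0) * s / steps)][round(c0 + (c1 - c0) * s / steps)] = 1
    return img


# Hypothetical usage: a synthetic 3600-point signal, the per-sample length in this study.
demo = [math.sin(2 * math.pi * 3 * k / 3600) for k in range(3600)]
image = signal_to_image(demo)
```

About 14 samples share each of the 256 columns, so drawing connecting segments rather than isolated points keeps sharp, impulsive excursions visible as solid vertical strokes.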

Fig. 3
Nine fault conditions on the pinion with five different severities of chipping tips
Fig. 4
Vibration signal samples associated with different fault conditions: (a) healthy, (b) missing tooth, (c) crack, (d) spalling, (e) chipping tip 1 (least severe), (f) chipping tip 2, (g) chipping tip 3, (h) chipping tip 4, and (i) chipping tip 5 (most severe)
Table 1

Experimental data summary

Type | Fault condition | Data size
1 | Healthy | 104
2 | Missing tooth | 104
3 | Crack | 104
4 | Spalling | 104
5 | Chipping_tip_5 (least severe) | 104
6 | Chipping_tip_4 | 104
7 | Chipping_tip_3 | 104
8 | Chipping_tip_2 | 104
9 | Chipping_tip_1 (most severe) | 104

3.2 Bayesian Convolutional Neural Network Model Construction.

Following the formulation outlined in Sec. 2, we develop the BCNN model using the Python TensorFlow framework [53]. As mentioned, a BCNN is fundamentally a CNN built upon Bayesian learning, so its architecture is analogous to that of a CNN. While there are general guidelines for designing deep learning models [54], the implementation is usually case specific, aiming to ensure adequate training while avoiding both underfitting and overfitting. In neural network training, monitoring the training and validation accuracy versus epoch is a well-known way to assess whether the model has learned adequately. Following these guidelines and this learning performance assessment, we develop the BCNN architecture shown in Fig. 5. Specifically, the architecture consists of six layers, which are detailed in Table 2. The hyperparameters listed in Table 2, including the input shape and layer configurations, were selected to optimize the performance of the BCNN model. The input shape of 256 × 256 × 1 was chosen to match the resolution of the signal plot images, ensuring that the model captures essential details of the vibration signals. The convolutional layers, with the filter sizes and ReLU activation functions specified in Table 2, were designed to extract features efficiently while keeping the model computationally manageable. These configurations were determined through empirical testing to balance model complexity and diagnostic accuracy. The size of the output layer depends on the case investigated.

Fig. 5
BCNN model architecture
Table 2

BCNN layer configuration

Layer | Output shape | Number of parameters
Input | 256×256×1 | 0
Convolutional (filter: 3×3×32, ReLU) | 128×128×32 | 608
Convolutional (filter: 3×3×64, ReLU) | 64×64×64 | 36,928
Convolutional (filter: 3×3×128, ReLU) | 32×32×128 | 147,584
Flatten | 131,072 | 0
Dense (Softmax) | 9 | 2,097,160

Note: Padding is applied, and a stride of 2 is used in both directions of the 2D feature map. The total number of trainable parameters is 2,282,280.

In this research, our goal is to tackle the challenges of uncertainties and unknown faults in gear fault diagnosis. While the BCNN technique offers probabilistic prediction and the ability to combine empirical experience with deep learning inference, its true performance can only be established through systematic case investigations. To this end, we analyze three cases: (1) normal classification, where the entire original dataset is used; (2) noisy classification, where the original dataset is contaminated with various levels of noise; and (3) extended classification, where the training data involve only a subset of the available fault labels while the testing data contain signals corresponding to unseen fault scenarios.

We use nine output nodes for the first two cases, where all nine data labels are used to establish the model. Note that the weights and biases to be optimized in the BCNN model are represented probabilistically, i.e., by a mean and a variance. Recall from Sec. 2 that ϕ represents the mean and variance of each network unknown. The total number of trainable parameters is therefore roughly twice that of the corresponding CNN model.
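As a rough check on the parameter counts in Table 2, the Python sketch below recomputes them layer by layer. It assumes, for illustration only, that every kernel weight of a Bayesian layer carries two trainable values (a mean and a scale) while every bias is a single deterministic value; this is, for example, the default behavior of TensorFlow Probability's Flipout layers, but the text does not state which layer implementation was used. Under this assumption the calculation reproduces the dense-layer count of 2,097,160 and the total of 2,282,280 in Table 2 only for an eight-node output layer (as in Case 3), and it also reproduces the CNN count of 1,141,256 quoted in Sec. 4.3.1; a nine-node output layer would give 2,359,305 dense-layer parameters.

```python
# Parameter-count check for the architecture in Table 2.
# Assumption (illustrative): a Bayesian layer stores a mean and a scale for every
# kernel weight but a single deterministic value for every bias.

def conv_params(k, c_in, c_out, bayesian):
    """Trainable parameters of a k x k convolution mapping c_in -> c_out channels."""
    kernel = k * k * c_in * c_out
    return (2 if bayesian else 1) * kernel + c_out

def dense_params(n_in, n_out, bayesian):
    """Trainable parameters of a fully connected layer."""
    kernel = n_in * n_out
    return (2 if bayesian else 1) * kernel + n_out

def model_params(n_classes, bayesian):
    """Per-layer counts for a 256x256x1 input, three stride-2 convolutions, flatten, dense."""
    counts = {
        "conv1": conv_params(3, 1, 32, bayesian),
        "conv2": conv_params(3, 32, 64, bayesian),
        "conv3": conv_params(3, 64, 128, bayesian),
    }
    flat = 32 * 32 * 128  # 256 halved three times by stride 2 -> 32 x 32 x 128
    counts["dense"] = dense_params(flat, n_classes, bayesian)
    counts["total"] = sum(counts.values())  # sums the four layers above
    return counts

for n in (8, 9):
    bcnn = model_params(n, bayesian=True)
    cnn = model_params(n, bayesian=False)
    print(n, "outputs:", "BCNN", bcnn["total"], "| CNN", cnn["total"])
# 8 outputs: BCNN 2282280 | CNN 1141256
# 9 outputs: BCNN 2544425 | CNN 1272329
```

The factor of two on the kernels is what makes the Bayesian model roughly twice as large as its deterministic counterpart.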

4 Case Investigations

In this section, we present the implementation and validation of the proposed BCNN for tackling the challenges of uncertainties and unseen faults. As mentioned, three separate cases are analyzed. We start with the usual classification of nine gear conditions using the original experimental dataset, followed by a case in which the dataset is further contaminated with noise and a case in which the training data include only a subset of the fault labels. The performance in the last two cases is analyzed relative to the first case to highlight the distinctive features of the proposed BCNN methodology.

4.1 Normal Classification of Existing/Known Faults Using the Entire Original Dataset.

In this first case, we follow the usual machine learning practice of splitting the entire original dataset (outlined in Table 1) into 80% training and 20% testing data. To keep the labels balanced, the split is applied to the subdataset of each fault condition: 83 of the 104 samples are used as training data and the remaining 21 as testing data. This is known as a stratified data split [55]. In total, we therefore have 747 (83×9) training and 189 (21×9) testing samples. The BCNN model with the architecture outlined in Table 2 is deployed for training, and 5% of the training data is held out for validation during training. The experiments were conducted using the Anaconda distribution with the Spyder IDE. The model was implemented using the TensorFlow and Keras libraries, which are well suited for deep learning tasks. The hardware platform was an Alienware desktop equipped with a high-performance CPU. Other operating hyperparameters and metrics used for training are listed in Table 3. These hyperparameters were selected to enhance the training of the BCNN model. The tuning process involved adjusting key hyperparameters such as the learning rate, batch size, and number of epochs. Their candidate values were predefined within specific ranges based on empirical knowledge from similar studies, which kept the variation in outcomes small. We employed grid search to systematically explore different combinations and identify the settings that led to effective convergence and minimal overfitting. A learning rate of 0.0002 was chosen because preliminary experiments showed that it provided stable and efficient convergence. The batch size and number of epochs were set to 9 and 100, respectively, to train the model effectively without overfitting. 
SGD was selected as the optimization algorithm for its proven effectiveness in large-scale learning tasks, with adjustments made to balance learning efficiency and stability. ACC, the most intuitive classification accuracy metric, is used to evaluate the training and validation accuracy. It is simply the ratio of correctly predicted observations to total observations and can be retrieved from the confusion matrix [56]. The training and validation accuracy curves with respect to the epoch are plotted in Fig. 6. The training accuracy increases smoothly and continuously, whereas the validation accuracy shows an overall increasing trend with slight oscillations. Both accuracies approach nearly 100% as training proceeds, indicating adequate learning without underfitting or overfitting.
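The stratified split described above (83 training and 21 testing samples drawn from each of the nine 104-sample classes) can be sketched in plain Python as follows. The function name and the fixed seed are illustrative choices, not the authors' code; scikit-learn's `train_test_split` with its `stratify` argument is a common library alternative.

```python
import random
from collections import Counter

def stratified_split(labels, n_train_per_class, seed=0):
    """Hypothetical helper: return (train_idx, test_idx), drawing the same number
    of training indices from every class and sending the rest to testing."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train, test = [], []
    for lab in sorted(by_class):
        idxs = by_class[lab][:]
        rng.shuffle(idxs)  # random choice of samples within each class
        train.extend(idxs[:n_train_per_class])
        test.extend(idxs[n_train_per_class:])
    return train, test

# Nine fault types with 104 samples each (Table 1).
labels = [fault for fault in range(1, 10) for _ in range(104)]
train_idx, test_idx = stratified_split(labels, n_train_per_class=83)
print(len(train_idx), len(test_idx))  # 747 189
test_counts = Counter(labels[i] for i in test_idx)  # 21 test samples per fault type
```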

Fig. 6
Training history: (a) training accuracy and (b) validation accuracy
Table 3

Operating hyperparameters and metrics for classification analysis

Optimization algorithm | Learning rate | Number of epochs | Batch size | Loss metric | Accuracy metric
Stochastic gradient descent (SGD) | 0.0002 | 100 | 9 | Categorical cross-entropy and KL divergence | ACC
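Table 3 lists categorical cross-entropy and KL divergence as the loss. A common way to combine them in variational training is the negative evidence lower bound, in which the KL divergence between the weight posterior and the prior is divided by the number of training samples, so that summed over all training examples the KL term enters the objective exactly once. The sketch below assumes a factorized Gaussian posterior over the weights and a standard normal prior; these choices and the helper names are illustrative assumptions, not the exact formulation of Sec. 2.

```python
import math

# Assumptions: factorized Gaussian posterior q(w) = N(mu, sigma^2) per weight,
# prior N(0, 1); function names are hypothetical.

def categorical_cross_entropy(probs, onehot, eps=1e-12):
    """Mean negative log-likelihood of the true class over a mini-batch."""
    total = 0.0
    for p, y in zip(probs, onehot):
        total -= sum(yk * math.log(max(pk, eps)) for pk, yk in zip(p, y))
    return total / len(probs)

def kl_gaussian(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    """Closed-form KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)) for one weight."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

def negative_elbo(probs, onehot, posterior, n_train):
    """Per-example objective: cross-entropy plus the total weight KL divided by n_train.

    posterior is a list of (mean, std) pairs, one per stochastic weight."""
    kl = sum(kl_gaussian(mu, sigma) for mu, sigma in posterior)
    return categorical_cross_entropy(probs, onehot) + kl / n_train

# Toy usage: a batch of two 3-class predictions and three stochastic weights.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
onehot = [[1, 0, 0], [0, 1, 0]]
posterior = [(0.0, 1.0), (0.5, 0.8), (-0.2, 0.3)]
loss = negative_elbo(probs, onehot, posterior, n_train=747)
```

A posterior that equals the prior contributes zero KL, so the objective then reduces to plain cross-entropy; any departure from the prior is penalized in proportion to 1/n_train.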

Once the model is established, it can be used directly to classify the testing data. As noted, the BCNN model inherently behaves as a stochastic neural network because of its probabilistic weights and biases. It can therefore produce a probabilistic prediction for each deterministic testing sample. Specifically, we run 1000 Monte Carlo emulations to gather the distribution of the prediction for each testing sample. As an illustration, Fig. 7 gives the prediction results of two testing samples with actual fault types 1 (i.e., healthy) and 2 (i.e., missing tooth), respectively. The results are no longer point estimates; instead, they are probabilistic. Here, we use kernel density estimation to represent this probabilistic information because it produces a smoother and more continuous distribution curve than a histogram [57]. The probabilistic distributions are obtained for all nine fault conditions/types. By comparing and interpreting these distributions, reliable decision-making can be achieved.
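The Monte Carlo procedure above can be illustrated with a self-contained sketch. Because the trained BCNN is not available here, `stochastic_forward` is a hypothetical stand-in that perturbs a fixed set of logits with Gaussian noise to mimic sampling a new set of weights on every pass; only the surrounding logic (1000 repeated passes, per-class mean and standard deviation, and a Gaussian kernel density estimate with Silverman's bandwidth rule) mirrors the procedure described in the text.

```python
import math
import random
import statistics

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stochastic_forward(base_logits, weight_std, rng):
    """Hypothetical stand-in for one BCNN forward pass: the logits are perturbed
    as if a fresh set of weights had been sampled from the posterior."""
    return softmax([z + rng.gauss(0.0, weight_std) for z in base_logits])

def mc_predict(base_logits, n_runs=1000, weight_std=0.5, seed=0):
    """Repeat the stochastic pass n_runs times; return per-class samples, means, stds."""
    rng = random.Random(seed)
    runs = [stochastic_forward(base_logits, weight_std, rng) for _ in range(n_runs)]
    per_class = list(zip(*runs))  # per_class[k]: n_runs probabilities of class k
    means = [statistics.fmean(s) for s in per_class]
    stds = [statistics.stdev(s) for s in per_class]
    return per_class, means, stds

def gaussian_kde(samples, xs, bandwidth=None):
    """Gaussian kernel density estimate at points xs (Silverman's rule by default)."""
    n = len(samples)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
            for x in xs]

# Usage with made-up logits for a nine-class problem (fault types 1-9).
base = [2.0, 0.5, 0.3, 0.1, 0.0, 0.0, -0.2, -0.3, -0.5]
per_class, means, stds = mc_predict(base)
predicted = max(range(len(means)), key=means.__getitem__) + 1  # 1-based fault type
grid = [i / 100 for i in range(101)]
density = gaussian_kde(per_class[predicted - 1], grid)  # smooth curve over [0, 1]
```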

Fig. 7
Illustration of probabilistic predictions over particular testing samples using kernel density: (a) sample corresponding to true fault type 1 and (b) sample corresponding to true fault type 2

Figure 7(a) points to fault type 1 as the most probable prediction since its probability mean is larger than those of the other fault types. The distributions are relatively narrow, showing a high confidence level of the prediction. Similarly, Fig. 7(b) identifies fault type 2. Both are correct predictions when compared with the actual faults. Note that "True" or "False" is indicated in each of these prediction plots. The y-axis shows the frequency of predicted mean probabilities within specific bins. The decision-making for these two testing samples (Fig. 7) is clearly confident. Notably, incorrect classifications often have mean probabilities around 0.1, indicating that these fault types are similar or challenging to distinguish. This highlights the value of probabilistic predictions in managing uncertainty and enhancing the reliability of fault diagnosis. In some scenarios, however, the predicted distribution means of all fault labels are very close, and the distribution widths may also be large, which makes it difficult for the BCNN model alone to reach a reliable decision. In general, therefore, the BCNN model should also be allowed to indicate "Uncertain" to avoid false predictions. The criterion for unconfident decision-making is inherently subjective and may depend on empirical knowledge. In this case, we set the threshold of the prediction probability mean to 0.2. If the probability means of all labels are below this threshold, the classification is considered unconfident, even though the fault could still be identified as the label with the highest probability mean. Whenever an unconfident classification occurs, all prediction plots are marked as "False." To assist reliable decision-making, empirical experience and knowledge can be further incorporated, similar to human health diagnosis. 
This is a significant and unique strength of the BCNN model, which provides various options to increase the flexibility of decision-making [58–61].
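The decision rule with the 0.2 threshold can be written as a small function. This is a sketch under the assumption that the per-class probability means from the Monte Carlo runs are already available; the function names and the two example vectors are hypothetical.

```python
THRESHOLD = 0.2  # probability-mean threshold adopted in this study

def decide(prob_means, threshold=THRESHOLD):
    """Hypothetical helper: return (fault_type, confident) from per-class means.

    The predicted fault type is the label with the highest mean (1-based); the
    decision is unconfident when every mean falls below the threshold, which is
    equivalent to the largest mean being below it."""
    best = max(range(len(prob_means)), key=lambda k: prob_means[k])
    return best + 1, prob_means[best] >= threshold

def report(prob_means, threshold=THRESHOLD):
    """Human-readable verdict that flags unconfident cases as 'Uncertain'."""
    fault, confident = decide(prob_means, threshold)
    if confident:
        return f"fault type {fault}"
    return f"Uncertain (highest mean: fault type {fault})"

# Hypothetical probability means for nine fault types.
clear_case = [0.25] + [0.75 / 8] * 8
ambiguous_case = [0.12, 0.18] + [0.10] * 7
print(report(clear_case))      # fault type 1
print(report(ambiguous_case))  # Uncertain (highest mean: fault type 2)
```

Keeping the highest-mean label alongside the "Uncertain" flag preserves the model's best guess for any subsequent expert review.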

Recall that 21 testing samples are employed for each fault label. We calculate the mean and standard deviation of the prediction probability distributions for each single testing sample (Fig. 7) and then gather these statistics for all samples corresponding to the different fault conditions to establish the probabilistic prediction result over the entire testing space, as shown in Fig. 8. Consistent with the observations from Fig. 7, the probability mean of the testing samples for the true fault type is much larger than those for the other fault types, indicating accurate classification in a probabilistic sense. Additionally, the probability standard deviations are small overall. The probability standard deviation for the true fault type is slightly greater than those for the other fault types; however, relative to its higher probability mean, this standard deviation is still small. Collectively, the results illustrate accurate decision-making with a high confidence level. The probabilistic nature of the BCNN model thus aligns with the general decision-making logic for stochastic processes in the real world.

Fig. 8
Classification probability outputs of testing samples corresponding to different fault types in Case 1: (a) true fault type 1, (b) true fault type 2, (c) true fault type 3, (d) true fault type 4, (e) true fault type 5, (f) true fault type 6, (g) true fault type 7, (h) true fault type 8, and (i) true fault type 9

4.2 Classification of Existing Faults Under Contaminated Measurement.

The gear fault data employed in this research were measured under strictly controlled motor speed and then processed using the TSA technique [2]. Uncertainties and noise are thus minimized in the dataset. In practical situations, however, measurements are inevitably contaminated with various noises and even assembly uncertainties. Moreover, many real-time fault diagnosis systems are autonomous and are often applied to direct measurements without any denoising. To examine the feasibility of this method for fault diagnosis under uncertainties, we purposely contaminate the current measurements with white noise at two levels, whose standard deviations are 3% and 10% of the vibration signal amplitude, respectively. As Fig. 9 shows, the distortion of the signal becomes more obvious at the higher noise level. The results obtained are compared with those presented in Sec. 4.1. We anticipate that the proposed BCNN can handle noise effectively by providing probabilistic decision-making.
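The noise contamination can be sketched as follows. The text specifies the noise level as a fraction of the vibration signal amplitude but does not say which amplitude measure is used, so the sketch exposes that choice as a parameter (peak absolute value by default, or RMS). The sinusoid is a synthetic stand-in for a measured record, and the function name is hypothetical.

```python
import math
import random
import statistics

def add_white_noise(signal, level, reference="peak", seed=0):
    """Hypothetical helper: return signal + Gaussian white noise whose standard
    deviation is `level` times a reference amplitude of the signal.

    reference="peak" uses max |x|; reference="rms" uses the root-mean-square value.
    Which amplitude definition the study used is not stated, so both are offered."""
    if reference == "peak":
        amplitude = max(abs(x) for x in signal)
    elif reference == "rms":
        amplitude = math.sqrt(sum(x * x for x in signal) / len(signal))
    else:
        raise ValueError("reference must be 'peak' or 'rms'")
    rng = random.Random(seed)
    sigma = level * amplitude
    return [x + rng.gauss(0.0, sigma) for x in signal]

# Synthetic stand-in for a vibration record: a unit-amplitude sinusoid.
signal = [math.sin(2 * math.pi * 5 * t / 1024) for t in range(1024)]
noisy_3 = add_white_noise(signal, 0.03)   # 3% noise level
noisy_10 = add_white_noise(signal, 0.10)  # 10% noise level
residual = [a - b for a, b in zip(noisy_10, signal)]
noise_std = statistics.stdev(residual)  # should be near 0.10 x peak amplitude
```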

Fig. 9
Illustration of vibration signal samples with different noise levels: (a) nominal sample without additional noise, (b) sample with additional noise 3%, and (c) sample with additional noise 10%

We first carry out the analysis at the 3% noise level following the same procedure outlined in Sec. 4.1 and obtain the prediction results for two particular samples (Fig. 10) and the aggregated prediction results over the entire testing space (Fig. 11). Figure 10 indicates accurate fault classification when 3% additional noise is introduced. Compared with Fig. 7, the only noticeable difference is a small reduction of the probability mean from 0.25 to 0.24 (Fig. 7(a) versus Fig. 10(a)), accompanied by a small increase in the distribution width for the actual fault condition. While this indicates that the confidence level of decision-making for these particular samples decreases slightly, which is expected, the BCNN model still performs the fault diagnosis task successfully. The aggregated results (Fig. 11) reflect the probabilistic nature of decision-making for all testing samples with their different true faults. They are essentially identical to the results shown in Fig. 8, illustrating the good performance of the BCNN in handling contaminated data.

Fig. 10
Illustration of probabilistic predictions over particular testing samples in Case 2 (with additional noise 3%): (a) sample corresponding to true fault type 1 and (b) sample corresponding to true fault type 2
Fig. 11
Classification probability outputs of testing samples corresponding to different fault types (with additional noise 3%): (a) true fault type 1, (b) true fault type 2, (c) true fault type 3, (d) true fault type 4, (e) true fault type 5, (f) true fault type 6, (g) true fault type 7, (h) true fault type 8, and (i) true fault type 9

We further increase the noise level to 10% to examine how robust the model is to a large degree of measurement uncertainty. The results obtained under the different noise levels are compared in Fig. 12. For a concise illustration, only the probability mean and standard deviation of the testing samples for the actual/true fault condition are included. Clearly, the predictions under different noise levels exhibit very small discrepancies. The worst case (10% noise level) appears to widen the bands of both the probability mean and standard deviation for certain fault conditions, i.e., fault types 1, 8, and 9. This is reasonable since noise naturally interferes with decision-making. As a consequence, even though the decision-making is accurate, its confidence level may decrease. While the predicted probability distributions illustrate the unique advantage of the BCNN model, the crisp testing/classification accuracy is worth examining since it is a direct indicator of model performance. To this end, each testing sample is classified as the fault label with the highest probability mean. By comparison with the actual faults of the testing samples, the classification accuracy can be easily computed. Table 4 gives the number of accurately classified samples. Recall that there are 189 testing samples in total. The classification accuracies at all noise levels are 100%, showing the excellent performance of the BCNN model. In addition, each decision is categorized as either confident or unconfident based on the probability mean threshold mentioned in Sec. 4.1: when the probability means of all fault labels are below the specified threshold, the decision is deemed unconfident. The accurately classified samples therefore consist of the samples that are identified either confidently or unconfidently. 
In other words, the confidently and unconfidently classified samples are two subsets of the accurately classified samples. The same probability mean threshold, i.e., 0.2, is adopted here, and the numbers of samples with confident and unconfident decision-making are appended in Table 4. The accurate decision-making for all testing samples is highly confident, which consistently validates the proposed methodology.
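The accuracy and confidence bookkeeping behind Table 4 can be sketched as below, assuming per-sample probability means are available. The function name and the three sample vectors are hypothetical; they only illustrate how accurate samples split into confident and unconfident subsets under the 0.2 threshold.

```python
def tally(prob_means_per_sample, true_types, threshold=0.2):
    """Hypothetical helper for Table 4-style counts: accurate samples and their
    confident/unconfident split.

    Each sample is assigned the fault type with the highest probability mean
    (1-based); an accurate decision is unconfident when that mean is below the
    threshold, i.e., when all class means are below it."""
    counts = {"accurate": 0, "confident": 0, "unconfident": 0}
    for means, truth in zip(prob_means_per_sample, true_types):
        best = max(range(len(means)), key=lambda k: means[k])
        if best + 1 != truth:
            continue  # misclassified: not part of either subset
        counts["accurate"] += 1
        counts["confident" if means[best] >= threshold else "unconfident"] += 1
    accuracy = counts["accurate"] / len(true_types)
    return counts, accuracy

# Three hypothetical nine-class samples.
samples = [
    [0.30] + [0.70 / 8] * 8,        # true type 1, confidently correct
    [0.19, 0.12] + [0.69 / 7] * 7,  # true type 1, correct but unconfident
    [0.05, 0.50] + [0.45 / 7] * 7,  # true type 3, predicted type 2 (wrong)
]
counts, accuracy = tally(samples, true_types=[1, 1, 3])
print(counts, round(accuracy, 3))
# {'accurate': 2, 'confident': 1, 'unconfident': 1} 0.667
```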

Fig. 12
Classification probability outputs of testing samples with different actual fault types with respect to different additional noise levels: (a) probability mean and (b) probability standard deviation (solid lines denote the lower or upper bound, and dashed lines denote the mean value of the distribution)
Table 4

Testing result with respect to different noise levels

No additional noiseAdditional noise 3%Additional noise 10%
AccurateConfidentUnconfidentAccurateConfidentUnconfidentAccurateConfidentUnconfident
18918901890018900

4.3 Extended Classification With Unseen Faults.

One of the goals of this research is to explore a practical way of conducting extended classification for gear diagnosis. That is, we leverage the BCNN model to diagnose the gear system using testing data that the model has not seen during the training phase. This scenario is quite common in practice, as gears feature (infinitely) many fault conditions, whereas the training dataset is always limited, especially in terms of fault labels. Our hypothesis is that the probabilistic nature of the BCNN framework can provide insights into unseen faults with respect to the existing fault labels employed in training and produce useful guidance for decision-making. Recall that our dataset includes five different severity levels of the chipping fault. In practice, chipping is a continuously varying fault condition. Therefore, in the following case study, we purposely hold out one of the chipping tip severity levels from BCNN training. In other words, only four chipping tip severity levels are treated as known fault labels to train the BCNN model, and the data of the remaining chipping tip level are used to evaluate the capacity of the BCNN to cope with unseen fault conditions. We analyze two scenarios: holding out a minor chipping tip case and holding out a severe chipping tip case.

4.3.1 Scenario 1: Classification of a Minor Chipping Tip Fault as Unseen Fault Condition.

In this scenario, we choose chipping tip 4, i.e., fault type 6 (Table 1), as the unseen fault. Correspondingly, we use the 104 samples of chipping tip 4 as testing data and the samples of all other fault conditions as training data. Since this classification analysis is not standard, the metric for assessing classification accuracy needs to be redefined: the classification is considered accurate when the unseen fault is identified as one of its neighboring fault types. For this particular scenario, a testing sample classified as either chipping tip 5 (fault type 5) or chipping tip 3 (fault type 7) is counted as accurately classified. The BCNN model architecture shown in Table 2 is employed; the major difference is that the size of the output layer is reduced to 8 because one fault label, i.e., chipping tip 4, is hidden from training. The same operating parameters and metrics (except the classification accuracy) tabulated in Table 3 are also used. Once training is completed, the BCNN model is employed for the extended classification analysis.
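The redefined accuracy metric for unseen faults can be sketched as follows. The function name is hypothetical, and the prediction lists in the usage example are rebuilt from the counts reported in Table 5 rather than taken from model output, so the example only confirms the bookkeeping.

```python
def extended_tally(predicted_types, confident_flags, accepted):
    """Hypothetical helper for Table 5/6-style counts when the unseen fault has no
    output node: a prediction is accurate if it lands in `accepted`, the set of
    fault types neighboring the held-out severity (e.g., {5, 7} for type 6)."""
    table = {"accurate": [0, 0], "inaccurate": [0, 0]}  # [confident, unconfident]
    for pred, confident in zip(predicted_types, confident_flags):
        row = "accurate" if pred in accepted else "inaccurate"
        table[row][0 if confident else 1] += 1
    accuracy = sum(table["accurate"]) / len(predicted_types)
    return table, accuracy

# Lists rebuilt from the counts reported in Table 5 (scenario 1), not model output.
preds = [5] * 90 + [1] * 14
confident = [True] * 49 + [False] * 41 + [True] * 1 + [False] * 13
table, acc = extended_tally(preds, confident, accepted={5, 7})
print(table, f"{100 * acc:.1f}%")  # {'accurate': [49, 41], 'inaccurate': [1, 13]} 86.5%
```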

Similarly, we first look into the probabilistic prediction result of one particular sample, as shown in Fig. 13. Compared with the normal classification analysis reported in Sec. 4.1, the interference from faults other than the actual fault becomes more significant. Specifically, even though the classification result is type 5, which is correct according to the above metric, fault type 1 (i.e., healthy condition) appears to play a role. To further elucidate the impact of the different fault labels on the classification results, the prediction distributions of all 104 testing samples are generated and shown in Fig. 14. Fault type 1 is indeed a major source of interference according to the associated distribution of probability means (Fig. 14(a)). This observation can be explained by the fact that a minor chipping tip generally induces only a small vibration change compared with the healthy condition. The features of the vibration measurements under these two conditions are thus similar and difficult to discriminate. However, the distribution of probability standard deviations for fault type 1 is wider than that for fault type 5 or 7 (the correct faults to be identified), implying a larger degree of uncertainty in those predictions. Intuitively, fault type 5 is expected to be the classified fault condition for most of the testing samples.

Fig. 13
Illustration of probabilistic prediction over particular testing sample (scenario 1 in Case 3)
Fig. 14
Classification probability outputs of testing samples (scenario 1 in Case 3)

We then analyze the probabilistic information shown in Fig. 14 to obtain the testing/classification accuracy and the confidence level of the predictions, which are shown in Table 5. The same threshold, i.e., 0.2, is applied to indicate the confidence level of decision-making. Consistent with the above observation, fault type 1 is the major source of misclassification, with 14 samples misclassified as this type. The testing accuracy is thus 86.5% (90/104). Although all 14 of these samples are misclassified, 13 of them are classified unconfidently. In other words, the BCNN model may also consider these 13 samples to be fault type 5 or 7, which matches the ground truth. This opens a possibility for enhancing diagnosis accuracy when empirical judgment is further applied to the samples that are classified unconfidently. In total, 54 (13 + 41) samples may have features intermediate between those of fault types 1 and 5, which makes them difficult to classify correctly with a high confidence level. The fundamental reason is that the features of some testing samples may exhibit a strongly nonlinear relation with respect to fault severity. For this reason, we conclude that the current testing accuracy for this particular scenario is satisfactory.

Table 5

Testing result (scenario 1 in Case 3)

Minor chipping tip
Outcome | Classified as | Number of samples | Confident | Unconfident
Inaccurate | Type 1 | 14 | 1 | 13
Accurate | Type 5 | 90 | 49 | 41

To highlight the enhanced performance of the BCNN model for extended fault diagnosis, we also create a CNN model with the same architecture configuration for comparison. Note that the number of trainable parameters is significantly smaller (i.e., 1,141,256) in the CNN model because its weights and biases are deterministic. The same training and testing data split is used for the classification analysis. The result shows that 60 of the 104 samples are correctly classified as fault type 5, and the rest are misclassified as fault type 1. The testing accuracy is hence 57.69% (60/104), which is significantly worse than the 86.5% (90/104) achieved by the BCNN model. Additionally, the CNN model supports only deterministic decision-making without considering the effect of uncertainty, leading to irreversible false predictions. Overall, the BCNN model greatly outperforms its counterpart, illustrating its feasibility for extended fault diagnosis.

4.3.2 Scenario 2: Classification of a Severe Chipping Tip Fault as Unseen Fault Condition.

To further validate the BCNN, we formulate another scenario and repeat the BCNN modeling and analysis. In the second scenario, we hold out the data samples belonging to chipping_tip_2 (fault type 8), which is a severe chipping tip level. Once the model is established following the same procedures as above, we carry out the testing process using the held-out samples. In this scenario, a classification is considered correct if a testing sample is classified as being close to chipping_tip_3 (fault type 7) or chipping_tip_1 (fault type 9). The prediction results for one particular sample and for the entire testing space are shown in Figs. 15 and 16, respectively. The classification pattern of type 9 (chipping_tip_1) in these plots is significant, as it shows that the model correctly identifies type 8 as most similar to type 9, validating the model's capability to discern nuanced differences between fault severities. As can be seen clearly, fault type 2, i.e., missing tooth, becomes the major interference. This makes good sense since a severe chipping tip resembles a missing tooth in terms of fault severity. By analyzing the prediction distribution information, we obtain the testing result tabulated in Table 6. Only 4 testing samples are misclassified as fault type 2, and the rest are all correctly classified as fault type 8, yielding a high testing accuracy of 96.15% (100/104). Among the correctly classified samples, 85 are classified confidently, showing that type 8 is indeed the most probable fault to represent the unseen fault. On the other hand, all misclassified samples are subject to unconfident decision-making and may therefore also belong to fault type 8. Similarly, a CNN model with the same architecture is established to perform the classification analysis: 79 samples are correctly classified (33 and 46 samples are classified as fault types 7 and 9, respectively). 
The testing accuracy of the CNN is thus 75.96% (79/104), which is still much lower than that of the BCNN model. Again, the results in this scenario demonstrate the effectiveness of the proposed BCNN for extended fault diagnosis.

To determine the threshold for prediction confidence in future cases, one can examine the distribution of prediction probabilities across classes. In our analysis, the 0.2 threshold corresponds to a point of minimum occurrence in the distribution, marking the transition from confident to uncertain predictions. This natural threshold can be identified visually or refined using statistical methods such as kernel density estimation or the analysis of distribution overlaps.

Compared with other fault diagnosis methods, such as Gaussian processes, naive Bayes classifiers, CNNs, and fuzzy neural networks (FNNs), our BCNN approach uniquely combines effective feature extraction with robust uncertainty quantification. While methods such as Gaussian processes and naive Bayes offer probabilistic outputs, they struggle with scalability and with capturing complex feature relationships. CNNs excel in feature extraction but lack uncertainty quantification, and FNNs require extensive manual tuning. The BCNN addresses these limitations by integrating the strengths of CNNs with Bayesian inference, offering a more comprehensive and flexible solution for fault diagnosis.
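The threshold-selection idea discussed above, locating a minimum in the distribution of probability means, can be sketched as a search for the lowest point of a kernel density estimate between two modes. The search interval, Silverman's bandwidth rule, and the synthetic clusters in the usage example are assumptions made for illustration; they are not the data or settings of this study, and the function names are hypothetical.

```python
import math
import statistics

def kde(samples, xs, bandwidth):
    """Gaussian kernel density estimate evaluated at each point in xs."""
    c = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
            for x in xs]

def density_valley(samples, lo, hi, steps=200, bandwidth=None):
    """Hypothetical helper: return the grid point in [lo, hi] where the KDE of
    `samples` is lowest. [lo, hi] should bracket the gap between the
    'unconfident' and 'confident' modes; the bandwidth defaults to Silverman's rule."""
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = kde(samples, xs, bandwidth)
    return xs[min(range(len(ys)), key=ys.__getitem__)]

# Synthetic largest-probability-mean values: one cluster of unconfident samples
# (0.10-0.16) and one of confident samples (0.29-0.35); illustrative only.
low = [0.10 + 0.06 * i / 99 for i in range(100)]
high = [0.29 + 0.06 * i / 99 for i in range(100)]
threshold = density_valley(low + high, lo=0.15, hi=0.30)  # about 0.225, midway
```

Because the two synthetic clusters mirror each other, the density valley falls midway between them; with real data the valley position would depend on the relative sizes and spreads of the two groups.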

Fig. 15
Illustration of probabilistic prediction over particular testing sample (scenario 2 in Case 3)
Fig. 16
Classification probability outputs of testing samples (scenario 2 in Case 3)
Table 6

Testing result (scenario 2 in Case 3)

Severe chipping tip
Outcome | Classified as | Number of samples | Confident | Unconfident
Inaccurate | Type 2 | 4 | 0 | 4
Accurate | Type 8 | 100 | 85 | 15
Severe chipping tip
InaccurateConfidentUnconfidentAccurateConfidentUnconfident
Type 2Type 8
4041008515
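The bookkeeping behind Table 6 can be sketched as below. This is a hedged illustration under several assumptions. Each testing sample is represented by a dictionary mapping fault type to its averaged class probability. A prediction counts as accurate when its top-ranked fault type lies in the accepted neighbor set, which is {7, 9} in this scenario. A prediction is labeled confident when the top probability exceeds the runner-up by at least 0.2. That margin rule is our assumption for illustration, and the paper's own confidence criterion may be defined differently. `score_extended` and `accuracy` are hypothetical helper names.

```python
from collections import Counter


def score_extended(prob_dicts, accepted, threshold=0.2):
    """Tally (outcome, confidence) pairs for held-out samples.

    prob_dicts: one dict per sample mapping fault type -> averaged class probability.
    accepted:   fault types counted as correct for the held-out severity.
    """
    tally = Counter()
    for probs in prob_dicts:
        ranked = sorted(probs, key=probs.get, reverse=True)
        top, runner_up = ranked[0], ranked[1]
        # Hypothetical confidence rule: margin between the two largest probabilities
        confident = probs[top] - probs[runner_up] >= threshold
        outcome = "accurate" if top in accepted else "inaccurate"
        tally[(outcome, "confident" if confident else "unconfident")] += 1
    return tally


def accuracy(tally):
    """Fraction of samples whose top-ranked fault type is in the accepted set."""
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no samples scored")
    correct = tally[("accurate", "confident")] + tally[("accurate", "unconfident")]
    return correct / total
```

With the counts in Table 6, `accuracy` reduces to 100/104 ≈ 96.15%. The same arithmetic gives 79/104 ≈ 75.96% for the CNN baseline.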

While the proposed framework demonstrates significant potential, it also has limitations that should be addressed in future research. The computational complexity of Bayesian inference, especially when scaling to larger systems, poses a challenge for real-time industrial applications. Additionally, the model's predictive accuracy is sensitive to the choice of priors, so poorly chosen priors could lead to suboptimal outcomes. Furthermore, the framework has been validated on a lab-scale gearbox system, but its generalizability to other types of machinery and to more complex fault scenarios requires further exploration. Future research should focus on improving computational efficiency, developing adaptive methods for prior selection, and extending the framework to a broader range of industrial diagnostic applications.

5 Conclusion

In this study, we introduce a novel probabilistic fault diagnosis framework that leverages Bayesian inference in convolutional neural networks. Unlike traditional deep learning methods, our approach incorporates probabilistic decision-making that accounts for prediction uncertainties, thereby bolstering diagnostic robustness. The framework inherits the feature extraction capabilities of CNNs while adopting a Bayesian methodology to probabilistically capture hidden data correlations. Variational inference, made efficient by integrating Monte Carlo dropout into the network layers, streamlines the Bayesian learning process. Applied to the fault diagnosis of a lab-scale gearbox system, the model accurately diagnoses gear faults even in the presence of significant noise. Notably, our approach excels in extended diagnostics, identifying unknown or unseen faults by their proximity to known fault categories and thus offering crucial insights for practical decision-making. This research marks a significant step forward in applying probabilistic decision-making to gear fault diagnosis, addressing the critical challenges of uncertainties and unseen faults commonly encountered in real-world scenarios. While the framework shows significant promise for a range of machinery diagnosis applications, its scalability and applicability could be further enhanced by improving computational efficiency and exploring broader industrial scenarios.
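For readers unfamiliar with Monte Carlo dropout, the following sketch shows how a predictive distribution can be approximated at test time. It is a minimal illustration under several assumptions, none of which are the paper's settings:

- A single hypothetical dense softmax layer (`weights`, `biases`) stands in for the BCNN's convolutional stack.
- Dropout is applied only to the input features.
- The dropout rate and the number of passes `T` are arbitrary.

The averaged probabilities play the role of the predictive distribution, and their entropy is one common scalar measure of its uncertainty.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mc_dropout_predict(features, weights, biases, p_drop=0.5, T=100, seed=0):
    """Average softmax outputs over T dropout-perturbed forward passes.

    weights: one row per class (hypothetical single dense layer); biases: one per class.
    Returns the averaged class probabilities and their predictive entropy (nats).
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must lie in [0, 1)")
    rng = random.Random(seed)
    keep = 1.0 - p_drop
    mean = [0.0] * len(biases)
    for _ in range(T):
        # Dropout stays ON at test time; survivors are rescaled by 1/keep (inverted dropout)
        x = [f / keep if rng.random() < keep else 0.0 for f in features]
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(weights, biases)]
        for k, pk in enumerate(softmax(logits)):
            mean[k] += pk / T
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy
```

Setting `p_drop=0` and `T=1` recovers the deterministic softmax output, which is a quick sanity check. A larger `T` reduces the Monte Carlo error of the average.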

Acknowledgment

The corresponding author also appreciates the start-up fund support from The Hong Kong Polytechnic University.

Funding Data

  • NSF (Grant Nos. CMMI – 2138522 and IIS – 1741174; Funder ID: 10.13039/100000001).

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

Nomenclature

Abbreviations
BCNN = Bayesian convolutional neural networks
BiConvLSTM = bidirectional convolutional long short-term memory
CNNs = convolutional neural networks
FNNs = fuzzy neural networks
KL = Kullback-Leibler
MCMC = Markov Chain Monte Carlo
PDF = probability density function
ReLU = rectified linear unit
SGD = stochastic gradient descent
TSA = time synchronous averaging

References

1. Afia, A., Rahmoune, C., Benazzouz, D., Merainani, B., and Fedala, S., 2020, “New Intelligent Gear Fault Diagnosis Method Based on Autogram and Radial Basis Function Neural Network,” Adv. Mech. Eng., 12(5), p. 168781402091659. 10.1177/1687814020916593
2. Zhang, S., and Tang, J., 2018, “Integrating Angle-Frequency Domain Synchronous Averaging Technique With Feature Extraction for Gear Fault Diagnosis,” Mech. Syst. Signal Process., 99, pp. 711–729. 10.1016/j.ymssp.2017.07.001
3. Feng, Z., Gao, A., Li, K., and Ma, H., 2021, “Planetary Gearbox Fault Diagnosis Via Rotary Encoder Signal Analysis,” Mech. Syst. Signal Process., 149, p. 107325. 10.1016/j.ymssp.2020.107325
4. Wang, T., Han, Q., Chu, F., and Feng, Z., 2019, “Vibration Based Condition Monitoring and Fault Diagnosis of Wind Turbine Planetary Gearbox: A Review,” Mech. Syst. Signal Process., 126, pp. 662–685. 10.1016/j.ymssp.2019.02.051
5. Sun, R., Yang, Z., Chen, X., Tian, S., and Xie, Y., 2018, “Gear Fault Diagnosis Based on the Structured Sparsity Time-Frequency Analysis,” Mech. Syst. Signal Process., 102, pp. 346–363. 10.1016/j.ymssp.2017.09.028
6. Wang, Y., He, Z., and Zi, Y., 2010, “Enhancement of Signal Denoising and Multiple Fault Signatures Detecting in Rotating Machinery Using Dual-Tree Complex Wavelet Transform,” Mech. Syst. Signal Process., 24(1), pp. 119–137. 10.1016/j.ymssp.2009.06.015
7. Lin, J., and Zuo, M. J., 2003, “Gearbox Fault Diagnosis Using Adaptive Wavelet Filter,” Mech. Syst. Signal Process., 17(6), pp. 1259–1269. 10.1006/mssp.2002.1507
8. Rafiee, J., Rafiee, M. A., and Tse, P. W., 2010, “Application of Mother Wavelet Functions for Automatic Gear and Bearing Fault Diagnosis,” Expert Syst. Appl., 37(6), pp. 4568–4579. 10.1016/j.eswa.2009.12.051
9. Li, C., Sanchez, R.-V., Zurita, G., Cerrada, M., Cabrera, D., and Vásquez, R. E., 2016, “Gearbox Fault Diagnosis Based on Deep Random Forest Fusion of Acoustic and Vibratory Signals,” Mech. Syst. Signal Process., 76–77, pp. 283–293. 10.1016/j.ymssp.2016.02.007
10. Widodo, A., and Yang, B., 2008, “Wavelet Support Vector Machine for Induction Machine Fault Diagnosis Based on Transient Current Signal,” Expert Syst. Appl., 35(1–2), pp. 307–316. 10.1016/j.eswa.2007.06.018
11. Chen, C., Shen, F., Xu, J., and Yan, R., 2020, “Probabilistic Latent Semantic Analysis-Based Gear Fault Diagnosis Under Variable Working Conditions,” IEEE Trans. Instrum. Meas., 69(6), pp. 2845–2857. 10.1109/TIM.2019.2925410
12. Huang, X., Xie, T., Luo, S., Wu, J., Luo, R., and Zhou, Q., 2024, “Incremental Learning With Multi-Fidelity Information Fusion for Digital Twin-Driven Bearing Fault Diagnosis,” Eng. Appl. Artif. Intell., 133, p. 108212. 10.1016/j.engappai.2024.108212
13. Huang, X., Xie, T., Wu, J., Zhou, Q., and Hu, J., 2024, “Deep Continuous Convolutional Networks for Fault Diagnosis,” Knowl.-Based Syst., 292, p. 111623. 10.1016/j.knosys.2024.111623
14. Cao, P., Zhang, S., and Tang, J., 2018, “Preprocessing-Free Gear Fault Diagnosis Using Small Datasets With Deep Convolutional Neural Network-Based Transfer Learning,” IEEE Access, 6, pp. 26241–26253. 10.1109/ACCESS.2018.2837621
15. Goodfellow, I., Bengio, Y., and Courville, A., 2016, Deep Learning, MIT Press, Cambridge, MA.
16. Zhou, Q., and Tang, J., 2024, “An Interpretable Parallel Spatial CNN-LSTM Architecture for Fault Diagnosis in Rotating Machinery,” IEEE Internet Things J., 11(19), pp. 31730–31744. 10.1109/JIOT.2024.3422969
17. Kim, S., and Choi, J.-H., 2019, “Convolutional Neural Network for Gear Fault Diagnosis Based on Signal Segmentation Approach,” Struct. Health Monit., 18(5–6), pp. 1401–1415. 10.1177/1475921718805683
18. Chen, P., Li, Y., Wang, K., and Zuo, M. J., 2021, “An Automatic Speed Adaption Neural Network Model for Planetary Gearbox Fault Diagnosis,” Measurement, 171, p. 108784. 10.1016/j.measurement.2020.108784
19. Shi, J., Peng, D., Peng, Z., Zhang, Z., Goebel, K., and Wu, D., 2022, “Planetary Gearbox Fault Diagnosis Using Bidirectional-Convolutional LSTM Networks,” Mech. Syst. Signal Process., 162, p. 107996. 10.1016/j.ymssp.2021.107996
20. Li, Y., Cheng, G., Liu, C., and Chen, X., 2018, “Study on Planetary Gear Fault Diagnosis Based on Variational Mode Decomposition and Deep Neural Networks,” Measurement, 130, pp. 94–104. 10.1016/j.measurement.2018.08.002
21. Saufi, S. R., Bin Ahmad, Z. A., Leong, M. S., and Lim, M. H., 2020, “Gearbox Fault Diagnosis Using a Deep Learning Model With Limited Data Sample,” IEEE Trans. Ind. Inf., 16(10), pp. 6263–6271. 10.1109/TII.2020.2967822
22. Zhou, K., Diehl, E., and Tang, J., 2023, “Deep Convolutional Generative Adversarial Network With Semi-Supervised Learning Enabled Physics Elucidation for Extended Gear Fault Diagnosis Under Data Limitations,” Mech. Syst. Signal Process., 185, p. 109772. 10.1016/j.ymssp.2022.109772
23. Hu, C., Wu, J., Sun, C., Yan, R., and Chen, X., 2023, “Interinstance and Intratemporal Self-Supervised Learning With Few Labeled Data for Fault Diagnosis,” IEEE Trans. Ind. Inf., 19(5), pp. 6502–6512. 10.1109/TII.2022.3183601
24. Tang, Z., Bo, L., Liu, X., and Wei, D., 2022, “A Semi-Supervised Transferable LSTM With Feature Evaluation for Fault Diagnosis of Rotating Machinery,” Appl. Intell., 52(2), pp. 1703–1717. 10.1007/s10489-021-02504-1
25. Yu, K., Lin, T. R., Ma, H., Li, X., and Li, X., 2021, “A Multi-Stage Semi-Supervised Learning Approach for Intelligent Fault Diagnosis of Rolling Bearing Using Data Augmentation and Metric Learning,” Mech. Syst. Signal Process., 146, p. 107043. 10.1016/j.ymssp.2020.107043
26. Stone, J. V., 2013, Bayes' Rule: A Tutorial Introduction to Bayesian Analysis, Sebtel Press, Frederick, MA.
27. Huang, X., Xie, T., Hu, J., and Zhou, Q., 2024, “Three-Dimensional Hybrid Fusion Networks for Current-Based Bearing Fault Diagnosis,” Meas. Sci. Technol., 35(2), p. 025126. 10.1088/1361-6501/ad099b
28. Avendaño-Valencia, L. D., and Fassois, S. D., 2017, “Damage/Fault Diagnosis in an Operating Wind Turbine Under Uncertainty Via a Vibration Response Gaussian Mixture Random Coefficient Model Based Framework,” Mech. Syst. Signal Process., 91, pp. 326–353. 10.1016/j.ymssp.2016.11.028
29. Yu, J., Bai, M., Wang, G., and Shi, X., 2018, “Fault Diagnosis of Planetary Gearbox With Incomplete Information Using Assignment Reduction and Flexible Naive Bayesian Classifier,” J. Mech. Sci. Technol., 32(1), pp. 37–47. 10.1007/s12206-017-1205-y
30. Zhou, K., and Tang, J., 2021, “Structural Model Updating Using Adaptive Multi-Response Gaussian Process Meta-Modeling,” Mech. Syst. Signal Process., 147, p. 107121. 10.1016/j.ymssp.2020.107121
31. Zhou, K., and Tang, J., 2018, “Uncertainty Quantification in Structural Dynamic Analysis Using Two-Level Gaussian Processes and Bayesian Inference,” J. Sound Vib., 412, pp. 95–115. 10.1016/j.jsv.2017.09.034
32. Pinto, R. C., and Engel, P. M., 2015, “A Fast Incremental Gaussian Mixture Model,” PLoS One, 10(10), p. e0139931. 10.1371/journal.pone.0139931
33. Belyaev, M., Burnaev, E., and Kapushev, Y., 2014, “Exact Inference for Gaussian Process Regression in Case of Big Data With the Cartesian Product Structure,” arXiv:1403.6573. 10.48550/arXiv.1403.6573
34. Liu, H., Ong, Y.-S., Shen, X., and Cai, J., 2020, “When Gaussian Process Meets Big Data: A Review of Scalable GPs,” IEEE Trans. Neural Networks Learn. Syst., 31(11), pp. 4405–4423. 10.1109/TNNLS.2019.2957109
35. Ul Abideen, Z., Ghafoor, M., Munir, K., Saqib, M., Ullah, A., Zia, T., Tariq, S. A., Ahmed, G., and Zahra, A., 2020, “Uncertainty Assisted Robust Tuberculosis Identification With Bayesian Convolutional Neural Networks,” IEEE Access, 8, pp. 22812–22825. 10.1109/ACCESS.2020.2970023
36. Kwon, Y., Won, J. H., Kim, B. J., and Paik, M. C., 2020, “Uncertainty Quantification Using Bayesian Neural Networks in Classification: Application to Biomedical Image Segmentation,” Comput. Stat. Data Anal., 142, p. 106816. 10.1016/j.csda.2019.106816
37. Wei, Z., and Chen, X., 2021, “Uncertainty Quantification in Inverse Scattering Problems With Bayesian Convolutional Neural Networks,” IEEE Trans. Antennas Propag., 69(6), pp. 3409–3418. 10.1109/TAP.2020.3030974
38. Sinha, S., Franciosa, P., and Ceglarek, D., 2021, “Object Shape Error Response Using Bayesian 3D Convolutional Neural Networks for Assembly Systems With Compliant Parts,” IEEE Trans. Ind. Inf., 17(10), pp. 6676–6686. 10.1109/TII.2020.3043226
39. Yang, H., Jiao, S., and Sun, P., 2020, “Bayesian-Convolutional Neural Network Model Transfer Learning for Image Detection of Concrete Water-Binder Ratio,” IEEE Access, 8, pp. 35350–35367. 10.1109/ACCESS.2020.2975350
40. Haut, J. M., Paoletti, M. E., Plaza, J., Li, J., and Plaza, A., 2018, “Active Learning With Convolutional Neural Networks for Hyperspectral Image Classification Using a New Bayesian Approach,” IEEE Trans. Geosci. Remote Sens., 56(11), pp. 6440–6461. 10.1109/TGRS.2018.2838665
41. Portilla, J., Strela, V., Wainwright, M. J., and Simoncelli, E. P., 2003, “Image Denoising Using Scale Mixtures of Gaussians in the Wavelet Domain,” IEEE Trans. Image Process., 12(11), pp. 1338–1351. 10.1109/TIP.2003.818640
42. Karabatak, M., 2015, “A New Classifier for Breast Cancer Detection Based on Naïve Bayesian,” Measurement, 72, pp. 32–36. 10.1016/j.measurement.2015.04.028
43. Sun, H., Mordret, A., Prieto, G. A., Toksöz, M. N., and Büyüköztürk, O., 2017, “Bayesian Characterization of Buildings Using Seismic Interferometry on Ambient Vibrations,” Mech. Syst. Signal Process., 85, pp. 468–486. 10.1016/j.ymssp.2016.08.038
44. Niu, Z., 2020, “Frequency Response-Based Structural Damage Detection Using Gibbs Sampler,” J. Sound Vib., 470, p. 115160. 10.1016/j.jsv.2019.115160
45. Huang, Y., Beck, J. L., and Li, H., 2017, “Bayesian System Identification Based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling With Application to Structural Damage Assessment,” Comput. Methods Appl. Mech. Eng., 318, pp. 382–411. 10.1016/j.cma.2017.01.030
46. Zhou, K., and Tang, J., 2016, “Highly Efficient Probabilistic Finite Element Model Updating Using Intelligent Inference With Incomplete Modal Information,” ASME J. Vib. Acoust., 138(5), p. 051016. 10.1115/1.4033965
47. Gauvain, J.-L., and Lee, C.-H., 1994, “Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains,” IEEE Trans. Speech Audio Process., 2(2), pp. 291–298. 10.1109/89.279278
48. Das, A., and Debnath, N., 2020, “A Bayesian Model Updating With Incomplete Complex Modal Data,” Mech. Syst. Signal Process., 136, p. 106524. 10.1016/j.ymssp.2019.106524
49. Blei, D. M., Kucukelbir, A., and McAuliffe, J. D., 2017, “Variational Inference: A Review for Statisticians,” J. Am. Stat. Assoc., 112(518), pp. 859–877. 10.1080/01621459.2017.1285773
50. Shridhar, K., Laumann, F., and Liwicki, M., 2019, “A Comprehensive Guide to Bayesian Convolutional Neural Network With Variational Inference,” arXiv:1901.02731. 10.48550/arXiv.1901.02731
51. Kullback, S., 1997, Information Theory and Statistics, Dover Publications, Mineola, NY.
52. Yang, X., 2017, “Understanding the Variational Lower Bound,” epub. https://xyang35.github.io/2017/04/14/variational-lower-bound/
53. Goodman, G. S., 2019, The Law of Large Numbers: How to Make Success Inevitable, Gildan Media Corporation, Old Saybrook, CT.
54. Kingma, D. P., and Welling, M., 2019, “An Introduction to Variational Autoencoders,” Found. Trends® Mach. Learn., 12(4), pp. 307–392. 10.1561/2200000056
55. Papadopoulou, O., Zampoglou, M., Papadopoulos, S., and Kompatsiaris, I., 2019, “A Corpus of Debunked and Verified User-Generated Videos,” Online Inf. Rev., 43(1), pp. 72–88. 10.1108/OIR-03-2018-0101
56. Gal, Y., 2016, Uncertainty in Deep Learning, University of Cambridge, Cambridge, UK.
57. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., et al., 2016, “TensorFlow: A System for Large-Scale Machine Learning,” Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI ’16), Savannah, GA, Nov. 2–4, pp. 265–283. https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf
58. Zhou, K., Sun, H., Enos, R., Zhang, D., and Tang, J., 2021, “Harnessing Deep Learning for Physics-Informed Prediction of Composite Strength With Microstructural Uncertainties,” Comput. Mater. Sci., 197, p. 110663. 10.1016/j.commatsci.2021.110663
59. Boehmke, B., and Greenwell, B. M., 2019, Hands-On Machine Learning With R, CRC Press, Boca Raton, FL.
60. Sammut, C., and Webb, G. I., 2011, Encyclopedia of Machine Learning, Springer Science & Business Media, New York.
61. Scott, D. W., 2015, Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley, Hoboken, NJ.