Spectrogram from the short-time Fourier transform (STFT) of the Surface-Mounted Device (SMD) machine sound with sampling rate = 44,100 Hz, window = Hann, window size = 2048 samples, and overlap length = 512 samples.

Source publication
Article
Full-text available
Detecting an anomaly or an abnormal situation from given noise is highly useful in an environment where constantly verifying and monitoring a machine is required. As deep learning algorithms are further developed, current studies have focused on this problem. However, there are too many variables to define anomalies, and the human annotation for a...

Context in source publication

Context 1
... terms of the hyper-parameters in STFT, the higher the sampling rate, the higher the maximum frequency that can be analyzed. Finer detail in the frequency bands often improves performance, but the data processing burden increases at the same time. The motivation of this research is that a professional manager can successfully identify an anomaly from the operation sound. We designed the system for situations in which experts cannot monitor the machine constantly, and the goal of this research is to detect abnormalities and inform the managers. Therefore, we concluded that the optimal sampling rate is close to the audible frequency, 16 kHz. We also experimented with other values such as 44.1 kHz and found that there is almost no usable information in the high frequency bands, as shown in Figure 2. Most of the energy is concentrated in the audible frequencies. In addition, we experimented with smaller window sizes and set the overlap length to a quarter and a half of each window size. However, all these cases showed that the frequency resolution was too low for the network to capture the frequency difference between abnormal and normal data and that a smaller overlap could not capture the intermittent noise. As a result, all STFT hyper-parameters depicted in Figure 3 were determined through several experiments. After applying the STFT to the audio files that we want to classify, we segmented the STFT results about every 1.5 s. The segmentation interval was also determined experimentally, based on the fact that experts generally need sounds at least 2 s long to identify the state of complex machines. However, the segmentation interval can be adjusted according to the equipment type. In addition, the DC component was first removed, and we normalized all spectrograms, because amplitude differences caused by the recording volume could exist and would lead to different ranges of the feature distribution.
The height of the spectrogram matrix shown in Figure 3 indicates that the frequency axis has 1024 bins. Finally, we split the spectrogram into segments of 32 columns for all data. More details about the acquisition process of audio data and the use of spectrograms in our model will be introduced in the next sections. ...
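
The preprocessing described in this context can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it uses `scipy.signal.stft`, the function name `spectrogram_segments` is hypothetical, "removing the DC component" is interpreted as dropping the zero-frequency bin (which leaves 1024 of the 1025 one-sided bins for a 2048-sample window), and min-max scaling is assumed for the normalization, which the excerpt does not specify.

```python
import numpy as np
from scipy.signal import stft


def spectrogram_segments(audio, fs=16_000, n_window=2048, n_overlap=512, n_cols=32):
    """Return an array of shape (n_segments, 1024, n_cols) of normalized magnitudes."""
    # One-sided STFT with a Hann window: n_window // 2 + 1 = 1025 frequency bins.
    _, _, Z = stft(audio, fs=fs, window="hann", nperseg=n_window, noverlap=n_overlap)

    # Assumption: "DC component removed" means dropping the 0 Hz bin -> 1024 rows.
    mag = np.abs(Z)[1:, :]

    # Assumption: min-max scaling to [0, 1] to reduce recording-volume differences.
    mag = (mag - mag.min()) / (mag.max() - mag.min() + 1e-12)

    # Split the time axis into consecutive, non-overlapping blocks of n_cols frames,
    # discarding any incomplete block at the end.
    n_seg = mag.shape[1] // n_cols
    trimmed = mag[:, : n_seg * n_cols]
    return trimmed.reshape(mag.shape[0], n_seg, n_cols).transpose(1, 0, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(10 * 16_000)  # 10 s of synthetic noise at 16 kHz
    segments = spectrogram_segments(audio)
    print(segments.shape)  # (n_segments, 1024, 32)
```

With these settings the hop between frames is 2048 - 512 = 1536 samples, so the duration covered by one 32-column segment depends on the sampling rate passed as `fs`.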

Citations

... Existing anomaly detection techniques for production equipment predominantly incorporate vibration monitoring, temperature surveillance, oil analysis, sound detection, and image recognition [1][2][3][4][5]. In the specific context of spinning equipment, these conventional methods are complemented by data related to evenness quality, which serves as a crucial criterion for assessment. ...
Article
Full-text available
Abnormal detection plays a pivotal role in the routine maintenance of industrial equipment. Malfunctions or breakdowns in the drafting components of spinning equipment can lead to yarn defects, thereby compromising the overall quality of the production line. Fault diagnosis of spinning equipment entails the examination of component defects through Wavelet Spectrogram Analysis (WSA). Conventional detection techniques heavily rely on manual experience and lack generality. To address this limitation, this current study leverages machine learning technology to formulate a semi-supervised anomaly detection approach employing a convolutional autoencoder. This method trains deep neural networks with normal data and employs the reconstruction mode of a convolutional autoencoder in conjunction with Kernel Density Estimation (KDE) to determine the optimal threshold for anomaly detection. This facilitates the differentiation between normal and abnormal operational modes without the necessity for extensive labeled fault data. Experimental results from two sets of industrial data validate the robustness of the proposed methodology. In comparison to conventional Autoencoder and prevalent machine learning techniques, the proposed approach demonstrates superior performance across evaluation metrics such as Accuracy, Recall, Area Under the Curve (AUC), and F1-score, thereby affirming the feasibility of the suggested model.
... Sound plays a key role in identifying a rare event, whether it's the anomalous machine sound classification [6,7], environmental sound classification [2], surveillance at public places (e.g. railway stations [2], subway stations [8], public squares [3], roads [4,[9][10][11] and homes [12]), anomalous health conditions [13,14] or management of cowsheds with low manpower support [15]. ...
Article
Full-text available
This paper presents a novel sound event detection (SED) system for rare events occurring in an open environment. Wavelet multiresolution analysis (MRA) is used to decompose the input audio clip of 30 seconds into five levels. Wavelet denoising is then applied on the third and fifth levels of MRA to filter out the background. Significant transitions, which may represent the onset of a rare event, are then estimated in these two levels by combining the peak-finding algorithm with the K-medoids clustering algorithm. The small portions of one-second duration, called ‘chunks’ are cropped from the input audio signal corresponding to the estimated locations of the significant transitions. Features from these chunks are extracted by the wavelet scattering network (WSN) and are given as input to a support vector machine (SVM) classifier, which classifies them. The proposed SED framework produces an error rate comparable to the SED systems based on convolutional neural network (CNN) architecture. Also, the proposed algorithm is computationally efficient and lightweight as compared to deep learning models, as it has no learnable parameter. It requires only a single epoch of training, which is 5, 10, 200, and 600 times lesser than the models based on CNNs and deep neural networks (DNNs), CNN with long short-term memory (LSTM) network, convolutional recurrent neural network (CRNN), and CNN respectively. The proposed model neither requires concatenation with previous frames for anomaly detection nor any additional training data creation needed for other comparative deep learning models. It needs to check almost 360 times fewer chunks for the presence of rare events than the other baseline systems used for comparison in this paper. All these characteristics make the proposed system suitable for real-time applications on resource-limited devices.
... To this end, potential damages can be reduced by suggesting maintenance beforehand or detecting defects when they occur. For this, anomaly detection is successfully applied by researchers in the industrial domain [8] [9]. An example for the application of fault detection is the early detection of machine defects by observing the vibration of machine parts using specifically placed vibration sensors [10] [11]. ...
Conference Paper
Full-text available
As the algorithms mature, the bottleneck in applying machine learning (ML) to process analysis, monitoring and control is often caused by the availability of suitable data and the cost of data acquisition. For many ML projects, datasets have been collected independently of subsequent analysis. In industrial production, data acquisition and coverage of possible process uncertainties pose challenges to the preparation of suitable datasets. This article discusses dataset generation for ML from scratch under the constraint of limited resources with process uncertainties. A new approach towards an adapted design of experiments (DOE) is proposed with the aim of sampling data more efficiently. In this way, we contribute to the challenge of preparing datasets for ML applications.
... Scene-based categorization for audio datasets is shown via Fig. 7. Owing to different recording locations for each dataset, the number of multi-scene datasets is much higher and more easily available as compared to those collected from one location, such as ICBHI [211] and SMD [212]. The single-scene datasets have been collected in indoor scenarios, whereas multiscene datasets have been created at various indoor and outdoor locations such as industry, roads, forests, etc. Figure 8 shows anomaly induction mode-based categorization of datasets. ...
Article
Full-text available
Multimedia anomaly datasets play a crucial role in automated surveillance. They have a wide range of applications expanding from outlier objects/ situation detection to the detection of life-threatening events. For more than 1.5 decades, this field has attracted a lot of research attention, and as a result, more and more datasets dedicated to anomalous actions and object detection have been developed. Tapping these public anomaly datasets enable researchers to generate and compare various anomaly detection frameworks with the same input data. This paper presents a comprehensive survey on a variety of video, audio, as well as audio-visual datasets based on the application of anomaly detection. This survey aims to address the lack of a comprehensive comparison and analysis of multimedia public datasets based on anomaly detection. Also, it can assist researchers in selecting the best available dataset for bench-marking frameworks. Additionally, we discuss gaps in the existing dataset and insights for future direction towards developing multimodal anomaly detection datasets.
... The mean squared error (MSE) is frequently used by generative neural network-based anomaly detection algorithms to determine if a situation is normal or not [38,[43][44][45][46][47][48][49]. Some designs [43][44][45] and [48] use the MSE as a target function for training. When employing MSE for both anomaly detection and training, metrics can be built using its straightforward equation, as shown in the following equation. ...
Article
Full-text available
Hajj is an annual Islamic event attended by millions of pilgrims every year from around the globe. It is considered to be the biggest religious event that includes large human crowds in the world. Managing such crowds and detecting abnormal behaviors is one of the most significant challenges for the host country, particularly the crowds of pilgrims. Most of the current solutions can only handle small-scale crowd management issues, that involve simple and clear abnormal behaviors. Therefore, there is a need to have a human abnormal behavior detection approach that can deal with large-scale crowd situations. This study aims to propose a computer vision-based framework that automatically analyzes video sequences and detects human abnormal behaviors. The Convolutional LSTM Autoencoder is used for analyzing video scenes and extracting valuable spatial and temporal features. The proposed approach has achieved a good loss reduction of 0.176587 in detecting abnormal pilgrims’ behavior. The results demonstrate a promising picture of the effectiveness of computer vision technologies to detect abnormal behavior in large-scale crowds.
... Since anomalous sounds can indicate system error or malicious activities, ASD has received much attention [1][2][3][4][5], which has been widely used in various applications, such as road surveillance [6,7], animal disease detection [8], and industrial equipment predictive maintenance [9]. Recently, ASD has also been used to monitor the abnormality of industrial machinery equipment, such as anomaly detection for surface-mounted device machine [10,11], and the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge Task2 from 2020 to 2023 [12][13][14][15], to reduce the loss caused by machine damage and the cost of manual inspection. ...
... In unsupervised ASD, a method is to employ the autoencoder (AE) to learn the distributions of sound signals and perform anomaly detection. Conventional AE-based approaches adopt autoencoder to reconstruct multiple frames of spectrogram to learn the distribution of normal sounds, and then the reconstruction error is used to obtain the anomaly score for anomaly detection [10,12,[17][18][19]. However, the conventional AE-based methods do not work well for non-stationary ASD [20], as non-stationary normal sounds (e.g., sound signals of valves) can easily have larger reconstruction errors than abnormal sounds, thus deteriorating the detection performance. ...
... The AE-based methods are widely used for unsupervised ASD [10,12,17,18]. An AE model is trained with normal sounds to learn their feature distribution. It implicitly assumes that it can reconstruct normal sounds better than anomalous sounds, so that anomalous sounds often have larger reconstruction errors than normal sounds. ...
Article
Full-text available
Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two mainstream methods. However, the AE-based methods could be limited as the feature learned from normal sounds can also fit with anomalous sounds, reducing the ability of the model in detecting anomalies from sound. The self-supervised methods are not always stable and perform differently, even for machines of the same type. In addition, the anomalous sound may be short-lived, making it even harder to distinguish from normal sound. This paper proposes an ID constrained Transformer-based autoencoder (IDC-TransAE) architecture with weighted anomaly score computation for unsupervised ASD. Machine ID is employed to constrain the latent space of the Transformer-based autoencoder (TransAE) by introducing a simple ID classifier to learn the difference in the distribution for the same machine type and enhance the ability of the model in distinguishing anomalous sound. Moreover, weighted anomaly score computation is introduced to highlight the anomaly scores of anomalous events that only appear for a short time. Experiments performed on DCASE 2020 Challenge Task2 development dataset demonstrate the effectiveness and superiority of our proposed method.
... Anomalies are detected by measuring reconstruction error and likelihood. In [16], a model using autoencoders and residual error detects anomalies in sound sensor data for complex machines, aiding early maintenance planning. [14] introduces an LSTM-based real-time AD algorithm for time-series, adjusting the detection threshold based on pattern changes. ...
... Support Vector Machine (SVM) algorithms have distinct advantages in solving high-dimensional, small-sample, and non-linear problems when compared to traditional algorithms. However, the algorithm also has limitations, such as the kernel function being limited by Mercer conditions, the regularization coefficient being difficult to determine, and its sensitivity to features [17]. The disadvantage of being so reliant on expert experience limits its generalizability and practicality. ...
Article
Full-text available
In recent years, deep learning‐based fault diagnosis technology for high‐voltage circuit breakers (HVCB) has advanced significantly, but the working environment of HVCBs is complex, resulting in unsatisfactory fault diagnosis results for HVCBs in noisy environments, and existing deep learning methods have difficulty solving this problem. This paper proposes a multi‐channel convolutional neural network that combines a dense residual structure and an attention mechanism to achieve high‐precision and high‐robust diagnosis of HVCBs in noisy backgrounds. A dense residual network is introduced into the convolutional neural network to prevent feature loss during network propagation and to preserve the difference information between the network layers as much as possible. Simultaneously, a channel attention mechanism is introduced to adaptively adjust the weights of different convolution channels. The model can extract multi‐scale features from the original signal and fully exploit the intrinsic relationship between the vibration signal and the HVCB's operating state. The experimental results show that the diagnostic method can still meet the requirements of fault diagnosis in the presence of noise, with an average diagnostic accuracy rate of 85.92% when the signal‐to‐noise ratio is −4. The model outperforms the traditional single‐channel model in terms of diagnostic accuracy and stability.
... Anomalous sound detection (ASD) aims to automatically determine whether the state of a target object is normal or anomalous by analyzing the sound emitted by the object [1][2][3][4][5][6][7]. ASD is commonly an unsupervised task due to the infrequent and varied occurrence of anomalous machine sounds in real-world scenarios [1,3,[5][6][7][8]. Therefore, only normal sounds are employed for training to learn the audio feature distribution of normal sounds. ...
Preprint
Full-text available
Different machines can exhibit diverse frequency patterns in their emitted sound. This feature has been recently explored in anomaly sound detection and reached state-of-the-art performance. However, existing methods rely on the manual or empirical determination of the frequency filter by observing the effective frequency range in the training data, which may be impractical for general application. This paper proposes an anomalous sound detection method using self-attention-based frequency pattern analysis and spectral-temporal information fusion. Our experiments demonstrate that the self-attention module automatically and adaptively analyses the effective frequencies of a machine sound and enhances that information in the spectral feature representation. With spectral-temporal information fusion, the obtained audio feature eventually improves the anomaly detection performance on the DCASE 2020 Challenge Task 2 dataset.
... Damage can then be identified by the high reconstruction error associated with the data. This technique has been leveraged for both vibration and acoustic anomaly detection models (Anaissi and Zandavi, 2010; Bayram et al., 2021; Marchi et al., 2015; Oh and Yun, 2018). ...
Article
This paper proposes a new in-situ damage detection approach for wind turbine blades, which leverages blade-internal non-stationary acoustic pressure fluctuations caused by the mechanical loading as the main source of excitation. This acoustic excitation was leveraged for the detection of fatigue-related damage modes on a full-scale wind turbine blade undergoing edgewise fatigue testing. An unsupervised, data-driven structural health monitoring strategy was developed to learn the normal cavity-internal acoustic sequences generated by the blade’s load cycles and to detect damage-related anomalies in the context of those sequences. A linear cepstral-coefficient based feature set was used to characterize the cavity-internal acoustics and LSTM-autoencoders were trained to accurately reconstruct healthy-case sequences. The reconstruction error was then used to characterize anomalous acoustic patterns within the blade cavity. The technique was able to detect a damage event earlier than a strain-based system by 120,000 load cycles.