The Future is Now: The Potential of Combining Machine Learning and Condition Monitoring

No matter where you search for the terms condition monitoring or machine learning, search engines find more than 1 billion hits. That is hardly surprising: in times of digitalization, Industry 4.0 and IoT, both are a central part of networking, data collection and data processing. Their relevance in companies has increased enormously in recent years, and there is no doubt that these buzzwords are on everyone's lips. With a practical test as part of his diploma thesis, Christoph Wilding took a close look at what condition monitoring of machines, systems and devices can do and where machine learning comes in to automatically derive patterns and relationships from data. In the interview, graduate engineer Christoph Wilding and data scientist Maik Pertermann answered the most important questions and made these complex buzzwords tangible using the simple, everyday example of a washing machine.

What does condition monitoring do and where can it be used?

Christoph Wilding: Condition monitoring of machines, systems and devices is particularly useful for understanding the status of machines over time and for ensuring that they run optimally and efficiently. Each status assessment is a snapshot; evaluated as a trend over time, these snapshots make performance changes visible. This in turn makes it possible to detect wear and tear of machine parts, to identify both the affected components and the components causing the problem, and to initiate process optimizations.

Maik Pertermann: This type of monitoring also has its limits. That is because if the data associated with condition monitoring is recorded only cyclically (e.g. once a week), it is usually not possible to provide a meaningful analysis and in turn a reliable forecast of possible downtime because the data is so limited. The answer is systems that monitor the machine status in real time 24/7, automate the data flow and enable continuous analysis.

What are the biggest challenges when retrofitting machinery and systems?

Christoph Wilding: The biggest challenges for companies considering a retrofit are the high conversion costs, lack of know-how and existing systems that do not have suitable sensors and/or interfaces. However, this contradicts the goals that many companies have set themselves:

Advancing digitalization
Increasing machine efficiency
Continuous monitoring of machines and systems
Reducing downtime by using predictive maintenance

The aim is to solve this dilemma by developing the monitoring system during the practical test.

Why was a washing machine chosen for this retrofit use case?

Christoph Wilding: The washing machine was chosen because it is very similar to machines used in industry. From the outside, it is a kind of black box. It has many programs and, for a household appliance, a high level of complexity, as well as numerous components such as water inlet valves and a drum that is submerged in soapy water and rotates at different speeds and with different loads, depending on the weight of the laundry. The operating mode of the washing machine depends on the following elements, which also influence other actuators of the washing machine:

Washing temperature (e.g. 40°C or 60°C), which is selected by the user and influences the behavior of the heating element
Washing program, e.g. high temperature or sensitive colors
Washing phases, e.g. rinsing phases
Machine activities, e.g. washing

How was the appropriate monitoring system set up for the washing machine?

Christoph Wilding: We started by choosing sensors that record the behavior of the washing machine as well as possible and that can be used to take measurements on the washing machine. These included sensors for vibrations and audio signals, and power consumption was recorded by measuring the current draw. Based on this, the measured variables were assigned to system states, and we looked for characteristics that make the data usable with machine learning algorithms. Most of the data was collected only after this design phase.

Figure 2: Mobile application for entering machine activity

What was the setup like on the machine?

Christoph Wilding: The setup consisted of a standard washing machine, which was equipped with a power meter from the manufacturer myStrom for measuring power consumption and with two multi-sensors: the Bosch Connected Industrial Sensor Solution (Bosch CISS: accelerometer, gyroscope, magnetometer, temperature sensor) and the Kallisto from Sensry (accelerometer, gyroscope, magnetometer). The sensors were each installed on the front of the washing machine under the door, so that no invasive intervention in the machine was needed. The microphones integrated into Android notebooks, smartphones and tablet PCs were used for audio recording. These devices also ran the measurement application (graphical interface) for entering the metadata of the washing process (washing program, temperature, options).

Over which period was data collected and what framework conditions were created?

Christoph Wilding: For the practical test, a total of 30 wash cycles were carried out with three different types of washing machines using different program sequences and loads over a period of 10 weeks. Depending on their characteristics, the data were recorded at different sampling rates, from many thousands of samples per second down to one reading every few seconds (audio 44.1 kHz, multi-sensor 100 Hz, power consumption 0.2 Hz).
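
To give a feel for how different these sampling rates are, the following minimal sketch computes how many samples each channel would produce in one wash cycle. The rates are the ones quoted above; the cycle length of 90 minutes is an assumption for illustration only, as the interview does not state how long a cycle took.

```python
# Sampling rates quoted in the interview, in Hz.
SAMPLING_RATES_HZ = {"audio": 44_100.0, "multi_sensor": 100.0, "power": 0.2}

# Assumption for illustration: one wash cycle lasts 90 minutes.
CYCLE_SECONDS = 90 * 60


def samples_per_cycle(rate_hz: float, cycle_seconds: float) -> int:
    """Number of samples one channel produces during a wash cycle."""
    return round(rate_hz * cycle_seconds)


for channel, rate_hz in SAMPLING_RATES_HZ.items():
    count = samples_per_cycle(rate_hz, CYCLE_SECONDS)
    # 1 / rate is the time between two consecutive samples.
    print(f"{channel}: {count} samples, one every {1 / rate_hz:g} s")
```

Under this assumption the audio channel yields several orders of magnitude more samples than the power meter, which is why the channels have to be aggregated or aligned before they can be analyzed together.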

What did the monitoring system consist of?

Christoph Wilding: The monitoring system consisted of three components:

Detection of machine activity
Detection of washing temperature, phase and program
Detection of anomalies or deviations from data learned

What was the procedure for detecting machine activity?

Christoph Wilding: The procedure is best described using the following steps:

Label aggregated metrics
Extract features
Train learning algorithm: neural network, decision tree (CART), support vector machine (SVM)
Evaluate the trained models on new data sets that were not used for training, and optimize the models further
Identify machine activity using new data sets and compare different learning algorithms
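
The steps above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the pipeline used in the practical test: the signals are synthetic, the only feature is the root mean square (RMS) per window, and the classifier is a depth-1 decision tree in the style of CART rather than a full CART, SVM or neural network. All function names and parameters are assumptions made for this example.

```python
import math
import random


def windows(signal, size):
    """Split a signal into non-overlapping windows of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def rms(window):
    """Root mean square: a simple measure of signal energy in a window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    return 1.0 - sum((labels.count(c) / len(labels)) ** 2 for c in set(labels))


def majority(labels):
    """Most frequent label in a non-empty list."""
    return max(set(labels), key=labels.count)


def fit_stump(features, labels):
    """Depth-1 CART-style tree: pick the threshold with the lowest weighted Gini."""
    best = None
    values = sorted(set(features))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(features, labels) if x <= threshold]
        right = [y for x, y in zip(features, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict(stump, feature):
    """Classify one feature value with a fitted stump."""
    threshold, left_label, right_label = stump
    return left_label if feature <= threshold else right_label


# Label and extract features: labelled synthetic signals (NOT measured data),
# cut into windows and reduced to one RMS value per window.
rng = random.Random(0)
recordings = [
    ("washing", [rng.gauss(0.0, 0.2) for _ in range(2000)]),
    ("spinning", [rng.gauss(0.0, 1.0) for _ in range(2000)]),
]
samples = [(rms(w), label) for label, sig in recordings for w in windows(sig, 100)]

# Train and evaluate: fit on 70% of the windows, test on the held-out 30%.
rng.shuffle(samples)
split = int(0.7 * len(samples))
train, test = samples[:split], samples[split:]
stump = fit_stump([f for f, _ in train], [label for _, label in train])
accuracy = sum(predict(stump, f) == label for f, label in test) / len(test)
print(f"threshold={stump[0]:.2f}, held-out accuracy={accuracy:.2f}")
```

The important design point is the split: the model is only judged on windows it has never seen during training, which mirrors the evaluation step in the list.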

Which metrics had a particular influence on the monitoring system?

Christoph Wilding: The monitoring system is affected by vibration, audio signal, power consumption and process time. Vibration is the variable frequently used in practice to determine the operating mode and machine condition. The different vibrations make it possible to draw conclusions about machine activity. The audio signal is used to determine the operating mode and is particularly effective in detecting anomalies in combination with machine learning. The power consumption of the washing machine is affected by the temperature and the wash cycle selected. Here, the instantaneous power consumption provides good conclusions about the activity of the heating element and the pattern of power consumption during the washing program used. The process time and the running time of the various washing programs differ significantly from one another due to the length of the washing phase, which makes it easier to recognize the washing program.

What are the prerequisites to be able to successfully carry out machine learning?

Christoph Wilding: In short, data, data, and more data — and not just any data, but high-quality and meaningful data that was obtained in similar conditions. The influencing factors should be known.

Maik Pertermann: Another requirement is, of course, the right choice of sensor technology, which requires a good understanding of the program and process flows.

How are anomalies detected?

Christoph Wilding: In order to detect anomalies, the first step is to make sure that data from the trouble-free operation (standard state) of the live system is constantly available, so that it can be qualified and used to detect deviations. The second step is to narrow down deviations. People and domain knowledge often play a decisive role in naming the anomaly...

Maik Pertermann: ... and, as a data scientist, to create a suitable model for it.
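
One very simple way to picture the first step is a statistical baseline: summarize the standard state by its mean and standard deviation, then flag live readings that lie too far away. The sketch below assumes a single numeric metric and a z-score threshold of 3; both the threshold and the sample values are hypothetical, and a real system would typically use a richer model.

```python
import statistics


def fit_baseline(normal_values):
    """Summarize trouble-free operation by its mean and sample standard deviation."""
    return statistics.fmean(normal_values), statistics.stdev(normal_values)


def find_deviations(values, baseline, max_z=3.0):
    """Return (index, value) pairs that deviate more than `max_z` standard deviations."""
    mean, std = baseline
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / std > max_z]


# Hypothetical readings of one metric during trouble-free operation.
normal = [0.50, 0.52, 0.49, 0.51, 0.48, 0.50, 0.53, 0.47]
baseline = fit_baseline(normal)

# Hypothetical live readings; the third one is clearly out of line.
live = [0.50, 0.51, 0.90, 0.49]
print(find_deviations(live, baseline))
```

The code only narrows down where a deviation occurs. Deciding what the deviation means, for example which component is affected, still requires domain knowledge.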

Figure 5: Mobile application for monitoring the operating mode and notification when anomalies occur during operation

What are the limits and restrictions when it comes to anomaly detection?

Maik Pertermann: If a wide variety of high-quality data covering both the normal state and anomalies has been collected over a sufficiently long period of time and can be used to train a model, then very good anomaly detection is possible. If this data is incomplete, anomaly detection reaches its limits. In addition, the anomaly must be detectable with the sensors. For example, it would not be possible to detect that the washing machine is leaking, because the sensors are simply not designed for that.

Which learning method was used? How can outliers be detected in the data?

Christoph Wilding: Supervised learning was used to recognize machine activity, with a labeled training data set used to train the machine learning algorithms. For this purpose, the anomalies were determined in advance on the basis of the components contained in the washing machine: motor defect (P1), shock absorber defect (P2), solenoid valve defect (P3), waste water pump defect (P4), V-belt defect (P5) or heating element defect (P6). The machine status was then detected on unknown data with the assistance of the trained model.

How did you validate the accuracy of machine activity detection?

Christoph Wilding: The F1 score was used to assess the quality. It combines the evaluation metrics precision (the share of raised alarms that are correct, so low precision means many false alarms) and recall (the share of actual anomalies that are correctly detected) into a single overall score. With the Support Vector Machine (SVM) learning algorithm, machine activity could be determined with an accuracy of 98% when all of the data was taken into account. For the individual data subsets, the highest F1 scores were achieved for vibration (83%) and audio signal (97%) using SVM, and for power consumption (90%) using the decision tree (CART). The confusion matrix below, which is based on the combined sensor data for CART, shows how these results can be visualized clearly.
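
For reference, the F1 score is the harmonic mean of precision (the share of predicted positives that are correct) and recall (the share of actual positives that are found). The sketch below computes all three for one class in plain Python. The labels are made up for illustration and are not data from the practical test.

```python
def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall and F1 score for the class `positive`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correct hits
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: 3 hits, 1 false alarm and 1 miss for the class "spin".
true = ["spin", "spin", "spin", "wash", "wash", "wash", "wash", "spin"]
pred = ["spin", "spin", "wash", "wash", "wash", "spin", "wash", "spin"]
print(precision_recall_f1(true, pred, positive="spin"))
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot reach a high F1 score by being good at only precision or only recall.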

Figure 5: Confusion matrix based on combined sensor data for CART

What are the benefits of predictive maintenance?

Christoph Wilding: Predictive maintenance refers to the maintenance of machines based on their actual condition, which can be determined in particular by evaluating sensor data and condition monitoring. This allows maintenance to be optimized so that maintenance costs can be reduced, the service life of the machine extended and productivity increased.

Which tools are needed to carry out this kind of retrofit?

Christoph Wilding: There are three things I would want to mention here:

Automated collection of data
Experience with machine learning tools and methods, such as Python, decision trees and artificial neural networks (e.g. with TensorFlow), as well as Microsoft Azure cloud computing for training
Data transformation and statistics, because bringing different metrics (e.g. temperatures and humidity) onto comparable scales is essential in order to be able to process them further
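
As an illustration of the data transformation point, one common technique is standardization (z-scoring), which rescales each metric to a mean of 0 and a standard deviation of 1. Whether this exact transformation was used in the practical test is not stated; the sketch and its sample values are assumptions.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant series carries no variation; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical readings on very different scales.
temperatures_c = [20.0, 40.0, 60.0]
power_w = [100.0, 1100.0, 2100.0]

print(standardize(temperatures_c))
print(standardize(power_w))
```

After standardization both series contain the same values, so a learning algorithm no longer treats the metric with the larger raw numbers as more important.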

Maik Pertermann: There are two more points that I would add. Firstly, the direct exchange of knowledge with local machine operators, owners and engineers, who are familiar with the machine and may know its characteristics, is essential for analysts such as data scientists. Secondly, the data must be preprocessed thoroughly, for example by cleaning up outliers, as this significantly influences the quality of the results.
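
A minimal sketch of outlier cleanup, assuming a robust rule based on the median and the median absolute deviation (MAD). The cutoff of 3.5 on the modified z-score is a widely used rule of thumb, not a value from the interview, and the readings are hypothetical.

```python
import statistics


def remove_outliers(values, max_score=3.5):
    """Drop values whose modified z-score (based on median and MAD) is too large."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be judged an outlier.
        return list(values)
    # 0.6745 scales the MAD so the score is comparable to a normal z-score.
    return [v for v in values if 0.6745 * abs(v - median) / mad <= max_score]


# Hypothetical power readings in watts with one obvious glitch.
readings = [230, 232, 229, 231, 2300, 230, 228]
print(remove_outliers(readings))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely shifts them, so the glitch cannot hide itself by inflating the spread.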

How did measuring the values work in practice? Were there any complications and how were they resolved?

Christoph Wilding: There are various things that need to be considered here. The vibration data was not very meaningful, which was mainly due to the calibration of the sensors. In the course of the practical test, it also became apparent that a high-frequency module would have been more suitable, and that the ability to recalibrate the system should be planned for in advance. Placing the sensor near the motor, for optimum vibration readings, was not considered due to the conditions of the practical test. This would have significantly increased the accuracy of the results compared to mounting the sensors outside the housing. Since the metal housing has a shielding effect, the temperature could not be measured adequately and the effects of the outside temperature could not be reliably eliminated. The magnetic field data, which could be used for example to determine the activity and rotational speed of the motor, was also not recorded very precisely, as the sampling rate was insufficient and the magnetic field of the motor was heavily shielded by the washing machine housing. On top of that, the timestamps of the measured values were of limited accuracy to begin with, due to the time delay in USB transmission from the CISS sensor.

Maik Pertermann: Most of the challenges of recording the various sensor data were solved by taking a very close look at the timestamps. Once we knew which sensor data included that delay, the timestamps could be corrected accordingly and the data streams then synchronized with each other.
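
The correction and synchronization described here can be sketched as follows, assuming a known, constant delay and nearest-timestamp matching. The delay of 0.5 seconds, the function names and the sample values are hypothetical; the actual delay of the CISS sensor is not given in the interview.

```python
from bisect import bisect_left


def correct_delay(samples, delay_s):
    """Shift the timestamps of (timestamp, value) samples back by a known delay."""
    return [(t - delay_s, v) for t, v in samples]


def synchronize(reference, other):
    """For each reference sample, attach the value of the nearest `other` sample.

    Both inputs are lists of (timestamp, value) sorted by timestamp.
    """
    times = [t for t, _ in other]
    merged = []
    for t, value in reference:
        i = bisect_left(times, t)
        # The nearest sample is either just before or just after position i.
        candidates = other[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda s: abs(s[0] - t))
        merged.append((t, value, nearest[1]))
    return merged


# Hypothetical power readings (every 5 s) and delayed vibration readings.
power = [(0.0, 5.0), (5.0, 2000.0), (10.0, 150.0)]
vibration_raw = [(0.5, 0.01), (5.5, 0.02), (10.5, 0.30)]

vibration = correct_delay(vibration_raw, delay_s=0.5)
print(synchronize(power, vibration))
```

In practice the slower channel usually serves as the reference, since every one of its samples then has a close neighbor in the faster channel.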

Which measured values had the biggest impact on detecting machine activity? Which findings were the most surprising?

Maik Pertermann: What was particularly surprising here was that a powerful model could be created using the audio signal alone together with a fairly simple algorithm, and that this model provided a very good hit rate for detecting and predicting machine activity. It was equally unexpected that vibration has such a small influence on the detection of machine activity, because most people are likely to think of it as a particularly characteristic feature.

Christoph Wilding: Another important measurement value was power consumption, which allowed conclusions to be drawn about both the wash temperature and the machine activity. When power consumption is high, the washing machine is either consuming up to 2000 W during the heating-up phase, or it is currently spinning. When power consumption is low, the washing machine is in wash or soaking mode, or water is being fed in or pumped out. The vibration data could have been dispensed with, since the data collected contained too much noise and therefore too few clear signals.

The first patterns and correlations could be derived from the audio signal and power consumption after just a few washes and were confirmed by the high accuracy of the measurement results.
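
The rule of thumb for power consumption can be written down as a simple threshold check. The sketch assumes readings in watts and a hypothetical cutoff of 1000 W between "high" and "low"; the interview gives no exact threshold, and the trained models in the practical test learned such boundaries from data rather than using a fixed value.

```python
def activity_group_from_power(power_w, high_power_threshold_w=1000.0):
    """Map an instantaneous power reading to a coarse group of activities."""
    if power_w >= high_power_threshold_w:
        return "heating or spinning"
    return "washing, soaking, filling or draining"


# Hypothetical readings in watts.
for reading_w in [150.0, 1950.0, 90.0]:
    print(reading_w, "->", activity_group_from_power(reading_w))
```

Power alone only separates groups of activities. Telling heating from spinning, or washing from draining, needs a second signal such as audio.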

Which mindset is needed to successfully establish machine learning for retrofit use cases?

Maik Pertermann and Christoph Wilding:

You need to enjoy understanding and analyzing processes.
Experiment with sensors and don't be deterred by failure.
Have an interest in machine learning and analyzing data.
Enjoy experimentation, which includes parameter selection and learning algorithms.
Don't be afraid of failures: Entire series of measurements using misplaced or incorrectly calibrated sensors need to be discarded or re-measured if necessary, because the measurement parameters do not match. It is possible that the most expensive component provides the least benefit.
Cost-benefit analysis in advance is the key if condition monitoring is to be used long term.

How could the monitoring system be scaled up and what volumes of data are needed so that it is statistically representative?

Christoph Wilding: One way to scale it up would be to extend the monitoring system to other washing machines and other machinery. There are also further adjustments that can be made. These range from the use of high-frequency sensors and other statistical methods to the use of artificial neural networks and a larger volume of data for analysis and model development.

Maik Pertermann: The scope of data often depends on the cycle times of the processes. Rapidly changing processes need higher process data collection frequencies to detect even small changes.