Azure ML Predictive Maintenance Template – data preparation and feature engineering

Predictive maintenance can be quite a challenge :). Some of you might have tried to build the Azure ML Predictive Maintenance Template by Microsoft.

In this template, you are guided through the steps that are required to build and deploy several predictive maintenance scenarios. These steps are offered as experiments in the Cortana Analytics Gallery and can be easily downloaded. The original data comes from the NASA Prognostics Data Repository [1].

This post focuses on the first experiment, "Predictive Maintenance: Step 1 of 3, data preparation and feature engineering".

First of all, I recommend downloading the original dataset from the NASA website and reading its description.

Basically, we start with 3 datasets:

  • training data
  • test data
  • true Remaining Useful Life (RUL) values for the test data

Each row of the training data contains a unit id, a cycle number, 3 basic settings, and measurements from 21 sensors. The last cycle recorded for an engine indicates that the engine worked fine up to that specific cycle. With this information we can compute the Remaining Useful Life (RUL) for every row and build labels, such as whether an engine will break within the next 15 cycles or the next 30 cycles.
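To make the labelling step concrete, here is a minimal sketch in plain Python. It is not the code from the template: the function name, the dictionary keys, and the toy rows are made up for illustration, and the choice of `<=` for the 15 and 30 cycle windows is an assumption.

```python
from collections import defaultdict


def label_training_rows(rows, short_window=15, long_window=30):
    """Add RUL and window labels to training rows.

    rows: list of (unit_id, cycle) tuples; settings and sensor
    columns are left out because they are not needed for labelling.
    """
    # The last cycle seen for a unit is treated as its final working cycle.
    last_cycle = defaultdict(int)
    for unit_id, cycle in rows:
        last_cycle[unit_id] = max(last_cycle[unit_id], cycle)

    labelled = []
    for unit_id, cycle in rows:
        rul = last_cycle[unit_id] - cycle
        labelled.append({
            "unit_id": unit_id,
            "cycle": cycle,
            "rul": rul,
            # Assumption: a row is positive when RUL <= window.
            "fails_within_15": rul <= short_window,
            "fails_within_30": rul <= long_window,
        })
    return labelled


# Toy data (not from the NASA files): unit 1 runs 40 cycles, unit 2 runs 20.
toy_rows = [(1, c) for c in range(1, 41)] + [(2, c) for c in range(1, 21)]
toy_labelled = label_training_rows(toy_rows)
```

In this toy example the first row of unit 1 gets a RUL of 39 and neither label, while the row at cycle 10 has a RUL of 30 and falls inside the 30 cycle window only.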

So far, so good.

But let’s have a look at the test data. This set also offers you a unit id, cycle number, 3 basic settings, and measures from 21 sensors.


In contrast to the training data, the true RUL cannot be derived from the test data itself. Instead, it is provided as a separate vector (the true Remaining Useful Life (RUL) values), with one value per engine.

It is unclear whether these values should be added to the last cycle in the test data, or whether they indicate the maximum number of cycles. Here we assume the former: the true RUL value for a specific engine (unit id) is added to the last occurring cycle in the test data for that engine.

Therefore, in contrast to the code that Microsoft provides, we first calculate the maximum cycle value per engine: the last cycle present in the test data plus the true RUL value for that engine. With this maximum cycle value, we can calculate the RUL for every observation and label all of them. As said, this differs from the choice Microsoft made: the template keeps only the last cycle of every engine and labels it based on the true RUL value, without computing the maximum cycle number. Besides, I would like to know why we would only use 100 test observations, and why they should be the last cycle of every engine.
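The difference between the two approaches can be sketched in a few lines of plain Python. This is only an illustration under the assumption stated above (true RUL is counted from the last observed cycle). The function names, the dictionary keyed by unit id, and the toy numbers are hypothetical and do not come from the template or the NASA files.

```python
def label_test_rows(rows, true_rul, short_window=15, long_window=30):
    """Compute RUL and labels for every test row.

    rows: list of (unit_id, cycle) tuples.
    true_rul: dict mapping unit_id to the true RUL at the last observed cycle.
    """
    last_observed = {}
    for unit_id, cycle in rows:
        last_observed[unit_id] = max(last_observed.get(unit_id, 0), cycle)

    # Assumed maximum cycle = last observed cycle + true RUL.
    max_cycle = {u: last_observed[u] + true_rul[u] for u in last_observed}

    labelled = []
    for unit_id, cycle in rows:
        rul = max_cycle[unit_id] - cycle
        labelled.append({
            "unit_id": unit_id,
            "cycle": cycle,
            "rul": rul,
            "is_last_observed": cycle == last_observed[unit_id],
            # Assumption: a row is positive when RUL <= window.
            "fails_within_15": rul <= short_window,
            "fails_within_30": rul <= long_window,
        })
    return labelled


def keep_last_cycle_only(labelled):
    """Mimic the one-row-per-engine selection described for the template."""
    return [row for row in labelled if row["is_last_observed"]]


# Toy data: unit 1 is observed for 5 cycles, unit 2 for 3 cycles.
toy_test_rows = [(1, c) for c in range(1, 6)] + [(2, c) for c in range(1, 4)]
toy_true_rul = {1: 20, 2: 10}

all_rows = label_test_rows(toy_test_rows, toy_true_rul)
last_rows = keep_last_cycle_only(all_rows)
```

With these toy numbers, labelling every observation yields 8 usable test rows, whereas keeping only the last cycle yields 2 rows, one per engine. On those last rows the computed RUL equals the supplied true RUL, so the two approaches agree there and differ only in how many observations they keep.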

Comments and ideas would be very welcome!

You can find this adapted experiment on the Cortana Analytics Gallery. Please be aware, however, that these changes in the preparation of the data will impact the next steps of the Predictive Maintenance template.

[1] A. Saxena and K. Goebel (2008). “Turbofan Engine Degradation Simulation Data Set”, NASA Ames Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA.
