Human Resources Analytics – Why employees are leaving

In this blog post about human resources analytics, we build a model to predict whether an employee will leave, and we also try to find out from the data why employees leave. We use a simulated dataset from Kaggle, which can be found here:

Fields in the dataset include:

  • Last evaluation
  • Number of projects
  • Average monthly hours
  • Time spent at the company
  • Whether they have had a work accident
  • Whether they have had a promotion in the last 5 years
  • Department
  • Salary
  • Whether the employee has left

We will build this prediction model in Azure Machine Learning Studio. The complete experiment can be downloaded from the Cortana Intelligence Gallery.

First, we build the model to predict whether an employee will leave. Second, we look at why, according to the data, employees leave.

The complete model will look like this:

Human resources analytics - complete model

Step 1: Get a first understanding of the data

After selecting the dataset, we want to get a first impression of the data. For this we use the Summarize Data module, which reports descriptive statistics for every column. We see that we have 14999 observations and that there are no missing values. We also get an idea of the variance and distribution of the data.

Human resources analytics - summarize data
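
For readers who want to try a similar summary outside Azure Machine Learning Studio, here is a minimal Python sketch using only the standard library. It is not what the Summarize Data module runs internally. The helper `summarize`, the column names, and the four sample rows are hypothetical and only illustrate the idea of counting observations, checking for missing values, and looking at spread.

```python
import statistics

def summarize(rows, column):
    """Return basic descriptive statistics for one numeric column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }

# Hypothetical sample rows; the real dataset has 14999 observations.
rows = [
    {"last_evaluation": 0.53, "average_monthly_hours": 157},
    {"last_evaluation": 0.86, "average_monthly_hours": 262},
    {"last_evaluation": 0.88, "average_monthly_hours": 272},
    {"last_evaluation": 0.52, "average_monthly_hours": 159},
]

summary = summarize(rows, "average_monthly_hours")
print(summary)
```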

Step 2: Prepare a training and a test set

We split the dataset with the Split Data module into a training set and a test set, using 70% of the data to train the model and the remaining 30% to test it later on. We set a seed, so we can repeat this experiment and get the same split.

Human resources analytics - split data
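
The same idea can be sketched in a few lines of Python. This is only an illustration of a seeded 70/30 split, not the Split Data module itself. The function name `split_rows` and the seed value 42 are our own assumptions, and the integers stand in for the 14999 rows.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then cut it in two."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(14999))  # stand-ins for the 14999 observations
train, test = split_rows(rows)
train_again, test_again = split_rows(rows)  # repeat with the same seed

print(len(train), len(test))
```

Because the seed is fixed, running the split twice returns exactly the same training and test sets, which is what makes the experiment repeatable.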


Step 3: Train the model

In this experiment we use the Two-Class Boosted Decision Tree algorithm with its default parameters. We only add a seed to make the experiment reproducible.

Human resources analytics - boosted decision tree

With this algorithm, we train the model on the column “left”.

Human resources analytics - train model
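
To give a feeling for what "boosting" means, below is a small pure-Python sketch. Note that it swaps in a different and much simpler technique: AdaBoost over one-level decision trees (stumps). It is not the implementation behind the Two-Class Boosted Decision Tree module. The eight rows, the two features (a satisfaction score and monthly hours), the label encoding (1 = left, -1 = stayed) and the three boosting rounds are all made up for illustration.

```python
import math

def fit_stump(X, y, weights):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                # Predict `sign` below the threshold, `-sign` otherwise.
                preds = [sign if row[feature] < threshold else -sign for row in X]
                error = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or error < best[0]:
                    best = (error, feature, threshold, sign)
    return best

def stump_predict(stump, row):
    _, feature, threshold, sign = stump
    return sign if row[feature] < threshold else -sign

def train_boosted_stumps(X, y, rounds):
    """AdaBoost: reweight rows so later stumps focus on earlier mistakes."""
    weights = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, weights)
        error = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)   # vote of this stump
        ensemble.append((alpha, stump))
        weights = [w * math.exp(-alpha * t * stump_predict(stump, row))
                   for w, t, row in zip(weights, y, X)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical rows: [satisfaction, monthly hours]; label 1 = left, -1 = stayed.
X = [[0.10, 250], [0.20, 160], [0.80, 300], [0.90, 290],
     [0.70, 180], [0.85, 200], [0.60, 170], [0.75, 210]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

ensemble = train_boosted_stumps(X, y, rounds=3)
print([predict(ensemble, row) for row in X])
```

No single stump separates these rows, but the weighted vote of three stumps does, which is the core idea of boosting.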


Step 4: Score the test set

Now we are prepared to use the Score Model module and score the test set.

Step 5: Evaluate the results

Finally, we use the Evaluate Model module to evaluate our model, using the results of the scoring step. On the test set, the model reaches 98% accuracy and 98% precision.

Human resources analytics - model evaluation
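
As a reminder of what these two numbers mean, here is a small Python sketch that computes accuracy and precision from actual and predicted labels. The function `evaluate` and the ten example labels are hypothetical. They are not the scores from this experiment.

```python
def evaluate(actual, predicted, positive=1):
    """Compute accuracy and precision from two lists of labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    correct = sum(1 for a, p in pairs if a == p)
    return {
        # Share of all employees that were classified correctly.
        "accuracy": correct / len(pairs),
        # Share of predicted leavers who actually left.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

# Hypothetical labels: 1 = left, 0 = stayed.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

metrics = evaluate(actual, predicted)
print(metrics)
```

In this toy example 8 of 10 predictions are correct, and 2 of the 3 employees flagged as leaving actually left.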

Step 6: Gain insights on the why

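Our final question was why employees are leaving. To answer it, we use the Permutation Feature Importance module, which shuffles the values of one feature at a time and measures how much the model's performance drops. We set a seed to make the experiment reproducible, and we use accuracy as the metric.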

Human resources analytics - permutation feature importance setting

We find that, according to this dataset, satisfaction is one of the main factors in whether an employee leaves.

Human resources analytics - permutation feature importance results
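
The idea behind permutation feature importance can be sketched in plain Python: shuffle one column, score again, and see how much accuracy is lost. This is a simplified illustration, not the module's implementation. The `model` function below is a hypothetical stand-in for a trained model, and the data, column meanings and seed are made up.

```python
import random

def accuracy(model, X, y):
    return sum(1 for row, t in zip(X, y) if model(row) == t) / len(y)

def permutation_importance(model, X, y, seed=0):
    """Accuracy drop when each column is shuffled, one column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        shuffled_col = [row[col] for row in X]
        rng.shuffle(shuffled_col)
        # Rebuild the rows with only this one column permuted.
        X_perm = [row[:col] + [v] + row[col + 1:]
                  for row, v in zip(X, shuffled_col)]
        importances.append(baseline - accuracy(model, X_perm, y))
    return importances

# Hypothetical stand-in for a trained model: flags an employee as leaving
# when the first column (an assumed satisfaction score) is below 0.5.
def model(row):
    return 1 if row[0] < 0.5 else 0

# Hypothetical rows: [satisfaction, number of projects]; 1 = left, 0 = stayed.
X = [[0.1, 3], [0.2, 5], [0.3, 2], [0.4, 4],
     [0.6, 3], [0.7, 5], [0.8, 2], [0.9, 4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

importances = permutation_importance(model, X, y)
print(importances)
```

A feature the model does not rely on, like the second column here, gets an importance of zero, because shuffling it cannot change any prediction.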

We hope you enjoyed this blog. Please feel free to leave us your comments!




