Human Resources Analytics – Predict Employee Leave


Predict Employee Leave

In this tutorial, you will learn how to use a simulated dataset from Kaggle to build a machine learning model that both predicts whether employees will leave their employer and explains the reason(s) why they may do so. The data cover a range of topics that allow you to explain employees’ leave behavior in relation to A) organizational factors (department); B) employment relational factors (tenure; the number of projects participated in; the average working hours per month; objective career development; salary); and C) job-related factors (performance evaluation; involvement in workplace accidents).

This tutorial aims to inspire you to explore the possibilities of using machine learning in your own research.

You will follow several steps to explore the data and build a machine learning model to predict whether an employee will leave or not, and why.

  • Step 1: Get a first understanding of the data
  • Step 2: Create the Experiment
  • Step 3: Prepare a training and a test set
  • Step 4: Train the model
  • Step 5: Score the test set
  • Step 6: Evaluate the results
  • Step 7: Gain insights on the why

You will build this prediction model with Azure Machine Learning Studio. The complete model will look like this:

[Figure: Human resources analytics - complete model]

Prerequisites: Get Access to Azure Machine Learning Studio

There are several options to start with Azure ML. The easiest way is to go to https://azure.microsoft.com/en-us/services/machine-learning/ and click on the Get started now button.

[Figure: Get started with Azure]

Next, select the Free Workspace option. You will need a Windows Live ID to sign in. If you don’t have one, you can sign up here: https://signup.live.com/

Step 1: Get a first understanding of the data

You can download the data from Kaggle at https://www.kaggle.com/lnvardanyan/hr-analytics/data and save it as turnover.csv.

Note: If you have trouble obtaining the data, you can also start with the starting experiment from the Azure AI Gallery. Open that experiment in your studio; you can then skip the upload instructions and continue at the * in Step 2.

[Figure: Predict employee leave starting model]

If you have downloaded the data, you can continue by inspecting the dataset.

We have the following available variables in the dataset:

Organizational factors

  • Department

Employment relational factors

  • Time spent at the company
  • Number of projects
  • Average monthly hours
  • Salary
  • Whether they have had a promotion in the last 5 years

Job-related factors

  • Last evaluation
  • Whether they have had a work accident

Dependent variable

  • Whether the employee has left

Step 2: Create the Experiment

Open a browser and go to https://studio.azureml.net. Then sign in using the Microsoft account associated with your Azure ML workspace. Create a new blank experiment by clicking the + NEW button in the lower-left corner of your browser, then select EXPERIMENT, and subsequently BLANK EXPERIMENT. You can change the generated name to Predict Employee Leave.

[Figure: Azure ML create experiment]

The next step is to upload the turnover.csv file to Azure ML and name it Employee Leave data. To do this, click the + NEW button in the lower-left corner of your browser, select DATASET, and then FROM LOCAL FILE.
[Figure: Azure ML upload data file]

In the Predict Employee Leave experiment, you can go to My Datasets under Saved datasets, and drag the Employee Leave data onto the canvas.
[Figure: Employee leave data]

* If you started with the starting experiment, you can continue here:

To get a first impression of the data, you can right-click the output port of the dataset and visualize the data. You can scroll through the different columns, and by selecting one, you get an overview of it in the panel on the right.
[Figure: Inspect employee leave data]

Another way to get a first impression of the data is to use the Summarize Data module, which reports summary statistics for each column.
[Figure: Summarize employee leave data]

RUN the experiment, then right-click the output port of the Summarize Data module and select Visualize. We see that we have 14999 observations and that no values are missing. We also get an idea of the variance and distribution of the data.
[Figure: Predict employee leave data summary]
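For readers who want to see what such a summary computes, here is a minimal Python sketch using only the standard library. It is not what the Summarize Data module runs internally. The column names and the sample values are assumptions made for illustration, and the helper names load_rows and summarize are hypothetical.

```python
import csv
import statistics


def load_rows(path):
    """Read a CSV file (for example the downloaded turnover.csv) into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def summarize(rows, column):
    """Return count, missing count, mean, standard deviation, min and max of one column."""
    raw = [row.get(column, "") for row in rows]
    values = [float(v) for v in raw if v not in ("", None)]
    return {
        "count": len(values),
        "missing": len(raw) - len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Made-up sample rows; column names are assumed, not taken from the tutorial.
rows = [
    {"satisfaction_level": "0.38", "number_project": "2", "left": "1"},
    {"satisfaction_level": "0.80", "number_project": "5", "left": "1"},
    {"satisfaction_level": "0.11", "number_project": "7", "left": "1"},
    {"satisfaction_level": "0.72", "number_project": "", "left": "0"},
]

for name in ("satisfaction_level", "number_project", "left"):
    print(name, summarize(rows, name))
```

With the real file, rows = load_rows("turnover.csv") would replace the made-up sample, assuming the file sits in the working directory.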

 

Step 3: Prepare a training and a test set

We split the dataset into a training set and a test set, using 70% of the data to train the model and the remaining 30% to test it later on. To do so, we drag the Split Data module onto the canvas, connect the output port of the dataset to the input port of the Split Data module, and set the fraction of rows in the first output to 0.7. We also set a random seed so that the split, and with it the experiment, can be repeated.
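The idea behind a seeded 70/30 split can be sketched in a few lines of Python. This is an illustration of the principle, not the Split Data module itself. The function name split_data and the seed value 42 are assumptions, since the tutorial does not state which seed was used.

```python
import random


def split_data(rows, fraction=0.7, seed=42):
    """Shuffle the rows with a fixed seed and cut them into two parts."""
    rng = random.Random(seed)      # a private generator keeps the split repeatable
    shuffled = list(rows)          # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


# 14999 placeholder row ids stand in for the 14999 observations.
rows = list(range(14999))
train, test = split_data(rows)
print(len(train), len(test))  # prints "10499 4500"
```

Calling split_data again with the same seed returns exactly the same two sets, which is why a fixed seed makes the experiment repeatable.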

Step 4: Train the model

Now that we have split the data, we can continue with the training set. We first drag the Train Model module onto the canvas. When we do so, a little red exclamation mark appears on the module. This is because we haven’t yet selected the variable that we want to predict, nor defined the algorithm that we want to train the model with. First we select the dependent variable by clicking Launch column selector.

[Figure: Train employee leave model]

To set the dependent variable, we select the variable “left” (indicating whether an employee has left or not) from AVAILABLE COLUMNS and use the arrow button to move it to the right side, under SELECTED COLUMNS.

[Figure: Dependent variable employee leave]

Furthermore, we have to select the algorithm to train the model with. In this experiment we use the Two-Class Boosted Decision Tree algorithm with its default parameters. We do set a seed to make this experiment replicable.

[Figure: Model algorithm employee leave]
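To give an intuition for what "boosting" means, the sketch below trains a sequence of one-split trees (stumps), each fitted to the errors left by the previous ones. It is a deliberately simplified stand-in (gradient boosting with squared loss on a tiny made-up sample), not the algorithm or the parameters that Azure ML Studio uses. All function names, the feature values, and the settings n_rounds and learning_rate are assumptions for illustration.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({x[j] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[j] <= threshold]
            right = [r for x, r in zip(X, residuals) if x[j] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no valid split: all features are constant")
    return best[1:]


def stump_predict(stump, x):
    j, threshold, left_value, right_value = stump
    return left_value if x[j] <= threshold else right_value


def train_boosted_stumps(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds it to the model."""
    base = sum(y) / len(y)
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - si for yi, si in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, x)
                  for s, x in zip(scores, X)]
    return base, learning_rate, stumps


def predict(model, x):
    """Return 1 (leaves) if the summed score reaches 0.5, else 0 (stays)."""
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * stump_predict(s, x) for s in stumps)
    return 1 if score >= 0.5 else 0


# Made-up rows: [satisfaction, number of projects]; label 1 means "left".
X = [[0.38, 2], [0.11, 7], [0.10, 6], [0.80, 4],
     [0.72, 3], [0.65, 4], [0.90, 5], [0.40, 2]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

model = train_boosted_stumps(X, y)
print([predict(model, x) for x in X])
```

The real module grows deeper trees and uses a different loss function, so this sketch only conveys the additive "fit the remaining error" idea.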

Step 5: Score the test set

After this, we are ready to score the test set and see how our model performs. To do so, we use the Score Model module and connect it to both the output port of the Train Model module, which contains the trained model, and the output port of the Split Data module that contains the test data.

[Figure: Score employee leave model]

Step 6: Evaluate the results

Finally, it’s time to evaluate the results of our model. We use the Evaluate Model module, which we connect to the output port of the Score Model module.

[Figure: Evaluate employee leave model]

Let’s run the experiment, and then right-click the output port of the Evaluate Model module to visualize the results. On the test set, the model predicts whether an employee leaves with 98% accuracy and 98% precision.

[Figure: Human resources analytics - model evaluation]
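Accuracy and precision are both derived from the counts of correct and incorrect predictions. The sketch below shows the arithmetic on ten made-up labels; the numbers are illustrative only and are unrelated to the 98% reported above. The function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, where 1 means 'left'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(actual, predicted):
    """Share of all employees that were classified correctly."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return (tp + tn) / (tp + tn + fp + fn)


def precision(actual, predicted):
    """Share of predicted leavers that actually left."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels for ten employees.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]

print(accuracy(actual, predicted))   # prints 0.8
print(precision(actual, predicted))  # prints 0.75
```

Accuracy rewards correct predictions in both groups, while precision only looks at the employees the model flagged as leavers.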

Step 7: Gain insights on the why

Our final question was why employees were leaving. To answer it, we add the Permutation Feature Importance module and connect it to the output port of the Train Model module and to the output port of the Split Data module that contains the test data. Now we can compute the permutation feature importance scores of the feature variables given this trained model and the test dataset. We set a seed to make the experiment replicable, and we select accuracy as the metric, meaning that we are interested in correctly identifying both the people who leave and the people who do not.

[Figure: Feature importance employee leave model]

If we run the experiment and right-click the output port of the Permutation Feature Importance module, we find that satisfaction was one of the main factors in leaving, according to this dataset.

[Figure: Human resources analytics - permutation feature importance results]
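The principle behind permutation feature importance is simple: shuffle one feature column at a time and measure how much the accuracy drops. The sketch below illustrates this with a hypothetical rule-based model and made-up data. It is not the module's implementation; the seed 42 and the names rule_model and permutation_importance are assumptions.

```python
import random


def accuracy(model, X, y):
    """Share of rows for which the model's prediction matches the label."""
    return sum(1 for x, yi in zip(X, y) if model(x) == yi) / len(y)


def permutation_importance(model, X, y, seed=42):
    """For each feature, return the accuracy lost when that column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [x[j] for x in X]
        rng.shuffle(column)  # break the link between feature j and the label
        X_permuted = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
        importances.append(baseline - accuracy(model, X_permuted, y))
    return importances


def rule_model(x):
    """Hypothetical model: predicts 'left' (1) when satisfaction is at most 0.4."""
    return 1 if x[0] <= 0.4 else 0


# Made-up rows: [satisfaction, number of projects]; label 1 means "left".
X = [[0.38, 2], [0.11, 7], [0.10, 6], [0.80, 4],
     [0.72, 3], [0.65, 4], [0.90, 5], [0.40, 2]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

scores = permutation_importance(rule_model, X, y)
print(dict(zip(["satisfaction", "number_of_projects"], scores)))
```

Because the hypothetical model ignores the number of projects, shuffling that column never changes a prediction and its score is exactly zero; any drop in accuracy can only come from the satisfaction column.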

Limitations

Of course, much information is missing. We don’t know anything about the dates on which the data were obtained, nor do we know how much time passed between the data gathering and the moment that the employee left.

Inspiration

As mentioned before, this tutorial was created to inspire you. If for whatever reason you struggled to build the model, you can also download the complete model from the Azure AI Gallery. If you want to know more, we invite you to take a look at the Principles of Machine Learning course, offered by DataChangers (proudly part of MD2C).

We hope you enjoyed this tutorial. Please feel free to leave us your comments!

 

 

 

 
