FAQ PROCESS – Frequently Asked Questions

Frequently Asked Questions about how to use the PROCESS templates

moderated mediation model 14 PROCESS

You can use the PROCESS macro from Andrew Hayes to calculate conditional effects within SPSS. If you want to visualize these effects, you can use one of our Excel templates, which are based on the SPSS output.

Currently available PROCESS templates

Current PROCESS blogs

For examples of how to use these templates, you are welcome to read our blogs:

Graphing moderation of PROCESS v3.0 Model 1

Graphing conditional indirect effects with the MD2C Excel Template

We are very thankful for all the great feedback and questions we receive from fellow researchers, and we hope you will keep them coming.

With this list of Frequently Asked Questions, we try to cover the most common questions, but please don’t hesitate to reach out to us!

PROCESS Questions & Answers

Question: I can’t see any graph; what is going wrong?

Answer: First check whether you use “.” or “,” as the decimal separator. The templates require “.” as the decimal separator. A quick solution is to use find-and-replace in Excel and replace the “,” with a “.” in the input rows. Second, check the axes of the graph: sometimes you have to reset them by right-clicking on the relevant axis, selecting “Format Axis”, and resetting the Minimum and/or Maximum Bounds values.

FAQ PROCESS reset axis bound

Question: The axes cross in the middle of the graph (mostly at value 0); what can I do?

Answer: You can right-click on the relevant axis, select “Format Axis”, and set the value at which the axis crosses.

FAQ PROCESS set axis

Question: There isn’t a template for my model, can you make one?

Answer: If you contact us and provide us with the data (we won’t use it for anything other than inspection and the possible creation of the template), we can let you know whether that is possible.

 

Human Resources Analytics – Predict Employee Leave

Human Resources Analytics - Why are employees leaving

Predict Employee Leave

In this tutorial, you will learn how to use a simulated dataset from Kaggle to build a machine learning model that both predicts and explains whether employees will leave their employer and the reason(s) why they may do so. The data cover a wide range of topics, which makes it possible to explain employees’ leave behavior in relation to A) organizational factors (department); B) employment relational factors (e.g. tenure, the number of projects participated in, the average working hours per month, objective career development, salary); and C) job-related factors (performance evaluation, involvement in workplace accidents). Continue reading “Human Resources Analytics – Predict Employee Leave”

Meetup Instruction Guide Build your Bot

build your bot

Build your Bot

Workshop Setup and Instruction Guide to Build your Bot

At most of our Microsoft Data Science meetups, hosted by Infi, InSpark, Winvision, and Macaw, among others, we organize workshops. This time you will learn how to build your own bot with Microsoft Cognitive Services. In this workshop you will build a Question & Answer bot, a type of bot that answers questions based on predefined answers.

The point of this workshop is to introduce you to the basics of creating a simple bot, and it is not intended to be a deep-dive into bot development. If you want to learn more, please check out the Microsoft Bot Framework. Continue reading “Meetup Instruction Guide Build your Bot”

Introduction to Python for data science – Microsoft Professional Program

Introduction to Python for Data Science


The ability to analyze data with Python is critical in data science. Learn the basics with this Introduction to Python for Data Science course, and move on to create stunning visualizations.

About This Course

Python is a very powerful programming language used for many different applications. Over time, the huge community around this open source language has created quite a few tools to efficiently work with Python. In recent years, a number of tools have been built specifically for data science. As a result, analyzing data with Python has never been easier.

In this practical course, you will start from the very beginning, with basic arithmetic and variables, and learn how to handle data structures, such as Python lists, Numpy arrays, and Pandas DataFrames. Along the way, you’ll learn about Python functions and control flow. Plus, you’ll look at the world of data visualizations with Python and create your own stunning visualizations based on real data. Read more…

Introduction to R for Data Science – Microsoft Professional Program

Introduction to R for Data Science


In this hands-on Introduction to R for Data Science course, you will learn the R statistical programming language, the lingua franca of data science.

About This Course

R is rapidly becoming the leading language in data science and statistics. Today, R is the tool of choice for data science professionals in every industry and field. Whether you are a full-time number cruncher or just an occasional data analyst, R will suit your needs.

This introduction to R programming course will help you master the basics of R. In seven sections, you will cover its basic syntax, making you ready to undertake your own first data analysis using R. Starting from variables and basic operations, you will eventually learn how to handle data structures such as vectors, matrices, data frames and lists. In the final section, you will dive deeper into the graphical capabilities of R, and create your own stunning data visualizations. No prior knowledge in programming or data science is required. Read more…
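
As a small taste of the data structures covered in the course, here is a sketch in plain R (the values are purely illustrative):

    x <- c(2.1, 4.8, 3.6)                     # a numeric vector
    m <- matrix(1:6, nrow = 2)                # a 2-by-3 matrix of integers
    df <- data.frame(name = c("Ann", "Bob"),  # a data frame with two observations
                     score = c(7, 9))
    l <- list(values = x, scores = df)        # a list can hold anything, by name
    str(l)                                    # inspect the structure of an object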

 

Graphing moderation of PROCESS v3.0 Model 1

Graphing moderation with PROCESS V3.0 graph 2

This blog is about graphing moderation with the help of SPSS with the PROCESS macro, and our corresponding MD2C Graphing template for PROCESS v3.0 Model 1 – Moderation.

The case we use is based on the article by Chapman and Lickel (2016), and you can find a detailed elaboration of this case in the second edition of Andrew Hayes’ book Introduction to Mediation, Moderation, and Conditional Process Analysis (Hayes, 2017). You can download the data from Hayes’ website; the data file you need for this example is called DISASTER. You can also download the PROCESS v3.0 macro for SPSS and SAS (and much more) from the site: http://www.processmacro.org/ Continue reading “Graphing moderation of PROCESS v3.0 Model 1”

Microsoft Data Science @ InSpark – Amsterdam: Workshop Azure Machine Learning

Microsoft Data Science meetup

Microsoft Data Science Azure Machine Learning Workshop


Lab Setup and Instruction Guide

At this second Microsoft Data Science meetup, hosted by InSpark and featuring guest speaker Jeroen ter Heerdt from Microsoft, we also organized a workshop covering the basics of machine learning on the Azure platform.

Overview

In this lab, as part of the Microsoft Data Science meetup community, you will learn how to build a Human Activity Classifier with Azure Machine Learning. This classifier predicts somebody’s activity class (sitting, standing up, standing, sitting down, walking) based on the use of wearable sensors. The point of this lab is to introduce you to the basics of creating and deploying a machine learning model in Azure ML; it is not intended to be a deep dive into model design, validation, and improvement.

This lab environment contains the following tasks:

  1. Setup your Azure ML environment
  2. Get the data
  3. Build your model
  4. Publish your model

What You’ll Need

To perform the tasks, you will need the following:

  • A Windows, Linux, or Mac OS X computer
  • A web browser and an Internet connection

1.     Setup your Azure ML environment

There are several options to start with Azure ML: https://azure.microsoft.com/en-us/services/machine-learning/

Azure-ML-Sign-Up

If you don’t have an Azure account already, we recommend using the Free Workspace option. For that, you need a Microsoft account. If you don’t have one already, you can sign up for one at https://signup.live.com/.

2.     Get the data

This classifier predicts somebody’s activity class (sitting, standing up, standing, sitting down, walking). It is based on the Human Activity Recognition dataset. Human Activity Recognition (HAR) is an active research area whose results have the potential to benefit the development of assistive technologies that support care of the elderly, the chronically ill, and people with special needs. Activity recognition can be used to provide information about patients’ routines to support the development of e-health systems. Two approaches are commonly used for HAR: image processing and the use of wearable sensors. In this case we will use information generated by wearable sensors (Ugulino et al., 2012).

Understand the data source

In this lab we use the Human Activity Recognition Data from its source: http://groupware.les.inf.puc-rio.br/har#ixzz2PyRdbAfA. More info can also be found on the UCI repository. You can download the data from http://groupware.les.inf.puc-rio.br/static/har/dataset-har-PUC-Rio-ugulino.zip and extract the downloaded zip file to a convenient folder on your local computer.

The data were collected during 8 hours of activities, 2 hours with each of the 2 men and 2 women, all healthy adults. These people wore 4 accelerometers from LilyPad Arduino, positioned on the waist, left thigh, right ankle, and right arm, respectively. This resulted in a dataset with 165634 rows and 19 columns.

  • user (text)
  • gender (text)
  • age (integer)
  • how_tall_in_meters (real)
  • weight (int)
  • body_mass_index (real)
  • x1 (type int, value of the axis ‘x’ of the 1st accelerometer, mounted on waist)
  • y1 (type int, value of the axis ‘y’ of the 1st accelerometer, mounted on waist)
  • z1 (type int, value of the axis ‘z’ of the 1st accelerometer, mounted on waist)
  • x2 (type int, value of the axis ‘x’ of the 2nd accelerometer, mounted on the left thigh)
  • y2 (type int, value of the axis ‘y’ of the 2nd accelerometer, mounted on the left thigh)
  • z2 (type int, value of the axis ‘z’ of the 2nd accelerometer, mounted on the left thigh)
  • x3 (type int, value of the axis ‘x’ of the 3rd accelerometer, mounted on the right ankle)
  • y3 (type int, value of the axis ‘y’ of the 3rd accelerometer, mounted on the right ankle)
  • z3 (type int, value of the axis ‘z’ of the 3rd accelerometer, mounted on the right ankle)
  • x4 (type int, value of the axis ‘x’ of the 4th accelerometer, mounted on the right upper-arm)
  • y4 (type int, value of the axis ‘y’ of the 4th accelerometer, mounted on the right upper-arm)
  • z4 (type int, value of the axis ‘z’ of the 4th accelerometer, mounted on the right upper-arm)
  • class (text, ‘sitting-down’ ,’standing-up’, ‘standing’, ‘walking’, and ‘sitting’)
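
If you prefer to script the separator conversion described in step 1 of “Prepare the data” below, a minimal R sketch could look like this (the input file name comes from the downloaded zip; the output name is just a suggestion):

    # read.csv2 expects ';' as separator and ',' as decimal mark, as in the source file
    har <- read.csv2("dataset-har-PUC-Rio-ugulino.csv", stringsAsFactors = FALSE)
    dim(har)   # check that the row and column counts match the description above

    # write it back out as a regular comma-separated file with '.' decimals for Azure ML
    write.csv(har, "har-comma-separated.csv", row.names = FALSE)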

3.     Build the Human Activity Classifier

Prepare the data

Before you can use the data to train a classification model, you must prepare and upload it:

  1. Azure ML works with comma-separated files. The original data file uses ‘;’ as the separator and is therefore not suitable for uploading as-is. First open the downloaded csv file and convert it to a csv with ‘,’ as the separator. Make sure you also use ‘.’ for your decimals.
    If you have trouble creating such a file, you can start from this starting experiment: https://gallery.cortanaintelligence.com/Experiment/Human-Activity-Classifier-Step-1-Load-data. You can copy it into your own workspace by clicking the “Open in Studio” button, which opens a window where you have to sign in to Azure ML. You can rename the copied experiment to Human Activity Classifier by clicking on the title, and then skip to step 6.
  2. Open a browser and browse to https://studio.azureml.net. Then sign in using the Microsoft account associated with your Azure ML workspace.
  3. Create a new blank experiment by clicking on the + NEW button in the left of your browser, and select EXPERIMENT, and subsequently BLANK EXPERIMENT. You can change the generated name into Human Activity Classifier.
    Azure-ML-Create-Experiment
  4. Upload the csv file to Azure ML and name it HAR dataset. To do this, you have to click on the + NEW button in the left lower corner of your browser, and select DATASET, and subsequently FROM LOCAL FILE.
    Azure-ML-Upload-datafile
  5. In the Human Activity Classifier experiment, go to My Datasets under Saved datasets, drag the HAR dataset onto the canvas, and click RUN (in the menu at the bottom). You will have to wait until the experiment has finished running before you continue with the next step.
    Azure-ML-Select-Dataset
  6. To visualize the output of the dataset, right-click on the output port of the data module and select Visualize.
    Azure-ML-Visualize-Data
    Now you can review the data it contains. Note that the dataset contains the following variables:
  • user (string)
  • gender (string)
  • age (numeric)
  • how_tall_in_meters (numeric)
  • weight (numeric)
  • body_mass_index (numeric)
  • x1 (numeric)
  • y1 (numeric)
  • z1 (numeric)
  • x2 (numeric)
  • y2 (numeric)
  • z2 (numeric)
  • x3 (numeric)
  • y3 (numeric)
  • z3 (numeric)
  • x4 (numeric)
  • y4 (numeric)
  • z4 (string) !!!
  • class (string)
  1. Oops, something went wrong: ‘z4’ has been processed as a ‘string’ instead of an ‘integer’. You can change this with a few lines of R in the Execute R Script module. Drag the Execute R Script module onto the canvas and use this code to convert ‘z4’ to numeric:
    # Map 1-based optional input ports to variables
    df <- maml.mapInputPort(1) # class: data.frame

    # Convert 'z4' to numeric; any value that cannot be parsed becomes NA
    df$z4 <- as.numeric(df$z4)

    # Select data.frame to be sent to the output Dataset port
    maml.mapOutputPort("df");

You might ask yourself: why not use the Edit Metadata module for this? Well, if we try that, we get an error because Azure ML cannot convert some of the strings to an integer.

  1. After converting ‘z4’ to a numeric, we have to inspect the data for missing values. To do so, click on the Results dataset1 (left) output of the Execute R Script module. Although the UCI repository states that there are no missing values, we find that the ‘z4’ column has 1 missing value.
  2. We will delete this row with the Clean Missing Data module. Set its properties as follows:
    • Columns to be cleaned: all
    • Minimum missing value ratio: 0
    • Maximum missing value ratio: 1
    • Cleaning mode: entire row
  3. After cleaning, we can inspect the data. We start with some descriptive statistics using the Summarize Data module.
  4. We can also inspect the correlation between the numeric columns using the Select Columns in Dataset module. Drag this module onto the canvas and connect the output port of the Clean Missing Data module to the input port of the Select Columns in Dataset module. Now we have to select the numeric columns, using WITH RULES: start with NO COLUMNS and then select Include, column types, Numeric:
    Azure-ML-Select-Numerics
  5. Now we can add the Compute Linear Correlation module to calculate the (Pearson) correlation. Observe that there is a strong correlation between height (how_tall_in_meters), weight (weight), and BMI (body_mass_index). This is not surprising, as BMI is calculated from height and weight.
    Azure-ML-Correlation
  6. Based on this, we will remove ‘body_mass_index’ using a Select Columns in Dataset module. Here we also exclude ‘user’, as we don’t need this identifier later on in our model. Select the Select Columns in Dataset module, and in the Properties pane launch the column selector. Then use the column selector to exclude the following columns:
    • user
    • body_mass_index

You can use the WITH RULES page of the column selector to accomplish this as shown here:

Azure-ML-Select-Columns
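
For reference, steps 1–6 above roughly correspond to the following plain R, run locally on the converted file (‘har’ is the data frame from the loading sketch earlier; this is not part of the Azure ML experiment itself):

    har$z4 <- as.numeric(har$z4)          # same conversion as the Execute R Script module
    sum(is.na(har$z4))                    # expect the single value that cannot be parsed
    har <- har[complete.cases(har), ]     # Clean Missing Data: remove the entire row

    summary(har)                          # Summarize Data: descriptive statistics
    num_cols <- sapply(har, is.numeric)   # Select Columns in Dataset: numeric columns only
    round(cor(har[, num_cols]), 2)        # Compute Linear Correlation (Pearson)

    # Select Columns in Dataset: drop the identifier and the redundant BMI column
    har <- har[, !(names(har) %in% c("user", "body_mass_index"))]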

  1. Now we transform gender to be a categorical variable by adding an Edit Metadata module to the experiment, and connect the Select Columns in Dataset output to its input. Set the properties of the Edit Metadata module as follows:
    • Column: gender
    • Data type: Unchanged
    • Categorical: Make categorical
    • Fields: Features
    • New column names: Leave blank
  2. We will do a similar transformation with our dependent variable ‘class’: make it categorical and define it as our label. Add a second Edit Metadata module to the experiment, and connect the first Edit Metadata output to its input. Set the properties of this Edit Metadata module as follows:
    • Column: class
    • Data type: Unchanged
    • Categorical: Make categorical
    • Fields: Label
    • New column names: Leave blank
  3. When the experiment has finished running, visualize the output of the Edit Metadata module and verify that:
    • The columns you specified have been removed.
    • All numeric columns now have a Feature Type of Numeric Feature.
    • All string columns now have a Feature Type of Categorical Feature.

 

Create and Evaluate a Classification Model

Now that you have prepared the data, you will construct and evaluate a classification model. The goal of this model is to identify a human activity and to find out if somebody is ‘sitting-down’, ‘standing-up’, ‘standing’, ‘walking’, or ‘sitting’.

  1. We are now ready to split the data into separate training and test sets. We will train the model with the training set and test it with the test set. To do so, add a Split Data module to the Human Activity Classifier experiment, and connect the output of the Edit Metadata module to the input of the Split Data module. Set the properties of the Split Data module as follows:
    • Splitting mode: Split Rows
    • Fraction of rows in the first output dataset: 0.7
    • Randomized split: Checked
    • Random seed: 123
    • Stratified split: False
  2. Add a Train Model module to the experiment, and connect the Results dataset1 (left) output of the Split Data module to the Dataset (right) input of the Train Model module. In the Properties pane for the Train Model module, use the column selector to select the class column. This sets the label column that the classification model will be trained to predict.
  3. Add a Multiclass Decision Forest module to the experiment, and connect the output of the Multiclass Decision Forest module to the Untrained model (left) input of the Train Model module. This specifies that the classification model will be trained using the multiclass decision forest algorithm.
  4. Set the properties of the Multiclass Decision Forest module as follows:
    • Resampling method: Bagging
    • Create trainer mode: Single Parameter
    • Number of decision trees: 8
    • Maximum depth of decision trees: 32
    • Number of random splits per node: 128
    • Minimum number of samples per leaf: 1
    • Allow unknown categorical levels: Checked
  5. Add a Score Model module to the experiment. Then connect the output of the Train Model module to the Trained model (left) input of the Score Model module, and connect the Results dataset2 (right) output of the Split Data module to the Dataset (right) input of the Score Model module.
  6. On the Properties pane for the Score Model module, ensure that the Append score columns to output checkbox is selected.
  7. Add an Evaluate Model module to the experiment, and connect the output of the Score model module to the Scored dataset (left) input of the Evaluate Model module.
  1. Verify that your experiment resembles the figure below, then save and run the experiment.
    Azure-ML-Model-1
  2. When the experiment has finished running, visualize the output of the Score Model module, and compare the predicted values in the Scored Labels column with the actual values from the test data set in the class column.
  3. Visualize the output of the Evaluate Model module, and review the results (shown below). We see the metrics per class. Then review the Overall Accuracy figure for the model, which should be around 0.994. This indicates that the classifier is correct more than 99% of the time, which is a good figure for an initial model, keeping in mind the class distribution of the original data (see below).
    Azure-ML-Metrics
    Azure-ML-Confusion-Matrix
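
The lab builds the model entirely in the Studio GUI. Purely as a point of comparison, a similar split-and-train flow in plain R (using the randomForest package, which is not part of this lab) could look roughly like this, reusing the cleaned ‘har’ data frame from the earlier sketches:

    library(randomForest)

    har$gender <- as.factor(har$gender)   # categorical feature, as in the Edit Metadata step
    har$class  <- as.factor(har$class)    # the label we want to predict

    set.seed(123)                         # mirrors the Random seed used in Split Data
    train_idx <- sample(nrow(har), size = round(0.7 * nrow(har)))  # 70/30 split
    train <- har[train_idx, ]
    test  <- har[-train_idx, ]

    fit  <- randomForest(class ~ ., data = train, ntree = 8)  # 8 trees, as in the module
    pred <- predict(fit, newdata = test)
    mean(pred == test$class)              # overall accuracy on the held-out 30%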

Detailed Accuracy from the original paper

Correctly Classified Instances: 164662 (99.4144 %)

Incorrectly Classified Instances: 970 (0.5856 %)

Root mean squared error: 0.0463

Relative absolute error: 0.7938 %

Azure-ML-Accuracy

4.     Publish your Human Activity Classifier

Publish the Model as a Web Service

  1. Make sure you have saved and run the experiment. With the Human Activity Classifier experiment open, click the SET UP WEB SERVICE icon at the bottom of the Azure ML Studio page and click Predictive Web Service [Recommended]. A new Predictive Experiment tab will be created automatically.
  2. Verify that, with a bit of rearranging, the Predictive Experiment resembles this figure:
    Azure-ML-Predictive
  3. We can now start to remove variables we don’t need for prediction. Besides ‘user’ and ‘body_mass_index’, we can now also remove ‘class’, as we want that as the output of the model. Therefore, drag the Select Columns in Dataset module up, add ‘class’ to the excluded columns, connect its input to the original dataset, and connect its output to the Execute R Script module.
  4. We will also make sure a numeric value is supplied for ‘z4’, so we can move the Web service input and connect it directly to the Edit Metadata module where we make ‘gender’ categorical.
  5. For this experiment, we will also make sure to send complete records, so we remove the Clean Missing Data module.
  6. Delete the connection between the Score Model module and the Web service output module.
  7. Add a Select Columns in Dataset module to the experiment, and connect the output of the Score Model module to its input. Then connect the output of the Select Columns in Dataset module to the input of the Web service output module.
  8. Select the Select Columns in Dataset module, and use the column selector to select only the Scored Labels column. This ensures that when the web service is called, only the predicted value is returned.
  9. Ensure that the predictive experiment now looks like the following, and then save and run the predictive experiment:
    Azure-ML-Predictive-Setup

 

  1. When the experiment has finished running, visualize the output of the last Select Columns in Dataset module and verify that only the Scored Labels column is returned.

Deploy and Use the Web Service

  1. In the Human Activity Classifier [Predictive Exp.] experiment, click the Deploy Web Service icon at the bottom of the Azure ML Studio window.
  2. Wait a few seconds for the dashboard page to appear, and note the API key and the Request/Response URL. You will use these to connect to the web service from a client application.
    Azure-ML-Testing-Model
  3. You have several options to connect to the web service. To test it, you can click on New Web Services Experience (preview). This will open a new browser tab.
  4. Here you have the option to test your model (Test endpoint option under BASICS):
    Azure-ML-Test-Endpoint
  5. When clicking on Test endpoint, you have the option to enable the usage of sample data, which will generate a sample record to test your model with:
    Azure-ML-Sample-Data
  6. After enabling this sample data, you will see the generated sample data:
    Azure-ML-Test-Webservice
  7. The final step would be pressing the Test Request-Response button: what kind of activity is this woman doing according to your model?
  8. Another option is to click on the blue TEST button.
    Azure-ML-Test-API
  9. This will open a pop-up window, where you can fill out some test values:
    Azure-ML-Enter-Data
  10. The last option is to open an Excel file, which will automatically create sample data. Opening this file will add the Azure Machine Learning add-in to the workbook. If that doesn’t work, or you don’t have Excel on your laptop, you could follow the next steps to make a workbook online:
  11. Open a new browser tab.
  12. In the new browser tab, navigate to https://office.live.com/start/Excel.aspx. If prompted, sign in with your Microsoft account (use the same credentials you use to access Azure ML).
  13. In Excel Online, create a new blank workbook.
  14. On the Insert tab, click Office Add-ins. Then in the Office Add-ins dialog box, select Store, search for Azure Machine Learning, and add the Azure Machine Learning add-in as shown below:
    Azure-ML-Office-Addin
  15. After the add-in is installed, in the Azure Machine Learning pane on the right of the Excel workbook, click Add Web Service. Boxes for the URL and API key of the web service will appear.
  16. On the browser tab containing the dashboard page for your Azure ML web service, right-click the Request/Response link you noted earlier and copy the web service URL to the clipboard. Then return to the browser tab containing the Excel Online workbook and paste the URL into the URL box.
  17. On the browser tab containing the dashboard page for your Azure ML web service, click the Copy button for the API key you noted earlier to copy the key to the clipboard. Then return to the browser tab containing the Excel Online workbook and paste it into the API key box.
  18. Verify that the Azure Machine Learning pane in your workbook now resembles this, and click Add:
    Azure-ML-Excel-Azure-Key
  19. After the web service has been added, the Azure Machine Learning pane opens at 2. Predict. Here you have the option to generate sample data by clicking Use sample data. This enters some sample input values in the worksheet.
  20. Select the cells containing the input data (cells A1 to P6), and in the Azure Machine Learning pane, click the button to select the input range and confirm that it is ‘Sheet1’!A1:P6.
  21. Ensure that the My data has headers box is checked.
  1. In the Output box type Q1, and ensure the Include headers box is checked.
  2. Click the Predict button, and after a few seconds, view the predicted label in cell Q2.
    Azure-ML-Excel
  3. Change some values in row 2 and click Predict again. Then view the updated label that is predicted by the web service.
  4. Try changing a few of the input variables and predicting the human activity class. You can add multiple rows to the input range and try various combinations at once.
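
Besides Excel, you can call the Request-Response endpoint from any client that can send JSON over HTTPS. The sketch below uses the R packages httr and jsonlite; the URL, API key, and sample values are placeholders, and the exact request format (column names and their order) should be taken from the Request/Response API help page of your own web service.

    library(httr)
    library(jsonlite)

    api_key <- "YOUR-API-KEY"   # from the web service dashboard (placeholder)
    url <- "https://REGION.services.azureml.net/workspaces/WS_ID/services/SVC_ID/execute?api-version=2.0&details=true"

    # One input row; the 16 columns mirror the web service input after 'user',
    # 'body_mass_index', and 'class' were removed in the predictive experiment.
    req <- list(
      Inputs = list(
        input1 = list(
          ColumnNames = c("gender", "age", "how_tall_in_meters", "weight",
                          "x1", "y1", "z1", "x2", "y2", "z2",
                          "x3", "y3", "z3", "x4", "y4", "z4"),
          Values = list(as.list(c("Woman", "46", "1.62", "75",
                                  "-3", "92", "-63", "-23", "18", "-19",
                                  "5", "104", "-92", "-150", "-103", "-147")))
        )
      ),
      GlobalParameters = setNames(list(), character(0))
    )

    resp <- POST(url,
                 add_headers(Authorization = paste("Bearer", api_key)),
                 body = toJSON(req, auto_unbox = TRUE),
                 content_type_json())

    content(resp, as = "text")   # JSON response containing the Scored Labels prediction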

 

Summary

By completing this lab, you have prepared your environment and data, and built and deployed your own Azure ML model. We hope you enjoyed this introductory lab and that you will build many more machine learning solutions!

If you want to download this lab, you can find a PDF version here: Microsoft Data Science Azure Machine Learning Workshop.


Meetup: Microsoft Data Science Azure Machine Learning Workshop

Microsoft Data Science Azure Machine Learning Workshop


Lab Setup and Instruction Guide

At this first Microsoft Data Science meetup, hosted by Infi and featuring guest speaker Jeroen ter Heerdt from Microsoft, we also organized a workshop covering the basics of machine learning on the Azure platform.

Overview

In this lab, as part of the Microsoft Data Science meetup community, you will learn how to build a Human Activity Classifier with Azure Machine Learning. This classifier predicts somebody’s activity class (sitting, standing up, standing, sitting down, walking) based on the use of wearable sensors. The point of this lab is to introduce you to the basics of creating and deploying a machine learning model in Azure ML; it is not intended to be a deep dive into model design, validation, and improvement.

This lab environment contains the following tasks:

  1. Setup your Azure ML environment
  2. Get the data
  3. Build your model
  4. Publish your model

What You’ll Need

To perform the tasks, you will need the following:

  • A Windows, Linux, or Mac OS X computer
  • A web browser and an Internet connection

1.     Setup your Azure ML environment

There are several options to start with Azure ML: https://azure.microsoft.com/en-us/services/machine-learning/

Azure-ML-Sign-Up

If you don’t have an Azure account already, we recommend using the Free Workspace option. For that, you need a Microsoft account. If you don’t have one already, you can sign up for one at https://signup.live.com/.

2.     Get the data

This classifier predicts somebody’s activity class (sitting, standing up, standing, sitting down, walking). It is based on the Human Activity Recognition dataset. Human Activity Recognition (HAR) is an active research area whose results have the potential to benefit the development of assistive technologies that support care of the elderly, the chronically ill, and people with special needs. Activity recognition can be used to provide information about patients’ routines to support the development of e-health systems. Two approaches are commonly used for HAR: image processing and the use of wearable sensors. In this case we will use information generated by wearable sensors (Ugulino et al., 2012).

Understand the data source

In this lab we use the Human Activity Recognition Data from its source: http://groupware.les.inf.puc-rio.br/har#ixzz2PyRdbAfA. More info can also be found on the UCI repository. You can download the data from http://groupware.les.inf.puc-rio.br/static/har/dataset-har-PUC-Rio-ugulino.zip and extract the downloaded zip file to a convenient folder on your local computer.

The data were collected during 8 hours of activities, 2 hours with each of the 2 men and 2 women, all healthy adults. These people wore 4 accelerometers from LilyPad Arduino, positioned on the waist, left thigh, right ankle, and right arm, respectively. This resulted in a dataset with 165634 rows and 19 columns.

  • user (text)
  • gender (text)
  • age (integer)
  • how_tall_in_meters (real)
  • weight (int)
  • body_mass_index (real)
  • x1 (type int, value of the axis ‘x’ of the 1st accelerometer, mounted on waist)
  • y1 (type int, value of the axis ‘y’ of the 1st accelerometer, mounted on waist)
  • z1 (type int, value of the axis ‘z’ of the 1st accelerometer, mounted on waist)
  • x2 (type int, value of the axis ‘x’ of the 2nd accelerometer, mounted on the left thigh)
  • y2 (type int, value of the axis ‘y’ of the 2nd accelerometer, mounted on the left thigh)
  • z2 (type int, value of the axis ‘z’ of the 2nd accelerometer, mounted on the left thigh)
  • x3 (type int, value of the axis ‘x’ of the 3rd accelerometer, mounted on the right ankle)
  • y3 (type int, value of the axis ‘y’ of the 3rd accelerometer, mounted on the right ankle)
  • z3 (type int, value of the axis ‘z’ of the 3rd accelerometer, mounted on the right ankle)
  • x4 (type int, value of the axis ‘x’ of the 4th accelerometer, mounted on the right upper-arm)
  • y4 (type int, value of the axis ‘y’ of the 4th accelerometer, mounted on the right upper-arm)
  • z4 (type int, value of the axis ‘z’ of the 4th accelerometer, mounted on the right upper-arm)
  • class (text, ‘sitting-down’ ,’standing-up’, ‘standing’, ‘walking’, and ‘sitting’)

3.     Build the Human Activity Classifier

Prepare the data

Before you can use the data to train a classification model, you must prepare and upload it:

  1. Azure ML works with comma-separated files. The original data file uses ‘;’ as the separator and is therefore not suitable for uploading as-is. First open the downloaded csv file and convert it to a csv with ‘,’ as the separator. Make sure you also use ‘.’ for your decimals.
    If you have trouble creating such a file, you can start from this starting experiment: https://gallery.cortanaintelligence.com/Experiment/Human-Activity-Classifier-Step-1-Load-data. You can copy it into your own workspace by clicking the “Open in Studio” button, which opens a window where you have to sign in to Azure ML. You can rename the copied experiment to Human Activity Classifier by clicking on the title, and then skip to step 6.
  2. Open a browser and browse to https://studio.azureml.net. Then sign in using the Microsoft account associated with your Azure ML workspace.
  3. Create a new blank experiment by clicking on the + NEW button in the left of your browser, and select EXPERIMENT, and subsequently BLANK EXPERIMENT. You can change the generated name into Human Activity Classifier.
    Azure-ML-Create-Experiment
  4. Upload the csv file to Azure ML and name it HAR dataset. To do this, you have to click on the + NEW button in the left lower corner of your browser, and select DATASET, and subsequently FROM LOCAL FILE.
    Azure-ML-Upload-datafile
  5. In the Human Activity Classifier experiment, go to My Datasets under Saved datasets, drag the HAR dataset onto the canvas, and click RUN (in the menu at the bottom). You will have to wait until the experiment has finished running before you continue with the next step.
    Azure-ML-Select-Dataset
  6. To visualize the output of the dataset, right-click on the output port of the data module and select Visualize.
    Azure-ML-Visualize-Data
    Now you can review the data it contains. Note that the dataset contains the following variables:
  • user (string)
  • gender (string)
  • age (numeric)
  • how_tall_in_meters (numeric)
  • weight (numeric)
  • body_mass_index (numeric)
  • x1 (numeric)
  • y1 (numeric)
  • z1 (numeric)
  • x2 (numeric)
  • y2 (numeric)
  • z2 (numeric)
  • x3 (numeric)
  • y3 (numeric)
  • z3 (numeric)
  • x4 (numeric)
  • y4 (numeric)
  • z4 (string) !!!
  • class (string)
  1. Oops, something went wrong: ‘z4’ has been processed as a ‘string’ instead of an ‘integer’. You can change this with a few lines of R in the Execute R Script module. Drag the Execute R Script module onto the canvas and use this code to convert ‘z4’ to numeric:
    # Map 1-based optional input ports to variables
    df <- maml.mapInputPort(1) # class: data.frame

    # Convert 'z4' to numeric; any value that cannot be parsed becomes NA
    df$z4 <- as.numeric(df$z4)

    # Select data.frame to be sent to the output Dataset port
    maml.mapOutputPort("df");

You might ask yourself: why not use the Edit Metadata module for this? Well, if we try that, we get an error because Azure ML cannot convert some of the strings to an integer.

  1. After converting ‘z4’ to a numeric, we have to inspect the data for missing values. To do so, click on the Results dataset1 (left) output of the Execute R Script module. Although the UCI repository states that there are no missing values, we find that the ‘z4’ column has 1 missing value.
  2. We will delete this row with the Clean Missing Data module. Set its properties as follows:
    • Columns to be cleaned: all
    • Minimum missing value ratio: 0
    • Maximum missing value ratio: 1
    • Cleaning mode: entire row
  3. After cleaning, we can inspect the data. We start with some descriptive statistics using the Summarize Data module.
  4. We can also inspect the correlation between the numeric columns using the Select Columns in Dataset module. Drag this module onto the canvas and connect the output port of the Clean Missing Data module to the input port of the Select Columns in Dataset module. Now we have to select the numeric columns, using WITH RULES: start with NO COLUMNS and then select Include, column types, Numeric:
    Azure-ML-Select-Numerics
  5. Now we can add the Compute Linear Correlation module to calculate the (Pearson) correlation. Observe that there is a strong correlation between height (how_tall_in_meters), weight (weight), and BMI (body_mass_index). This is not surprising, as BMI is calculated from height and weight.
    Azure-ML-Correlation
  6. Based on this, we will remove ‘body_mass_index’ using a Select Columns in Dataset module. Here we also exclude ‘user’, as we don’t need this identifier later on in our model. Select the Select Columns in Dataset module, and in the Properties pane launch the column selector. Then use the column selector to exclude the following columns:
    • user
    • body_mass_index

You can use the WITH RULES page of the column selector to accomplish this as shown here:

Azure-ML-Select-Columns

  1. Now we transform gender to be a categorical variable by adding an Edit Metadata module to the experiment, and connect the Select Columns in Dataset output to its input. Set the properties of the Edit Metadata module as follows:
    • Column: gender
    • Data type: Unchanged
    • Categorical: Make categorical
    • Fields: Features
    • New column names: Leave blank
  2. We will apply the same transformation to our dependent variable ‘class’: make it categorical and define it as our label, so that the Train Model module knows which column to predict. (For reference, the equivalent factor conversion in R is sketched after this list.) Add another Edit Metadata module to the experiment, and connect the output of the previous Edit Metadata module to its input. Set the properties of this Edit Metadata module as follows:
    • Column: class
    • Data type: Unchanged
    • Categorical: Make categorical
    • Fields: Label
    • New column names: Leave blank
  3. Save and run the experiment. When it has finished running, visualize the output of the second Edit Metadata module and verify that:
    • The columns you specified have been removed.
    • All numeric columns now have a Feature Type of Numeric Feature.
    • All string columns now have a Feature Type of Categorical Feature.
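For reference, here is what these two transformations amount to in R (an optional sketch, not part of the lab): both columns become factors. Note that marking ‘class’ as the Label is an Azure ML Studio setting that tells the Train Model module which column to predict; it has no direct R equivalent.

     # Optional sketch: the R equivalent of "Make categorical".
     df <- maml.mapInputPort(1)

     df$gender <- as.factor(df$gender)   # categorical feature
     df$class  <- as.factor(df$class)    # categorical outcome (the Label in Azure ML)

     maml.mapOutputPort("df")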

 

Create and Evaluate a Classification Model

Now that you have prepared the data, you will construct and evaluate a classification model. The goal of this model is to identify the human activity being performed: whether somebody is ‘sitting-down’, ‘standing-up’, ‘standing’, ‘walking’, or ‘sitting’.

  1. We are now ready to split the data into separate training and test sets: we will train the model with the training dataset and test it with the test dataset. Add a Split Data module to the Human Activity Classifier experiment, and connect the output of the Edit Metadata module to the input of the Split Data module. Then set the properties of the Split Data module as follows (a rough stand-alone R sketch of the split, train, and evaluate steps follows at the end of this list):
    • Splitting mode: Split Rows
    • Fraction of rows in the first output dataset: 0.7
    • Randomized split: Checked
    • Random seed: 123
    • Stratified split: False
  2. Add a Train Model module to the experiment, and connect the Results dataset1 (left) output of the Split Data module to the Dataset (right) input of the Train Model module. In the Properties pane for the Train Model module, use the column selector to select the class column. This sets the label column that the classification model will be trained to predict.
  3. Add a Multiclass Decision Forest module to the experiment, and connect the output of the Multiclass Decision Forest module to the Untrained model (left) input of the Train Model module. This specifies that the classification model will be trained using the multiclass decision forest algorithm.
  4. Set the properties of the Multiclass Decision Forest module as follows:
    • Resampling method: Bagging
    • Create trainer mode: Single Parameter
    • Number of decision trees: 8
    • Maximum depth of decision trees: 32
    • Number of random splits per node: 128
    • Minimum number of samples per leaf: 1
    • Allow unknown categorical levels: Checked
  5. Add a Score Model module to the experiment. Then connect the output of the Train Model module to the Trained model (left) input of the Score Model module, and connect the Results dataset2 (right) output of the Split Data module to the Dataset (right) input of the Score Model module.
  6. On the Properties pane for the Score Model module, ensure that the Append score columns to output checkbox is selected.
  7. Add an Evaluate Model module to the experiment, and connect the output of the Score model module to the Scored dataset (left) input of the Evaluate Model module.
  1. Verify that your experiment resembles the figure below, then save and run the experiment.
    Azure-ML-Model-1
  2. When the experiment has finished running, visualize the output of the Score Model module, and compare the predicted values in the Scored Labels column with the actual values from the test data set in the class column.
  3. Visualize the output of the Evaluate Model module and review the results (shown below). We see the scores per class. Then review the Overall Accuracy figure for the model, which should be around 0.994. This indicates that the classifier is correct about 99.4% of the time, which is a good figure for an initial model, keeping in mind the original class distribution reported in the paper (see below).
    Azure-ML-Metrics
    Azure-ML-Confusion-Matrix
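For reference, here is a rough, stand-alone R sketch of what the Split Data, Train Model, Score Model, and Evaluate Model steps do. It is an illustration only, not part of the original lab: it uses the randomForest package rather than Azure ML's multiclass decision forest implementation, so the exact accuracy will differ, and it assumes the prepared data frame from the steps above is available as df.

     # Rough R equivalent of the split / train / score / evaluate pipeline.
     # Assumes 'df' is the prepared data frame (user and body_mass_index removed,
     # gender and class converted to factors).
     library(randomForest)

     set.seed(123)                                        # Random seed: 123
     train_idx <- sample(nrow(df), size = floor(0.7 * nrow(df)))   # 70/30 split of rows
     train <- df[train_idx, ]
     test  <- df[-train_idx, ]

     # A forest of 8 trees, loosely mirroring the module settings above.
     model <- randomForest(class ~ ., data = train, ntree = 8)

     # Score the test set and compute the overall accuracy and confusion matrix.
     pred <- predict(model, newdata = test)
     print(mean(pred == test$class))
     print(table(predicted = pred, actual = test$class))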

Detailed accuracy reported in the original paper:

Correctly Classified Instances: 164662 (99.4144 %)

Incorrectly Classified Instances: 970 (0.5856 %)

Root mean squared error: 0.0463

Relative absolute error: 0.7938 %

Azure-ML-Accuracy

3.     Publish your Human Activity Classifier

Publish the Model as a Web Service

  1. Make sure you have saved and run the experiment. With the Human Activity Classifier experiment open, click the SET UP WEB SERVICE icon at the bottom of the Azure ML Studio page and click Predictive Web Service [Recommended]. A new Predictive Experiment tab will be created automatically.
  2. Verify that, with a bit of rearranging, the Predictive Experiment resembles this figure:
    Azure-ML-Predictive
  3. We can now remove the variables we don’t need for prediction. Besides eliminating ‘user’ and ‘body_mass_index’, we can now also remove ‘class’, as we want that as output from the model rather than as input. Drag the Select Columns in Dataset module up, add ‘class’ to the excluded columns, connect its input to the original dataset, and connect its output to the Execute R Script module.
  4. Because callers of the web service will supply a numeric value for ‘z4’, we can move the Web service input module and connect it directly to the Edit Metadata module in which ‘gender’ is made categorical.
  5. For this experiment, we will also require complete records to be sent, so we remove the Clean Missing Data module.
  6. Delete the connection between the Score Model module and the Web service output module.
  7. Add a Select Columns in Dataset module to the experiment, and connect the output of the Score Model module to its input. Then connect the output of the Select Columns in Dataset module to the input of the Web service output module.
  8. Select the Select Columns in Dataset module, and use the column selector to select only the Scored Labels column. This ensures that when the web service is called, only the predicted value is returned.
  9. Ensure that the predictive experiment now looks like the following, and then save and run the predictive experiment:
    Azure-ML-Predictive-Setup

 

  1. When the experiment has finished running, visualize the output of the last Select Columns in Dataset module and verify that only the Scored Labels column is returned.

Deploy and Use the Web Service

  1. In the Human Activity Classifier [Predictive Exp.] experiment, click the Deploy Web Service icon at the bottom of the Azure ML Studio window.
  2. Wait a few seconds for the dashboard page to appear, and note the API key and the Request/Response URL. You will use these to connect to the web service from a client application (a minimal R sketch for calling the service programmatically follows after this list).
    Azure-ML-Testing-Model
  3. You have several options for connecting to the web service. To test it, you can click New Web Services Experience (preview). This will open a new browser tab.
  4. Here you have the option to test your model (Test endpoint option under BASICS):
    Azure-ML-Test-Endpoint
  5. When you click Test endpoint, you can enable sample data, which generates a sample record to test your model with:
    Azure-ML-Sample-Data
  6. After enabling this option, you will see the generated sample record:
    Azure-ML-Test-Webservice
  7. The final step is to press the Test Request-Response button: what kind of activity is this woman doing according to your model?
  8. Another option is to click on the blue TEST button.
    Azure-ML-Test-API
  9. This will open a pop-up window, where you can fill out some test values:
    Azure-ML-Enter-Data
  10. The last option is to open the Excel file, which will automatically create sample data. Opening this file adds the Azure Machine Learning add-in to the workbook. If that doesn’t work, or if you don’t have Excel on your laptop, you can follow the next steps to create a workbook online:
  11. Open a new browser tab.
  12. In the new browser tab, navigate to https://office.live.com/start/Excel.aspx. If prompted, sign in with your Microsoft account (use the same credentials you use to access Azure ML).
  13. In Excel Online, create a new blank workbook.
  14. On the Insert tab, click Office Add-ins. Then in the Office Add-ins dialog box, select Store, search for Azure Machine Learning, and add the Azure Machine Learning add-in as shown below:
    Azure-ML-Office-Addin
  15. After the add-in is installed, in the Azure Machine Learning pane on the right of the Excel workbook, click Add Web Service. Boxes for the URL and API key of the web service will appear.
  16. On the browser tab containing the dashboard page for your Azure ML web service, right-click the Request/Response link you noted earlier and copy the web service URL to the clipboard. Then return to the browser tab containing the Excel Online workbook and paste the URL into the URL box.
  17. On the browser tab containing the dashboard page for your Azure ML web service, click the Copy button for the API key you noted earlier to copy the key to the clipboard. Then return to the browser tab containing the Excel Online workbook and paste it into the API key box.
  18. Verify that the Azure Machine Learning pane in your workbook now resembles this, and click Add:
    Azure-ML-Excel-Azure-Key
  19. After the web service has been added, the Azure Machine Learning pane opens at step 2. Predict. Here you can generate sample data by clicking Use sample data, which enters some sample input values in the worksheet.
  20. Select the cells containing the input data (cells A1 to P6), and in the Azure Machine Learning pane, click the button to select the input range and confirm that it is 'Sheet1'!A1:P6.
  21. Ensure that the My data has headers box is checked.
  1. In the Output box type Q1, and ensure the Include headers box is checked.
  2. Click the Predict button, and after a few seconds, view the predicted label in cell Q2.
    Azure-ML-Excel
  3. Change some values in row 2 and click Predict again. Then view the updated label predicted by the web service.
  4. Try changing a few of the input variables and predicting the human activity class. You can add multiple rows to the input range and try various combinations at once.
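If you prefer to call the web service from code instead of Excel, the sketch below shows how this could look in R with the httr and jsonlite packages. It is a minimal illustration, not the sample code generated by the dashboard: the URL and API key are placeholders you must replace with the values from your own dashboard, the sensor values in the sample row are made up, and you should check the Request/Response API help page for the exact input schema of your service.

     # Minimal sketch: calling the Request/Response endpoint from R.
     # Replace api_url and api_key with the values from your web service dashboard.
     library(httr)
     library(jsonlite)

     api_url <- "https://REPLACE-WITH-YOUR-REQUEST-RESPONSE-URL"
     api_key <- "REPLACE-WITH-YOUR-API-KEY"

     request_body <- list(
       Inputs = list(
         input1 = list(
           ColumnNames = list("gender", "age", "how_tall_in_meters", "weight",
                              "x1", "y1", "z1", "x2", "y2", "z2",
                              "x3", "y3", "z3", "x4", "y4", "z4"),
           # One illustrative record; the values are made up and sent as strings.
           Values = list(list("Woman", "46", "1.62", "67",
                              "-3", "92", "-63", "-23", "18", "-19",
                              "5", "104", "-92", "-150", "-103", "-147"))
         )
       ),
       GlobalParameters = setNames(list(), character(0))  # serializes to an empty object {}
     )

     response <- POST(api_url,
                      add_headers(Authorization = paste("Bearer", api_key),
                                  `Content-Type` = "application/json"),
                      body = toJSON(request_body, auto_unbox = TRUE))

     # The predicted Scored Labels value is returned in the JSON response body.
     cat(content(response, as = "text", encoding = "UTF-8"))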

 

Summary

By completing this lab, you have prepared your environment and data, and built and deployed your own Azure ML model. We hope you enjoyed this introductory lab and that you will build many more machine learning solutions!

If you want to download this lab, you can find a pdf version here: Microsoft Data Science Azure Machine Learning Workshop

Meetup #1: Microsoft Data Science Azure Machine Learning Workshop

Microsoft Data Science Azure Machine Learning Workshop

Lab Setup and Instruction Guide

In this first Microsoft Data Science meetup, hosted by Infi and with guest speaker Jeroen ter Heerdt from Microsoft, we also organized a workshop covering the basics of machine learning on the Azure platform. Continue reading “Meetup #1: Microsoft Data Science Azure Machine Learning Workshop”