Azure Machine Learning: how to easily improve your model

Azure Machine Learning example: Regression Demand EstimationImproving your Azure Machine Learning model

In this example we start with a sample experiment from the Microsoft Azure Machine Learning Gallery: Regression: Demand estimation. In this example there are four models built, and compared, based the newly created features. We will explore whether standard operations could improve these samples models, inspired by the e-book Data Science in the Cloud with Azure Machine Learning and R of Stephen Elston.  We haven’t used the suggested new features that depend on prior info which wasn’t always complete, but created some other variables, that could be created based on the available dataset. Observing the sample, there are basically two areas where we see quick possibilities for improvement: data cleaning and transformation and evaluation of the results.

Data cleaning and transformation

What kind of data are you dealing with? In this case it’s rental bike information over time. When dealing with time series, you might want to detrend your dependent variable (amount of rented bikes). You can also check for multicollinearity (see chapter 6: Econometrics):  a very high correlation could cause problems for some types of predictive models. But please remember that correlation doesn’t mean causation! Standardizing your numeric features if they don’t have the same scale can also improve your results. Finally, check for outliers.

Evaluation of the results

If you check the results from the Evaluate Model box, the coefficients of determination are not that different, but please also check the errors:

result table AzureML Regression Demand Estimation

The information you don’t get from this step, is an overview of your predicted values regarding the original amounts. This can be realized with an R script. In this case, as we have information of bike renting per hour, it is recommended to check your results per hour as well. Here you can see where the models differ:

results AzureML Regression Demand Estimation

As you can observe, the original models have quite some peaks. Although we have to avoid overfitting, it seems that simple adjustments can give better results.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.