Part 2: Predictive Modeling using R and SQL Server Machine Learning Services



Part 1 of Predictive Modeling using R and SQL Server Machine Learning Services covered an overview of Predictive Modeling and the steps involved in building a Predictive Model. Using our sample dataset – Ski Resort rental data – we wanted to predict RentalCount for the year 2015, given the variables – Month, Day, Weekday, Holiday and Snow.

Part 1 covered:

– Getting data from a SQL Server database

– Preparing data for modeling

– Training models

– Comparing results and finalizing a model

We found that for this problem, predictions using the Decision Tree model were more accurate than the Linear Regression model. SQL Server Machine Learning Services (MLS) lets us train and test Predictive Models using R or Python, in the context of SQL Server. Thus, we can build T-SQL programs that contain embedded R/Python scripts that train on data stored in the database.

Deploy Machine Learning code with SQL Server

In this part, we will deploy the R code we wrote in Part 1 to SQL Server. To deploy, we will store the trained model in database and create a stored procedure that predicts using the model. This stored procedure can be invoked from applications.

1. Create Table for storing the model: Here, we create a table in SQL Server to store the trained model. The model will be used for prediction in step 3.


2. Create Stored Procedure for generating the model: This stored procedure will use the R scripts we wrote in Part 1 utilizing sp_execute_external_script introduced in SQL Server 2016. To execute sp_execute_external_script, first enable external scripts by using the statement – sp_configure ‘external scripts enabled’, 1;

The function to generate Decision Tree model – rxDTree – is part of the RevoScaleR package for R. RevoScaleR package includes numerous other R functions for importing, transforming, and analyzing data at scale. Point to note is that the functions run on the RevoScaleR interpreter, built on open-source R. It is engineered to leverage the multithreaded and multinode architecture of the host platform, meaning when R code executes within a SQL Server SP, it utilizes parallel processing.



3. Create Stored Procedure for prediction: Now that we have the model output, we can create an SP that would use the model to predict rental count for new data. Again, we are using the R code covered in Part 1, only that this time we are using it in a SQL Stored Procedure.





Isn’t that just awesome? We have a Predictive Model that can be used within applications to predict rental count. Now that we have covered a sample project, in Part 3  of the series, I will share my experience using SQL Server Machine Learning Services to solve a problem at my work.

Before we conclude Part 2,

Predict using Native Scoring (SQL Server 2017*): In SQL Server 2017, Microsoft has introduced a native predict function. What this means is we do not need to run R/Python code in a SQL stored procedure to do the actual prediction. Native scoring uses native C++ libraries that reads a trained model stored in binary format (in our case in a SQL Server table), and generate scores for new input data.





Scripts for Part 2:

SQL Server ML Tutorial:
Native Scoring:
Forecasts and Prediction using SQL Server MLS: