
SQL Server and AI: Integrating Machine Learning Services


Have you ever wished you could run your data science experiments right where your data lives, without the hassle of moving huge files around? That’s exactly what Microsoft set out to solve with SQL Server Machine Learning Services (MLS). Introduced in SQL Server 2016 as R Services and really hitting its stride in 2017, when Python support arrived, MLS lets you run Python and R code directly inside your database. No more exporting terabytes of data just to build a model; now the analytics come to your data.

Why Bring Machine Learning Into SQL Server?

Let's be honest: moving data is cumbersome. As companies gather more data, transferring it between systems for analysis becomes costly and slow. MLS changes this by enabling data analysis where it already exists. This approach enhances speed, security, and convenience for data engineers and data scientists alike. Moreover, by centralizing everything, teams can collaborate more efficiently. There's no need to exchange files or worry about version discrepancies.

Here's the magic: SQL Server ships with a companion service called Launchpad. When you want to run a Python or R script, you call a special stored procedure (sp_execute_external_script). Launchpad spins up the runtime in an isolated process, SQL Server streams your data over, your code executes, and the results come back as an ordinary result set, all without the data ever leaving the server.
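In practice, a call has three moving parts: the language, the script, and an input query. The query's results arrive inside Python as a pandas DataFrame named InputDataSet, and whatever you assign to OutputDataSet comes back to SQL Server as a result set. A minimal round trip (once the feature is enabled, see Step 1 below) looks like this:

EXEC sp_execute_external_script
    @language = N'Python',                      -- which runtime Launchpad should start
    @script = N'OutputDataSet = InputDataSet',  -- echo the input straight back
    @input_data_1 = N'SELECT TOP (5) name FROM sys.tables';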

And don't worry about missing your favorite libraries. MLS ships with popular packages out of the box, pandas and scikit-learn on the Python side and forecast on the R side, and you can install more if needed.
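If you're ever unsure what's available, you can ask the runtime itself. Here's a quick sketch (again assuming the feature is enabled) that lists every Python package the MLS runtime can see:

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import pkg_resources
import pandas as pd

# Enumerate every package visible to the MLS Python runtime
OutputDataSet = pd.DataFrame(
    [(d.project_name, d.version) for d in pkg_resources.working_set],
    columns=["package", "version"])
'
WITH RESULT SETS ((package NVARCHAR(128), version NVARCHAR(32)));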

Let’s Walk Through an Example

Imagine you’re working with the AdventureWorks sample database, and you want to predict the final bill (TotalDue) for each order based on its subtotal, tax, and freight. Here’s how you might tackle that, step by step.

Step 1: Flip the Switch

First things first, you need to tell SQL Server it’s okay to run external scripts. It’s a quick setup:

EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;
-- You might need to restart SQL Server after this
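
You can read the setting back to confirm it took; run_value should show 1 after the restart:

EXEC sp_configure 'external scripts enabled';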

Step 2: Build and Run Your Model

Now for the fun part. You’ll write a Python script that grabs the first 1,000 orders, cleans up any missing data, splits it into training and test sets, and fits a simple linear regression model. Here’s what that looks like:

DECLARE @py NVARCHAR(MAX) = N'
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Clean up the data
df = InputDataSet.dropna()

# Pick out the features and what we want to predict
X = df[["SubTotal","TaxAmt","Freight"]]
y = df["TotalDue"]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LinearRegression().fit(X_train, y_train)

# Make predictions and see how we did
y_pred = model.predict(X_test)
r2 = metrics.r2_score(y_test, y_pred)
# Take the square root ourselves so this works on any scikit-learn version
rmse = metrics.mean_squared_error(y_test, y_pred) ** 0.5

# Package up the results
out = X_test.copy()
out["Actual"] = y_test
out["Predicted"] = y_pred
out["R2"] = r2
out["RMSE"] = rmse
OutputDataSet = out
';

EXEC sp_execute_external_script
    @language = N'Python',
    @script = @py,
    @input_data_1 = N'
        -- money is not a supported type for external scripts, so cast to float;
        -- TOP (1000) with ORDER BY grabs the first 1,000 orders by ID
        SELECT TOP (1000)
               CAST(SubTotal AS FLOAT) AS SubTotal,
               CAST(TaxAmt   AS FLOAT) AS TaxAmt,
               CAST(Freight  AS FLOAT) AS Freight,
               CAST(TotalDue AS FLOAT) AS TotalDue
        FROM Sales.SalesOrderHeader
        ORDER BY SalesOrderID'
WITH RESULT SETS ((
    SubTotal FLOAT, TaxAmt FLOAT, Freight FLOAT,
    Actual FLOAT, Predicted FLOAT, R2 FLOAT, RMSE FLOAT));

Let’s break it down. First, SQL Server hands the query results to Python as a pandas DataFrame named InputDataSet. We drop any incomplete rows, because missing values can throw off the model. Then we pick which columns to use as inputs (SubTotal, TaxAmt, Freight) and which one we want to predict (TotalDue).

Splitting the data into training and test sets lets us gauge how the model might perform on new, unseen data. After training, we check two key metrics: R² (how much of the variation in TotalDue the model explains) and RMSE (the root mean squared error, a typical prediction error in dollars that weights large misses more heavily).
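For the curious, both metrics fall out of the squared prediction errors: R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)² and RMSE = √( Σ(yᵢ − ŷᵢ)² / n ), where yᵢ is an actual TotalDue, ŷᵢ its prediction, and ȳ the mean of the actuals.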

If you see an R² of 0.87, your model explains 87% of the variation in TotalDue. An RMSE of $12.34 means a typical prediction misses the actual value by about $12.34. Not bad! (One caveat with this particular dataset: in AdventureWorks, TotalDue is a computed column, SubTotal + TaxAmt + Freight, so expect a near-perfect fit; it works better as a sanity check than a hard prediction problem.)

Taking It Further: From Experiment to Production

Once you’re happy with your model, you can wrap it up in a stored procedure. That way, you (or your colleagues) can reuse it anytime. Want to retrain the model every night? Schedule it with SQL Agent. Need to store the trained model for later? Save it in a table. You can even set up real-time scoring for things like fraud detection or product recommendations, all without moving your data out of SQL Server.
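
Here's a minimal sketch of the save-it-in-a-table idea. The table and model names below are just illustrations, nothing MLS requires: the script pickles the fitted model to bytes, which SQL Server stores as varbinary(max) via INSERT ... EXEC:

CREATE TABLE dbo.Models (
    ModelName SYSNAME PRIMARY KEY,
    Model     VARBINARY(MAX) NOT NULL,
    TrainedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);

INSERT INTO dbo.Models (ModelName, Model)
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import pickle
import pandas as pd
from sklearn.linear_model import LinearRegression

df = InputDataSet.dropna()
model = LinearRegression().fit(df[["SubTotal","TaxAmt","Freight"]], df["TotalDue"])

# pickle.dumps turns the fitted model into bytes, which land as varbinary(max)
OutputDataSet = pd.DataFrame({"ModelName": ["TotalDueRegressor"],
                              "Model": [pickle.dumps(model)]},
                             columns=["ModelName", "Model"])
',
    @input_data_1 = N'
        SELECT CAST(SubTotal AS FLOAT) AS SubTotal,
               CAST(TaxAmt   AS FLOAT) AS TaxAmt,
               CAST(Freight  AS FLOAT) AS Freight,
               CAST(TotalDue AS FLOAT) AS TotalDue
        FROM Sales.SalesOrderHeader';

Scoring later is the mirror image: read the varbinary back, pass it into the script through @params as a varbinary(max) argument, and pickle.loads it before calling predict.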

Tips for Success

  • Use SQL Server’s Resource Governor to keep your analytics from hogging all the resources (see the sketch after this list).
  • Keep your scripts in version control, just like any other code.
  • Monitor performance and errors with Extended Events or SQL Agent alerts.
  • Rest easy knowing your data never leaves the secure database environment.
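
On the Resource Governor point: external scripts draw from a separate external resource pool, so you can cap them without touching your main workload (note that Resource Governor is an Enterprise edition feature). A sketch against the default pool, with caps you would tune for your own box:

ALTER EXTERNAL RESOURCE POOL "default"
WITH (MAX_CPU_PERCENT = 50, MAX_MEMORY_PERCENT = 30);
ALTER RESOURCE GOVERNOR RECONFIGURE;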

Wrapping Up

SQL Server Machine Learning Services brings the power of Python and R right to your data. It’s fast, secure, and makes life easier for everyone involved in analytics. Whether you’re just exploring or building production pipelines, MLS gives you the tools to do it all, without ever leaving the comfort of your database.
