The Stairway to Machine Learning Services series is designed to provide both the information needed to install and configure SQL Server’s Machine Learning Service as well as an introduction to both languages. This series will provide answers to those who want to better understand what is being installed on SQL Server and what it does. After the service is installed, this series will provide and introduction to R and Python and provide samples and for those people who are interested in learning to code. Machine Learning Service, new to SQL Server 2017, now provides the capability to incorporate both R and Python. This stairway will start cover everything about Machine Learning Services to help you get up to speed on configuring SQL Server to incorporate the languages as well as getting started on writing Python.
Why is Machine Learning Services Important?
Companies are looking for data to provide more answers. There are numerous articles on how to analyze data to better retain customers, sell more products and increase revenue as part of a data science or machine learning solution. The methods for deriving this information are achieved using complex algorithms and statistical packages, which are found in R and Python. Most database providers, including Microsoft, have incorporated R and Python languages as a method to increase the use of databases as part of an advanced analytic distribution. Moving beyond SQL to analyze data will provide the data professional skills that are going to be used far into the future. Machine Learning Services with SQL Server can be part of that process.
Installing Machine Learning Service
R was first introduced in SQL Server 2016, and it was called R Services. For SQL Server 2017, this same service was renamed to Machine Learning Services and expanded to include Python. The Machine Learning Service is available in every version of SQL Server, from Standard to Enterprise and every version in-between. In the install there are three options, installing the Machine Learning Service, then selecting R and/or Python as you see in the attached picture.
Looking further at the install, screen you may also notice a reference to the Machine Learning Standalone Server. This option will install the Machine Learning Service on a separate server, which is done to provide additional scalability for Machine Learning code. You will want to check with your Microsoft representative to determine what the additional cost of scaling to an additional server, as this is not covered under a SQL Server License. It is recommended that if you are just starting with R and Python coding with SQL Server that you start with the integrated service and scale out as your performance increases. Moving the processing to a different server means that you will be migrating a lot of data to that server from SQL Server, so you will want to make sure that your network supports it and to decrease latency the server is on the same subnet as SQL Server containing your data.
Machine Learning Service Install Requires Internet Access
As R and Python are both open source languages, they need to be installed from their respective organizations. To do this, the installation will try to connect to the Internet and download them. If your server is set up not to allow connections to the Internet, you will be prompted to download R and Python from another location and copy them to the server, using the window shown below. The links isted in the screen will download R and Python components into 4 different cab files. Place these files into the install path and the installation can continue. If you have Internet access, the installs will be downloaded from the respective sites.
What you see after the Installation
To validate that the installation has completed successfully and Machine Learning Service is running, open up the Windows Services. As you see by the screen below highlighted in purple, a SQL Server Launchpad service is running. This service is installed by Machine Learning Service and must be running to execute R or Python code.
The other thing that DBAs will most certainly notice, is there are 20 new users added to SQL Server. The installation creates by default 20 different external users, which are used to call R or Python. You can see these new users in the Computer Management Screen.
If you have installed SQL Server using the default name, the accounts will be called MSSQLServer01 to MSSQLServer20. If SQLServer is installed using an instance name, the instance name replaces MSSQLServer, so the users would be called something like SQLServerInstance01 to SQLServerInstance20. These users are required to call R or Python and should not be deleted or renamed. If you want to modify the number of users added, open up SQL Server Configuration Manager, and click on SQL Server Launchpad, then right click and select the properties. In the popup window which appears, click on the Advanced Tab.
The number of External Users will be listed here. If you decrease this number below 20 performance of Machine Learning Service will be adversely impacted. If you decrease the number of users to 1, SQL Server Launchpad will no longer execute. The minimum number of users is 2, which is not recommended because the service will then wait for the user to be done executing the individual process. This means on any SQL Server instance, only one call of R or Python code can be executed at a time, which will adversely impact performance.
File Directories for Installed Components
There will be a subdirectory created for each user, with the name of the SQL Server instance name, which is by default MSSQLSERVER, followed by a number 00-20. These subdirectories have nothing in them, as they are used temporarily when R or Python needs them and the information in them is eventually deleted by SQL Server. The default location is C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData.
The R tools are located in the C:\Program Files\Microsoft SQL Server \MSSQL14.MSSQLSERVER\ R_SERVICES directory, and include RLaucher.dll which will load R.
The Python tools are located in the C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES directory, and include Pythonlauncher.dll which will load Python 3.5. The installation includes the Anaconda distribution for Python, which includes not only the data science components of Anaconda, but also SciKitLearn and Pandas. Microsoft has also included machine learning algorithms which were created for Python in the microsoftml and revoscalepy libraries.
Installation Configuration and Validation
After the components are installed,there are some configuration steps that must be completed to permit R to run on SQL Server. From within an SSMS query window, the following script needs to be run to enable R
sp_configure 'external scripts enabled', 1 GO Reconfigure GO
After this step completes successfully, a restart of SQL Server Services is required. After the restart, if you execute the command sp_execute_external_script and provide the language parameter and a script, you can run code in R and Python. To check to see if R is working properly, run the following code from within and SSMS query window.
EXEC sp_execute_external_script @language =N'R', @script=N'OutputDataSet <-InputDataSet', @input_data_1 =N'SELECT 1 as CheckToSeeIfRIsWorking' WITH result sets (([CheckToSeeIfRIsWorking] int not null)); GO
When run successfully, this script will return a 1. SQL Server is now ready to run R. To check to see if SQL Server 2017 can run Python, run the following script:
EXEC sp_execute_external_script @language = N'Python', @script = N'print("Hello World :)") import sys import os print(sys.version) print(os.getcwd())'
This command will return text for Hello World : ) and using two of the libraries installed as part of the Machine Learning Services, will also return the version of python installed and the current working directory where the code was run.
Hello World :) 3.5.2 |Continuum Analytics, Inc.| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData\MSSQLSERVER01\C83CF7FB-C1D6-4526-9EED-33E549085AD2
In the first part of the Stairway to Machine Learning Service, we reviewed why you might want to install the components and the steps required for installation. This article also covers what is installed with Machine Learning Services and where the components are located. In the next part we will review what these components do and how they interact with SQL Server.