In the first part of the series on Machine Learning in SQL Server, we reviewed how to install Machine Learning Services (MLS) as part of SQL Server. By default they are not installed and if you want to use them, this is the first step. In the second part of the series we reviewed what was installed and how the different component parts work together when using Machine Learning services in SQL Server. Now we are going to take a look at writing R code within SQL Server.
In addition to SQL Server, writing R in requires installing R tools. As part of the MLS install, you could use the R GUI which is part of the install, but as the R GUI is very light on features and SQL Server Management Studio does not contain any R features, you are going to need to install a R GUI to write R code. The most common application used for writing R is R Studio, which is a free GUI created to write R. Microsoft has also created two other environments for writing R code Visual Studio with R tools installed and VS Code with R tools. All are free and offer a similar coding environment for developing R.
In addition to installing an R GUI, the other software you will need to install is the Microsoft R client. This set of libraries is required to use the feature sets provided by Machine Learning Services in SQL Server. Even if you have SQL Server installed on the same hardware as you are developing your R code, the Microsoft R client must be installed in order to use the development features of machine learning services in SQL Server. These topics are discussed in detail in order that you may understand what these tools do and why you need them.
Visual Studio with R Tools
Since version 2015 Microsoft has provided a plug-in for Visual Studio [VS], R Tools for Visual Studio [RTVS], to write R. For VS2015 you can download it at https://visualstudio.microsoft.com/vs/features/rtvs/ for VS version 2017 it is now included as part of the Data Science Workload. If you are using 2019, you cannot use R tools as it has been removed from VS 2019.
To install RTVS in VS 2017, select the Tools menu from Visual Studio, then select the Get Tools and Features… item on the top of the menu as shown below in Yellow.
You will then have to click on an OK button to let VS be modified. If your installer has not been updated recently you may be prompted to update the installer as a prerequisite to being able to install the Data Science Workload, or anything else. The leftmost tab of the installer contains a section labeled Workloads. From this section, select Data Science and analytics applications.
From the data science workload optional list, select R language support, Runtime support for R development tools, and Microsoft R client. While there are other options listed for Python support, I do not recommend installing anything else, even if you want to support all of the languages for Machine Learning Services. For Python, the environment provided in Visual Studio Code, is better suited to writing Python for Machine Learning Services.
Visual Studio Code with RTVS
Visual Studio Code, or as it is more commonly known, VS Code, is a more recent GUI which Microsoft has created to edit code. It is natively designed to run on either Windows, Linux or Mac operating systems, and contains libraries which allow it to be extended to support a number of different languages, such as Python and of course R. To use R with VS Code, you will need to install an R extension from the marketplace, and the one which includes RTVS, is R Tools which is still in preview as of this writing.
The most popular method for writing R code is R studio. It has a smaller memory footprint than VS, which makes sense since it supports one language, R. R Studio has an open source license as well as paid versions. You can download the latest version here https://www.rstudio.com/products/rstudio/download/. Unlike VSRT, the R Studio install does not include the Microsoft R Client, so you will need to install that. The latest version is here: https://docs.microsoft.com/en-us/machine-learning-server/r-client/install-on-windows
RTVS was modeled after the R Studio, so the look and feel of the two applications is the same. Both have project capability, source control integration, help, intellisense, and a quadrant pane layout for writing, running, graphics and history. Starting from the top right, the four primary quadrants are File Workspace, Environment and History, Console and Files, Plots and Help. You can move these around, but this is the default space.
The top left holds all of your code. R is an interactive language, and you can run the code one line at a time or an entire block by highlighting and using the run button or Ctrl+Enter keys on the interface or the contents of the entire file by holding the Ctrl+Shift+Enter keys. The code appears in the Console quadrant on the bottom left of the screen. Lines of code can be executed in the console section as well. Among most common things entered in the console questions are commands to load help files.
Typing a question mark, help(), and an r command, for example "? head" or "help(“head”)", will load a description in the tab on the bottom right of the screen in the Help tab. The Plots tab is displayed in the picture and will include here contains all of the plots executed in your code. You can use the arrow keys in the window to scroll through them. You also have the ability to save the plots as jpeg or PDF files. The quadrant on the top right contains the Environment tab, which has listing of the variables and functions loaded by your code. The History tab contains a list of the items executed by your code so that you have a running history of what you have done. This history can be saved for future reference.
Once R is installed, there are a number of things that you will want to configure. Like many applications, you can use the defaults, but performance and maintainability will improve if you customize some of the functionality including setting the working directory and specifying the R executable that you want to use within your code.
One of the great things about R is the number of packages that exist. The R language contains many packages that provide encapsulated functionality to perform tasks like creating graphics or machine learning algorithms. These libraries are mostly maintained on the R Open Source Foundation’s Comprehensive R Archive Network site, https://cran.r-project.org/. The other place you might be using to download R libraries is https://mran.microsoft.com/ or you might be getting the libraries from Github. No matter where you download your libraries, R will access them based upon your working directory. If you have limited hard drive space or just want to monitor what your working directory is within R Studio you can look at the Tools->Global Options in R Studio. VSRT Tools->R Tools ->Options->RTools Advanced contains the working directory in 2015 but not in 2017.
Of course, you can always use an R command to look at the working directory, by typing getwd()in the console window, which is called R interactive in RTVS and then hitting enter. The location that you see is where all of the R libraries will be loaded. As a standard practice these are not removed. If you are running out of storage space, you can delete the libraries and reload them when you need to use them. If you are using both RStudio and VSRT you will want to set the working directory to the same location so that the libraries are not downloaded twice. You can change the text in the window or change the working directory using the R command setwd(DIR) where DIR is the location of your working directory, for example setwd(“D:\\RWorkDir”) to set the working directory to D:\RWorkDir. Notice that you must surround the location with double quotes and add an additional slash in the path.
Setting R executable
Since you have already installed Microsoft R Client, you now have 3 or 4 different versions of R installed on your computer depending on what you have installed on it. The default locations are listed here and your R and R Open version numbers may be slightly different.
- Open R distribution - c:\Program files\R\R-3.5.1
- SQL Server Client Tools - C:\Program files\Microsoft\R Client\R_Server
- SQL Server Machine Learning - C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES
- Microsoft R Open - C:\Program Files\Microsoft\R Open\R-3.5.0
Of course, there are differences in each version of these executables. If you want to use the version which will allow you to use all of the additional features included in the SQL Server Machine Learning version, you will need to specify to use C:\Program files\Microsoft\R Client\R_Server. To ensure that the code you are writing will be using that version, you need to check what version your GUI is using. If you are using R Studio you can look at the Tools->Global Options, and see which version is being used.
Otherwise you can type the R command R.home() in the console window. R will likely return text similar to "C:/PROGRA~1/MIE74D~1/RCLIEN~1/R_SERVER" R natively only understands the 8.3 naming convention, which can be problematic if you have two different subdirectories where the first 8 characters are exactly the same. VSRT will natively select the appropriate location when the Microsoft R client tools are installed.
Writing R code for SQL Server
Now that you have R configured properly, it’s time to write some code. The first thing we want to do is create a connection to SQL Server. The # (hashtag) indicates a comment line
#Create a connection to your Machine Learning SQL Server instance sqlConnString <- "Driver=SQL Server;Server=(local);Database=TestR;Trusted_Connection=yes" #Connect to the database serverside <- RxInSqlServer(connectionString = sqlConnString)
Using the command RxInSqlServer, which is case sensitive, will allow you to make a connection to a SQL Server Database and set the context to where you will be running your R code, either using the local resources on your PC or the resources on the server where the SQL Server is located. Execute the following code to see what the current setting is
# Microsoft R command to see where the code is running rxGetComputeContext()
There are two possible answers to this command, either local or on the server. When you connect, you are running server side. If you want to run using the resources on your PC, set the compute context to local using this R code.
#Change the compute context to run locally rxSetComputeContext("local") # Validate where the code is running rxGetComputeContext() # Set the compute context back to the server. rxSetComputeContext(serverside)
R allows you to use libraries to be able to write SQL. Using the rxSQLServerData function, you are able to write native T-SQL and use it within R code. With this library, from within R you can create a CTE and use it within a query designed to return a complex dataset which can be used in R code. The code listed here saves T-SQL to a string variable named sqlQuery. The code will be executed on SQL Server as the previous R command changed the context to the server. The result set from the query will be stored in a dataset variable named data.
# Write a simple query and create a summary in R sqlQuery <- "WITH nb AS (SELECT 0 AS n UNION ALL SELECT n+1 FROM nb where n < 9) SELECT n1.n+10*n2.n+100*n3.n+1 AS n, ABS(CHECKSUM(NewId())) rxSummary( formula = ~ ., data = RxSqlServerData(sqlQuery = sqlQuery, connectionString = connectionString), computeContext = RxInSqlServer(connectionString = connectionString) )
In this part of the stairway, you configured your computer to run R and then wrote some code pointing to SQL Server to run it. In the next post we will be looking at the steps to configure your computer to run Python and look at writing some code for SQL Server as well.