Setting up R in SQL 2016 proved to be trickier than expected. After spending almost two weeks, I finally managed to get my R code (association rules using the apriori function) to work in SQL 2016 on my laptop. There were many errors that I ran into while getting the environment to work as a whole, so I thought that I would document everything in the hope that it will help someone else avoid the frustrations that I had.
The first thing that you need to do is install SQL Server 2016 (I installed the Developer edition) and make sure that you tick the “R Services (In-Database)” option
Do not worry about installing “R Server (Standalone)” as we will be installing a newer version in step 3.
Once your SQL 2016 instance is installed, you need to install SP1 with the latest cumulative update. I installed the “KB4013106” update (the earlier one didn’t work for me) which can be found here: https://support.microsoft.com/en-us/help/3177312
Make sure that “R Services (In-Database)” shows a green tick indicating that it was correctly updated, otherwise step 4 will not work.
The next step is to install R Server 9.1 for Windows which can be downloaded here: https://msdn.microsoft.com/en-us/microsoft-r/rserver-install-windows. Make sure that you are installing 9.1 or newer.
Once R server has finished installing, you need to bind the SQL 2016 R Services to it so that you have one central place from which R is kept up-to-date. To do this open the command line window as Administrator, navigate to your R Server Setup directory, and check the names of your SQL instances by typing:
Then to bind the instance you type:
SqlBindR.exe /bind <YOUR INSTANCE NAME>
This binding process will take quite a few minutes to run so be patient. If you receive an error when trying to bind the instance, it means that you have not updated your SQL Server instance properly so make sure that you installed the correct / latest update.
The next step is to open Management Studio and enable external scripts. After enabling the scripts, restart your server instance and check to see if the “run_value” is set to “1”.
EXEC sp_configure 'external scripts enabled', 1 Reconfigure with override --Restart the server instance, then check using: EXEC sp_configure 'external scripts enabled'
One of the major problems that I found was getting my R code to work with a large dataset (over 100,000 rows). By default, SQL Server only allocates 20% of your machine’s memory to external scripts. This can cause errors messages from R like:
“Error: cannot allocate vector of size X”
In order to increase the memory allocation to 35%, I had to run the following SQL script:
SELECT * FROM sys.resource_governor_resource_pools WHERE name = 'default' SELECT * FROM sys.resource_governor_external_resource_pools WHERE name = 'default' ALTER RESOURCE POOL "default" WITH (max_memory_percent = 65); ALTER EXTERNAL RESOURCE POOL "default" WITH (max_memory_percent = 35); ALTER RESOURCE GOVERNOR reconfigure; CREATE EXTERNAL RESOURCE POOL ds_ep WITH (max_memory_percent = 35); CREATE WORKLOAD GROUP ds_wg WITH (importance = medium) USING "default", EXTERNAL "ds_ep"; ALTER RESOURCE GOVERNOR WITH (classifier_function = NULL); ALTER RESOURCE GOVERNOR reconfigure; USE master GO CREATE FUNCTION is_ds_apps() RETURNS sysname WITH schemabinding AS BEGIN IF program_name() in ('Microsoft R Host', 'RStudio') RETURN 'ds_wg'; RETURN 'default' END; GO ALTER RESOURCE GOVERNOR WITH (classifier_function = dbo.is_ds_apps); ALTER RESOURCE GOVERNOR reconfigure; SELECT * FROM sys.resource_governor_workload_groups; SELECT * FROM sys.resource_governor_external_resource_pools;
For further information about this script, please check the following link: https://docs.microsoft.com/en-za/sql/advanced-analytics/r/how-to-create-a-resource-pool-for-r
If you need to load any R libraries for your R model, you’ll need to install. To do this open the command line window as administrator and run R.exe from your SQL 2016 instance’s R_Services directory:
In R you need to run the following commands in order to install the required package (eg. “arules”):
sql.lib <- “C:/Program Files/Microsoft SQL Server/MSSQL13.MSSQLSERVER2016/R_SERVICES/library”
Please note the “/” is used and not the “\” in the path
Finally, you’ll be able to run your R code in T-SQL as per the following example:
DECLARE @Rscript AS NVARCHAR(MAX) SET @Rscript = 'library(arules) trans <- as(split(inputData[,"Product"], inputData[,"InvoiceNumber"]), "transactions") rules <- apriori(trans, parameter = list(support = 0.01, confidence = 0.2, minlen = 2)) clean_rules <- rules[!is.redundant(rules)] filtered_rules <- head(as(sort(rules, decreasing = TRUE, na.last = NA, by = "confidence"), "data.frame"), 30); ' EXEC sp_execute_external_script @language = N'R' , @script = @Rscript , @input_data_1 = N'SELECT CAST([InvoiceNumber] AS VARCHAR(10)) AS InvoiceNumber, [Product] FROM [Test].[dbo].[Temp]' , @input_data_1_name = N'inputData' , @parallel = 1 , @output_data_1_name = N'filtered_rules' WITH RESULT SETS ((Rules VARCHAR(MAX), Support FLOAT, Confidence FLOAT, Lift FLOAT));
If you want to point your Power BI desktop designer to the same R server, you can set it under options: