Introduction to STRING_SPLIT function in SQL

,

Introduction

Recently, I was peer-reviewing some scripts written by my team and then I happened to stumble upon an interesting piece of code that I thought would be a good idea to share with the community. The script was a simple scalar-valued function that used to return records obtained after splitting a string with a specific separator.

Let me demonstrate the actual use case in simple words. We were working on a report that required us to extract the first 3 parts of an application version number. For e.g., if the version number for an application is 10.1.0.2, then the report had to display only the first 3 parts of the version and eliminate the last i.e. 10.1.0. In the screen print below, the left-hand column AppVersion is the original field that is available in the database, and on the right-hand, the BundleVersion is the derived column that needs to be displayed on the report.

Fig 1 - Original and Derived Fields

In this article, I'm going to describe how can we build a custom function using the STRING_SPLIT function in SQL, that'd help us to achieve the results expected. Alternatively, I'll also explain how to implement the same solution in a custom user-defined function in case the STRING_SPLIT is not available for use.

Solution (using STRING_SPLIT)

Let's take a turn on how this can be achieved in SQL using the STRING_SPLIT function. In order to display the results as expected, we need to build a custom UDF in which we will take advantage of the STRING_AGG function in addition to the  STRING_SPLIT function. I have broken down the steps that we'll be going through in order to achieve the final result.

Let us first create a temporary dataset on which we can apply the function. I have attached the script below which creates a temporary table and inserts the demo data into it as described in the screen print above (Fig 1).

SELECT * INTO #TempWork FROM (
	SELECT 'RCX_M01_SMD 1.0.6'		AS AppVersion UNION
	SELECT 'RCX_D00_BackOffice 7.2.1.0'	AS AppVersion UNION
	SELECT 'RCX_I04_OLAP 2.0.0.0 [2.0.0.5]' AS AppVersion UNION
	SELECT 'RCX_X02_MOB 10.6.2.3'		AS AppVersion 
) temp
GO
SELECT * FROM #TempWork
GO

And the results of this query is as below. As you can see, we now have 4 records inserted into our temp table.

AppVersion
------------------------------
RCX_D00_BackOffice 7.2.1.0
RCX_I04_OLAP 2.0.0.0 [2.0.0.5]
RCX_M01_SMD 1.0.6
RCX_X02_MOB 10.6.2.3
(4 rows affected)

 

Now, let us understand in a step-by-step manner, how can we fetch the only first three parts of the version number eliminating anything after that.

  1. Using the STRING_SPLIT function, extract the top 3 records only from the AppVersion.
  2. Store the results obtained in the previous step in a derived table or a CTE.
  3. Using the STRING_AGG function, concatenate all the records stored in the CTE using the separator value.

The user-defined function is as follows:

/*
Input:	@AppVersion	= '7.1.0.3'
Return:	@BundleVersion	= '7.1.0'
*/
CREATE FUNCTION [dbo].[udf_GetBundleVersionUsingStringSplit]
(
	/* Add the parameters for the function here */
	@AppVersion		VARCHAR(100)
)
RETURNS VARCHAR(100)
AS
BEGIN
	DECLARE @VersionSpliter	CHAR = '.';
	DECLARE @BundleSpliter	CHAR = '.';
	DECLARE @BundleVersion	VARCHAR(100) = '';

        /* 2. Store the results obtained in the previous step in a derived table or a CTE */
	WITH cte_StringSplit AS (
                /* 1. Using the STRING_SPLIT function, 
                      extract the top 3 records only from the AppVersion */
		SELECT TOP 3 [value] AS StringSplitValue
		FROM string_split (@AppVersion,@VersionSpliter)
	)

        /* 3. Using the STRING_AGG function,
              concatenate all the records stored in the CTE using the separator value */
	SELECT @BundleVersion = STRING_AGG(StringSplitValue, @BundleSpliter)
	FROM cte_StringSplit
	/* Return the result of the function */
	RETURN @BundleVersion
	
END
GO

In the script above, we have used the string function STRING_AGG. You can assume that this function works exactly opposite to that of the STRING_SPLIT function, although there are some differences. Let me explain the use of this function quickly.

/* Create a temp dataset
   Insert three records into it */
SELECT * INTO #TempStringAgg FROM (
	SELECT 'Apple'	AS Fruits UNION
	SELECT 'Mango'	AS Fruits UNION
	SELECT 'Banana' AS Fruits
) temp
GO
/* Select values using String_agg()
   All three records are concatenated and separated by a comma */
SELECT STRING_AGG(Fruits, ', ') AS FruitsWithStringAgg FROM #TempStringAgg
GO

And the results is as below.

(3 rows affected)
FruitsWithStringAgg
-------------------------
Apple, Banana, Mango
(1 row affected)

As you can see above, all the three records that are present in the temp table are concatenated into a single record.

 

Executing the Main Function

Now, when we execute the function with the values in the AppVersion column, the results obtained are as below.

SELECT 
	AppVersion
	,[dbo].[udf_GetBundleVersionUsingStringSplit](AppVersion) BundleVersion
FROM #TempWork
GO

Results: Notice, how the BundleVersion has extracted only the first three parts of the AppVersion number.

AppVersion                     BundleVersion
------------------------------ ----------------------------
RCX_D00_BackOffice 7.2.1.0     RCX_D00_BackOffice 7.2.1
RCX_I04_OLAP 2.0.0.0 [2.0.0.5] RCX_I04_OLAP 2.0.0
RCX_M01_SMD 1.0.6              RCX_M01_SMD 1.0.6
RCX_X02_MOB 10.6.2.3           RCX_X02_MOB 10.6.2
(4 rows affected)

Alternate Solution (without using STRING_SPLIT)

Although, it is true that the desired result set can be obtained by using the combination of STRING_SPLIT and STRING_AGG function in a custom UDF, however, there is a catch. The STRING_SPLIT function works only on databases that have a compatibility level of 130 or higher. In our case, the database compatibility level was 100, and so we could not use this function directly in our script. Also, we did not have permission to alter the compatibility level of the database, so this option was already ruled out.

You can use the following script to check the compatibility level of your database.

SELECT 
	[Name]
	,[Compatibility_Level]
FROM [sys].[databases]
WHERE [Name] = 'DatabaseName'

 

Fig 2 - Checking Compatibility Level

Since the compatibility level of our database did not support using the STRING_SPLIT function, we thought of implementing the workaround which is discussed below.

Workaround Explanation

Now, let's understand how can we achieve a similar solution as described in the previous section (using STRING_SPLIT), but, this time without using the STRING_SPLIT function. I have broken down the steps that we'll be going through in order to achieve the final result.

  1. Get the first individual part from the entire AppVersion based on the separator value. In this case, a ".".
  2. Store this initial part in a separate variable.
  3. Remove this part from the original AppVersion along with the ".".
  4. Repeat the steps 1-3 for the next three times until the 3 parts of the AppVersion is stored in the separate variable.

In order to achieve the above steps in SQL, we will be using a WHILE loop and implement it in the following manner. For a better understanding, I've also included a flowchart that will give you a clear idea about the logic that is being implemented in the script (Fig 3).

Fig 3 - While Loop Flowchart

 

The script for the user-defined function is provided below.

-- Input:	@AppVersion	= '7.1.0.3'
-- Return:	@BundleVersion	= '7.1.0'
CREATE FUNCTION [dbo].[udf_GetBundleVersionWithoutStringSplit]
(
	-- Add the parameters for the function here
	@AppVersion	VARCHAR(100)
)
RETURNS VARCHAR(100)
AS
BEGIN
	DECLARE @Number			VARCHAR(100);
	DECLARE @VersionSpliter	CHAR = '.';
	DECLARE @BundleSpliter	CHAR = '.';
	DECLARE @BundleVersion	VARCHAR(100) = '';
	DECLARE @BundlePart		INT = 0;
	DECLARE @CharIndex		INT;
	SET @AppVersion = @AppVersion + '.';
	SET @CharIndex	= CHARINDEX(@VersionSpliter, @AppVersion);
	WHILE @CharIndex > 0
	BEGIN
		SET @Number		= SUBSTRING(@AppVersion, 0, @CharIndex);
		SET @AppVersion = SUBSTRING(@AppVersion, @CharIndex + 1, LEN(@AppVersion));
		SET @CharIndex	= CHARINDEX(@VersionSpliter, @AppVersion);
		IF(@BundlePart < 2)
			SET @BundleSpliter	= '.'
		ELSE
			SET @BundleSpliter	= ''
		IF(@BundlePart < 3)
		BEGIN
			SET @BundleVersion = @BundleVersion + @Number + @BundleSpliter;
			SET @BundlePart = @BundlePart + 1;
		END
	END
	--SELECT @BundleVersion AS BundleVersion
	-- Return the result of the function
	RETURN @BundleVersion
END
GO

In the script above, we have used two string functions CHARINDEX and SUBSTRING. Let me explain the use of each of these functions.

CHARINDEX

The CHARINDEX is a string function that accepts two string values and searches for the occurrence of the first string in the second. If it is found in the second string then, the function will return the position of the first character of the first string in the second string.

For example, let us consider the first string - "SQL" and the second string  - "I love SQL Server Central". Using CHARINDEX in the script below.

SELECT CHARINDEX('SQL','I love SQL Server Central') SQLIndex

Results
SQLIndex
-----------
8
(1 row affected)

The result returned is "8" as the first letter of "SQL" occurs in the 8th position in the string "I love SQL Server Central".

SUBSTRING

This is another interesting function that is often used in string manipulations. It is used to extract a string from a given string value. This function accepts three parameters as follows -

  1. A string value - "I love SQL Server Central"
  2. An integer value for the start position from where to extract  - 8
  3. Another integer value to mark the position until where the string is to be extracted. - 10

Implementing the above function in the script below.

SELECT SUBSTRING('I love SQL Server Central',8,10) SQLSubString

Results
SQLSubString
------------
SQL Server
(1 row affected)

As you can see, the function accepts the string in the beginning and starts reading it from the 8th position, i.e. "S" and counts 10 positions from there. It ends on the "r" such that the entire string is returned between the start and the ending positions.

Executing the Main Function

Now, let us execute the custom function without using the STRING_SPLIT function.

SELECT 
	AppVersion
	,[dbo].[udf_GetBundleVersionWithoutStringSplit](AppVersion) BundleVersion
FROM #TempWork
GO

And, the result is as below.

AppVersion                     BundleVersion
------------------------------ -----------------------------
RCX_D00_BackOffice 7.2.1.0     RCX_D00_BackOffice 7.2.1 
RCX_I04_OLAP 2.0.0.0 [2.0.0.5] RCX_I04_OLAP 2.0.0 
RCX_M01_SMD 1.0.6              RCX_M01_SMD 1.0.6 
RCX_X02_MOB 10.6.2.3           RCX_X02_MOB 10.6.2 
(4 rows affected)

As you can see, we have obtained similar result-set as implemented in the previous section of this post.

Takeaway

In this article, we have seen how can we build a custom UDF by using the STRING_SPLIT and STRING_AGG functions in SQL. We have also explored and alternate solution to achieve the same results in case the STRING_SPLIT function is not available for use. Also, we have discussed in brief about the other string functions - STRING_AGG, CHARINDEX, and SUBSTRING.

Rate

2.6 (5)

Share

Share

Rate

2.6 (5)