Technical Article

A varchar(MAX) STRING_SPLIT function for SQL 2012 and above

,

Introduction

A table-valued function that will split a string into rows based on a delimiter, for example, a csv line. The result is a single-column table.

This function has the same signature as the SQL Server STRING_SPLIT function introduced in SQL Server 2016. this means if you have not yet upgraded to SQL 2016 you can upgrade your code in advance to use this function and when you do upgrade to SQL 2016 or higher there will be minimal changes to make (just remove 'dbo.' from the function call).

An advantage of this function over Microsoft's SPLIT_STRING function is that the delimiter can be longer than one character.

It will split strings up to nvarchar(MAX) or varchar(MAX) in length (2 GB).

Arguments

@string

Is a character type (for example, nvarchar, varchar, nchar, or char).

@separator

Is a character type (for example, nvarchar, varchar, nchar, or char).

Usage

To split a sentence into words:

SELECT value
  FROM dbo.STRING_SPLIT('Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.', ' ');

To split some text into sentences, partial sentences and words:

DECLARE @string    nvarchar(MAX)
SELECT @string='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'
SELECT @string FullText,
       Sentence.value Sentence,
       PartSentence.value PartSentence,
       Word.value Word
  FROM dbo.STRING_SPLIT(@string, '.') Sentence
 CROSS APPLY dbo.STRING_SPLIT(Sentence.value, ',') PartSentence
 CROSS APPLY dbo.STRING_SPLIT(PartSentence.value, ' ') Word
Split a column on a table:
SELECT *
  FROM myTable t
 CROSS APPLY dbo.SPLIT_STRING(t.csvColumn,',') X

Requirements

This function will split a string into multiple rows based on a delimiter within the string.

Its operation is very similar to Microsoft's SPLIT_STRING function, so if you intend to move to SQL Server 2016 or higher can use this then amend the code to run using Microsoft's SPLIT_STRING with very little effort (just change dbo.SPLIT_STRING to SPLIT_STRING).

This is a useful function if you need to split strings that are longer than 8,000 characters and you are running SQL Server 2012 or higher.

Pros and Cons

This is an alternative for SQL 2016's STRING_SPLIT function, it will work on SQL Server 2012 and above (SQL Server's SPLIT_STRING was introduced in SQL Server 2016).

The advantage of this function over the Microsoft's STRING_SPLIT function, apart from it working on SQL Server 2012, is it also allows the string delimiter to be more than one character in length.

The function also returns an integer position column to indicate the position of the column in the original string.

A disadvantage is that it is slightly slower than Microsoft's STRING_SPILT function for very large strings.

References

Jeff Moden:  Tally OH! An Improved SQL 8K “CSV Splitter” Function

Eirikur Eiriksson: Reaping the benefits of the Window functions in T-SQL

Microsoft:  STRING_SPLIT (Transact-SQL)

The Function dbo.SPLIT_STRING

IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
    EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT] () RETURNS TABLE AS RETURN SELECT 1 X') 
END
GO
ALTER FUNCTION [dbo].[STRING_SPLIT]
(
    @string    nvarchar(MAX), 
    @separator nvarchar(MAX)
)
RETURNS TABLE WITH SCHEMABINDING 
AS RETURN
   WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
        Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
        T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)+1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 N FROM Y),
        Delim(Pos) AS (SELECT t.N FROM T WHERE SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0),
        Separated(value,position) AS (SELECT SUBSTRING(@string, d.Pos + CASE WHEN Pos=0 THEN 1 ELSE LEN(@separator+'x')-1 END , LEAD(d.Pos,1,2147483647) OVER (ORDER BY (SELECT NULL)) - CASE WHEN d.Pos=0 THEN 2-LEN(@separator+'x') ELSE d.Pos END - (LEN(@separator+'x')-1)),
                                             d.pos      
                                        FROM Delim d
                                       WHERE @string IS NOT NULL)
       SELECT LTRIM(s.value) value,
              ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) position
         FROM Separated s
        WHERE s.value <> @separator;
GO

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating