A varchar(MAX) STRING_SPLIT function for SQL 2012 and above

  • Kristen-173977 wrote:

    Jonathan AC Roberts wrote:

    Or you could just add a WHERE clause to filter them out as it's a table valued function.

    Yes ... but my Splitter returns an Item No, and if I use WHERE afterwards I have gaps in the Item No, so I would then also have to add code to solve that, every time I have that requirement, whereas if I get my Splitter to remove them that function returns contiguous Item Nos.

    I still have the option of the Splitter not removing them, so I can have the data "raw" and post-process if that is more appropriate.

    I'm always interested in what other folks come up with in the area of splitters, especially those that can handle LOB datatypes.  Would you post your splitter, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Would you post your splitter, please?

    Sadly that requires more time than I have readily available to sanitise and remove any naming etc. that is "internal" ... which in turn would mean testing to make sure that I hadn't goofed something up ...

    What I can do more easily is to catalogue the splitters I have / use

    CREATE FUNCTION dbo.FN_DelimSplitChar
    (
    @strSourcevarchar(8000)
    , @strDelimitervarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
    , @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. '', NULL or '[BLANK]')
    -- Use '[DELETE]' to delete any NULL items
    )
    -- Split a string based on delimiter. Returns a Character String (which is NOT Trimmed)
    RETURNS @tblArray TABLE
    (
    S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
    , S_StrValuevarchar(8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    )

    Standard string splitter. Optimised for <= 8,000 char

    CREATE FUNCTION dbo.FN_DelimSplitInt
    (
    @strSourcevarchar(8000)
    , @strDelimitervarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
    , @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. NULL or '-1')
    -- Use '[DELETE]' to delete any NULL items
    )
    -- Split a string based on delimiter. Returns an array of INTs.
    RETURNS @tblArray TABLE
    (
    S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
    , S_IntValueint NULL
    )

    Same as CHAR, optimised for INTs

    CREATE FUNCTION dbo.FN_DelimSplitIntMAX
    (
    @strSourcevarchar(MAX)
    , @strDelimitervarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
    , @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. NULL or '-1')
    -- Use '[DELETE]' to delete any NULL items
    )
    -- Split a string based on delimiter. Returns an array of INTs.
    RETURNS @tblArray TABLE
    (
    S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
    , S_IntValueint NULL
    )

    Same ... used if Source could exceed [and is optimised for] 8,000+ chars

    CREATE FUNCTION dbo.FN_DelimSplitNVarcharMax
    (
    @strSourcenvarchar(MAX)
    , @strDelimiternvarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
    , @strBlanknvarchar(4000) = '[DELETE]'-- What to return if an item is blank? (e.g. '', NULL or '[BLANK]')
    -- Use '[DELETE]' to delete any NULL items
    )
    -- Split a TEXT string based on delimiter. Returns a Character String.
    RETURNS @tblArray TABLE
    (
    S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
    , S_StrValuenvarchar(4000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    )

    Same, only used when Wide-Chars need to be accommodated (Input up to 8,000 char)

    CREATE FUNCTION dbo.FN_DelimSplitVarcharMax
    (
    @strSourcevarchar(MAX)
    , @strDelimitervarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
    , @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. '', NULL or '[BLANK]')
    -- Use '[DELETE]' to delete any NULL items
    )
    -- Split a TEXT string based on delimiter. Returns a Character String.
    RETURNS @tblArray TABLE
    (
    S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
    , S_StrValuevarchar(8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    )

    Wide-Chars, more than 8,000 char

    CREATE FUNCTION dbo.FN_FixedSplitChar
    (
    @strValuevarchar(8000)-- String Value to split
    , @intWidthint-- Width (of each element)
    , @intOptionssmallint = 0-- RESERVED
    )
    -- Fixed-width string splitter
    RETURNS TABLE
    ...
    DECLARE@tblArray TABLE
    (
    T_Itemint NOT NULL
    , T_StrValuevarchar(8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    , PRIMARY KEY
    (
    T_Item
    )
    )

    Split fixed width strings (i.e. no delimiter)

    CREATE FUNCTION dbo.FN_FixedSplitInt
    (
    @strValuevarchar(8000)-- String Value to split
    , @intWidthint-- Width (of each element)
    , @intStuffStart int-- Offset for STUFF
    , @intStuffLenint-- Length for STUFF
    , @intOptionssmallint = 0-- 0=Default, 2=Last item blank=ignore
    )
    -- Fixed-width string splitter
    RETURNS TABLE
    AS
    ...
    DECLARE@tblArray TABLE
    (
    T_Itemint NOT NULL
    , T_IntValueint NULL
    , PRIMARY KEY
    (
    T_Item
    )
    )

    Same, but for INTs. The STUFF option is where the INTs are fixed length, but also have delimiter - strings such as:

    '/123/45 /7 /0 /567'

     

    CREATE FUNCTION dbo.FN_PathSplit
    (
    @strSourcevarchar(8000)
    , @strDelimitervarchar(10) = '/'
    , @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. '', NULL or '[BLANK]')
    -- Use '[DELETE]' to delete any NULL items
    )
    -- Split a string into its Path components
    -- e.g. /L1/L2/L3 will be split into /L1, /L1/L2 and /L1/L2/L3
    RETURNS @tblArray TABLE
    (
    S_Itemint IDENTITY(1,1) NOT NULL
    , S_Valuevarchar(8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    , PRIMARY KEY
    (
    S_Item
    )
    )
    CREATE FUNCTION dbo.FN_SplitWords
    (
    @strValuevarchar(8000)-- String Value to split into words
    , @intOptionssmallint = 0-- 0=Normal, 1=Append plurals
    )
    RETURNS TABLE

    Dictionary table available for significant number of English words, aliased for root/singular words and so on. Not a substitute for a proper lexicon! but we find it works well enough for general text searching where we need "story" to also match "stories", and ""tell" / "told" etc

    So take a search string, of one/multiple words, split it, run it through the dictionary to get "root" words, and then use the "word lookup table" to find (weighted) matches for one/many/all words in the search string, and then join that back to the original table/record. The central Word Lookup Table is maintained by trigger, splitting e.g. a "Description" into words and storing them (the "root" variant) in the Word Lookup Table (WordID, TableID, RecordPKeyID).

    For searching where no pre-processing to Word Lookup Table exists, then convert the words in the search string into all possible singular/plural, present/past tense etc. and then use THAT list (somehow!) to match strings - using LIKE/whatever

    CREATE FUNCTION dbo.FN_Trigram_Split
    (
    @stringvarchar(8000)
    )
    -- Find matching records in TrigramData Table
    RETURNS TABLE

    We use Trigrams for text searching in some circumstances

     

  • @jonathan-2,

    I recently had reason to revisit this article and I found a bug in the code.  It does not preserve leading spaces in the individual elements.  If an element is all spaces, it removes all the spaces even when the delimiter is a comma.

    Compare the outputs of items 2, 3, and 18 in the following test.

    --===== Conditionally drop the test tables to make reruns easier for testing.
    -- (this is NOT a part of the solution)
    DROP TABLE IF EXISTS #JBMTest
    ;
    --===== Create and populate a test table on the fly (this is NOT a part of the solution).
    -- In the following comments, "b" is a blank and "E" is an element in the left to right order.
    -- Double Quotes are used to encapsulate the output of "Item" so that you can see that all blanks
    -- are preserved no matter where they may appear.
    SELECT SomeID, SomeValue = CONVERT(NVARCHAR(4000),SomeValue)
    INTO #JBMTest
    FROM ( --# of returns & type of Return Row(s)
    SELECT 0, NULL UNION ALL --1 NULL
    SELECT 1, SPACE(0) UNION ALL --1 b (Empty String)
    SELECT 2, SPACE(1) UNION ALL --1 b (1 space)
    SELECT 3, SPACE(5) UNION ALL --1 b (5 spaces)
    SELECT 4, ',' UNION ALL --2 b b (both are empty strings)
    SELECT 5, '55555' UNION ALL --1 E
    SELECT 6, ',55555' UNION ALL --2 b E
    SELECT 7, ',55555,' UNION ALL --3 b E b
    SELECT 8, '55555,' UNION ALL --2 b B
    SELECT 9, '55555,1' UNION ALL --2 E E
    SELECT 10, '1,55555' UNION ALL --2 E E
    SELECT 11, '55555,4444,333,22,1' UNION ALL --5 E E E E E
    SELECT 12, '55555,4444,,333,22,1' UNION ALL --6 E E b E E E
    SELECT 13, ',55555,4444,,333,22,1,' UNION ALL --8 b E E b E E E b
    SELECT 14, ',55555,4444,,,333,22,1,' UNION ALL --9 b E E b b E E E b
    SELECT 15, ' 4444,55555 ' UNION ALL --2 E (w/Leading Space) E (w/Trailing Space)
    SELECT 16, 'This,is,a,test.' UNION ALL --4 E E E E
    SELECT 17, ',,,,,,' UNION ALL --7 (All Empty Strings)
    SELECT 18, ' a , b '
    ) d (SomeID, SomeValue)
    ;
    --===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
    SELECT test.SomeID, test.SomeValue, Value = QUOTENAME(split.value,N'"'), split.position
    FROM #JBMTest test
    CROSS APPLY dbo.STRING_SPLIT(test.SomeValue,N',') split
    ;
    --===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
    SELECT test.SomeID, test.SomeValue, Item = QUOTENAME(split.Item,N'"'), split.ItemNumber
    FROM #JBMTest test
    CROSS APPLY dbo.DelimitedSplitN4K(test.SomeValue,N',') split
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for spotting that Jeff. I deliberately wrapped an LTRIM around the return value so if a manually written csv sometimes had comma space or just comma it would return the value with no leading space. I should really remove LTRIM as the T-SQL string_split function returns leading spaces. I should also rename the position column to ordinal to make it the same as the T-SQL function.

    I'll replace the function in my script article with this one:

    IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
    EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT] () RETURNS TABLE AS RETURN SELECT 1 X')
    END
    GO
    ALTER FUNCTION [dbo].[STRING_SPLIT]
    (
    @string nvarchar(MAX),
    @separator nvarchar(MAX)
    )
    RETURNS TABLE WITH SCHEMABINDING
    AS RETURN
    WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
    Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
    T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)+1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 N FROM Y),
    Delim(Pos) AS (SELECT t.N FROM T WHERE SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0),
    Separated(value,position) AS (SELECT SUBSTRING(@string, d.Pos + CASE WHEN Pos=0 THEN 1 ELSE LEN(@separator+'x')-1 END,
    LEAD(d.Pos,1,2147483647) OVER (ORDER BY (SELECT NULL))
    - CASE WHEN d.Pos=0 THEN 2-LEN(@separator+'x')
    ELSE d.Pos END - (LEN(@separator+'x')-1)),
    d.pos
    FROM Delim d
    WHERE @string IS NOT NULL)
    SELECT s.value value,
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ordinal
    FROM Separated s
    WHERE s.value <> @separator;
    GO

     

     

     

  • That sounds good and right, Jonathan.  I tell people that good splitters shouldn't make any assumptions and that it should return EVERYTHING that's between the delimiters, an empty string if there's nothing between the delimiters and, hopefully, at NULL if a NULL was passed for the string value.

    Thank you for your efforts here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This was removed by the editor as SPAM

  • Comments posted to this topic are about the item A varchar(MAX) STRING_SPLIT function for SQL 2012 and above

  • Hi folks,

    Some words about this interesting function.

    Now I call this function like this:

    SELECT * FROM dbo.STRING_SPLIT(N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit', N'DO');

    GO

    But what to do if I need separator1 or separator2 or separator3?

    In the example above: separators are :

    DO

    BE

    BO

    i.e. I want to split string separated by one of these separators?

    The call (sep2) after call(sep1) does not help in that case.

     

  • Oleg-461731 wrote:

    Hi folks,

    Some words about this interesting function.

    Now I call this function like this:

    SELECT *FROM dbo.STRING_SPLIT(N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit', N'DO');GO

    But what to do if I need separator1 or separator2 or separator3?

    In the example above: separators are :

    DO

    BE

    BO

    i.e. I want to split string separated by one of these separators?

    The call (sep2) after call(sep1) does not help in that case. 

    Not sure exactly what you want, but if you want to split by just one of the separators you could put the separator into a variable and change that variable to the value you want:

    DECLARE @Separator nvarchar(MAX) = 'BO'
    DECLARE @String nvarchar(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
    SELECT *
    FROM dbo.STRING_SPLIT(@String, @Separator)

    If you want to split by all the separators you could use REPLACE.

    DECLARE @String nvarchar(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
    SELECT *
    FROM dbo.STRING_SPLIT(REPLACE(REPLACE(REPLACE(@String, ' BO ', '|'), ' DO ', '|'), ' BE ', '|'), '|') t
  • Yes, the 2nd is what I have asked.

    But the replace

    DECLARE @String NVARCHAR(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
    SELECT *
    FROM dbo.STRING_SPLIT(REPLACE(REPLACE(REPLACE(@String, ' BO ', '|'), ' DO ', '|'), ' BE ', '|'), '|') t

    is not for all cases, need to be sure that no the character '|' in the text. May be replace all separators with the 1st one.

     

  • Oleg-461731 wrote:

    Yes, the 2nd is what I have asked.

    But the replace

    DECLARE @String NVARCHAR(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
    SELECT *
    FROM dbo.STRING_SPLIT(REPLACE(REPLACE(REPLACE(@String, ' BO ', '|'), ' DO ', '|'), ' BE ', '|'), '|') t

    is not for all cases, need to be sure that no the character '|' in the text. May be replace all separators with the 1st one.

    Yes, you could do that:

    DECLARE @String NVARCHAR(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
    SELECT *
    FROM dbo.STRING_SPLIT(REPLACE(REPLACE(@String, ' BO ', ' DO '), ' BE ', ' DO '), ' DO ' ) t

Viewing 11 posts - 31 through 40 (of 40 total)

You must be logged in to reply to this topic. Login to reply