split string by number of characters

  • I know about DelimitedSplit8K, however what I want is to split a long text (varchar(max)) field on a delimiter at a given length (2000).

    For Example:

    Say I have a value:

    DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';

    if I want a max of five characters then the output would be:

    AB,CD

    EF,GH

    IJ

    if I want a max of ten characters then the output would be:

    AB,CD,EF

    GH,IJ

    So if this makes sense, how would I go about programming this? My web search skills are dismal so I did not find any direction.

    Thanks

    DJJ

  • The simple answer would be to use something like this:

    DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';

    WITH

    E(n) AS(

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

    ),

    E2(n) AS(

    SELECT a.n FROM E a, E b

    ),

    E4(n) AS(

    SELECT a.n FROM E2 a, E2 b

    ),

    cteTally(n) AS(

    SELECT TOP( CAST(CEILING(LEN(@myv) / 5.) AS int))

    ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 n

    FROM E4

    )

    SELECT SUBSTRING( @myv, (5 * n) + 1, 5)

    FROM cteTally;

    Of course, that would be to simple to be what you're looking for.

    So I opened the briefcase and found this thread that might help you:

    http://www.sqlservercentral.com/Forums/Topic1709361-149-1.aspx

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks, Luis!

    I will study both your example and the link.

  • We have a script for exactly this kind of thing; have a look at nsplit2B which you can find here[/url].

    Here's a couple examples:

    DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';

    SELECT * FROM dbo.nsplit2B(@myv,5,1);

    SELECT * FROM dbo.nsplit2B(@myv,8,1);

    Results:

    ItemNumber Token

    -------------------- ----------

    1 AB,CD

    2 EF,GH

    3 IJ

    ItemNumber Token

    -------------------- ----------

    1 AB,CD,EF

    2 GH,IJ

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Hi Alan, you're still missing the GetNumsAB definition. I'd be interested on checking out how everything works. 🙂

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Alan.B (10/3/2016)


    We have a script for exactly this kind of thing; have a look at nsplit2B which you can find here[/url].

    Here's a couple examples:

    DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';

    SELECT * FROM dbo.nsplit2B(@myv,5,1);

    SELECT * FROM dbo.nsplit2B(@myv,8,1);

    Results:

    ItemNumber Token

    -------------------- ----------

    1 AB,CD

    2 EF,GH

    3 IJ

    ItemNumber Token

    -------------------- ----------

    1 AB,CD,EF

    2 GH,IJ

    This looks like it expects all of the tokens to be the same length. While this is true of the sample data, it wasn't clear that this would be true of the actual data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Luis Cazares (10/3/2016)


    Hi Alan, you're still missing the GetNumsAB definition. I'd be interested on checking out how everything works. 🙂

    getnumsAB

    CREATE FUNCTION dbo.GetNumsAB

    (

    @low bigint,

    @high bigint,

    @gap bigint,

    @row1 bit

    )

    /****************************************************************************************

    Purpose:

    Creates up to 100,544,625 sequential numbers beginning with @low and ending with @high.

    Used to replace iterative methods such as loops, cursors and recursive CTEs to solve SQL

    problems. Based on Itzik Ben-Gan's getnums function with some tweeks and enhancements

    and added functionality. The logic for getting rn to begin at 0 or 1 is based comes from

    Jeff Moden's fnTally function.

    Compatibility:

    SQL Server 2008+ and Azure SQL Database

    Syntax:

    SELECT rn, n1, n2 FROM dbo.getnumsAB(@low,@high,@gap,@row1);

    Parameters:

    @low = a bigint that represents the lowest value for n1.

    @high = a bigint that represents the highest value for n1.

    @gap = a bigint that represents how much n1 and n2 will increase each row; @gap also

    represents the difference between n1 and n2.

    @row1 = a bit that represents the first value of rn. When @row = 0 then rn begins

    at 0, when @row = 1 then rn will begin at 1.

    Return Types:

    Inline Table Valued Function returns:

    rn = bigint; a row number that works just like T-SQL ROW_NUMBER() except that it can

    start at 0 or 1 which is dictated by @row1.

    n1 = bigint; a sequential number starting at the value of @low and incrimentingby the

    value of @gap until it is less than or equal to the value of @high.

    n2 = bigint; a sequential number starting at the value of @low+@gap and incrimenting

    by the value of @gap.

    Developer Notes:

    1. The lowest and highest possible numbers returned are whatever is allowable by a

    bigint. The function, however, returns no more than 100,544,625 rows (465^3).

    2. @gap does not affect rn, rn will begin at @row1 and increase by 1 until the last row

    unless its used in a query where a filter is applied to rn.

    3. @gap must be greater than 0 or the function will not return any rows.

    4. Keep in mind that when @row1 is 0 then the highest row-number will be the number of

    rows returned minus 1

    5. If you only need is a sequential set beginning at 0 or 1 then, for best performance

    use the RN column. Use N1 and/or N2 when you need to begin your sequence at any

    number other than 0 or 1 or if you need a gap between your sequence of numbers.

    6. Although @gap is a bigint it must be a positive integer or the function will

    not return any rows.

    7. @high must be equal or higher than @low or the function won't return any rows.

    8. There is no performance penalty for sorting by rn ASC; there is a large performance

    penalty for sorting in descending order. To sort in descending order, first make sure

    that @gap = 1. Next create an alias column (called "n_desc" in the example below) and

    use the formula: n_desc = ABS(rn-(@high+1)). The example below will return the numbers

    1 to 10 in descending order:

    DECLARE @low int = 1, @high int = 10, @gap int = 1, @row1 bit = 1;

    SELECT n_desc = ABS(rn-(@high+(@row1)))

    FROM dbo.getnumsAB(@low,@high,1,@ro1)

    ORDER BY rn;

    Examples:

    --===== 1. Using RN (rownumber)

    -- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):

    SELECT RN FROM dbo.getnumsAB(1,5,1,1);

    -- (1.2) The best way to get the numbers 0,1,2...@high-1 (e.g. 0 to 5):

    SELECT RN FROM dbo.getnumsAB(0,5,1,0);

    --===== 2. Using N1

    -- (2.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):

    SELECT N1 FROM dbo.getnumsAB(-3,3,1,1);

    -- (2.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:

    SELECT RN, N1 FROM dbo.getnumsAB(-3,3,1,1);

    -- (2.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:

    SELECT RN, N1 FROM dbo.getnumsAB(-3,3,1,0);

    --===== 3. Using N2 and @gap

    -- (3.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:

    SELECT N1 FROM dbo.getnumsAB(0,100,10,1);

    -- (3.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.

    -- For example, to get (0,10),(10,20),(20,30).... (90,100):

    SELECT N1, N2 FROM dbo.getnumsAB(0,90,10,1);

    -- (3.3) Remember that a rownumber is included and it can begin at 0 or 1:

    SELECT RN, N1, N2 FROM dbo.getnumsAB(0,90,10,1);

    --===== (4) A real life example using RN, N1 and N2:

    -- Beginning with @StartDate, to generate ranges of weeks that occur between

    -- @startDate & @EndDate:

    DECLARE @StartDate datetime = '1/1/2015', @EndDate datetime = '2/28/2015';

    SELECT

    WeekNbr = 'Week #'+CAST(RN AS varchar(2)),

    WeekStart = CONVERT(DATE, DATEADD(DAY,N1,@StartDate)),

    WeekEnd = CONVERT(DATE, DATEADD(DAY,N2-1,@StartDate))

    FROM dbo.getnumsAB(0,datediff(DAY,@StartDate,@EndDate),7,1);

    ---------------------------------------------------------------------------------------

    Revision History:

    Rev 00 - 20140518 - Initial Development - Alan Burstein

    Rev 01 - 20151029 - Added 65 rows to make L1=465; 465^3=100.5M. Updated comment section

    - Alan Burstein

    ****************************************************************************************/

    RETURNS TABLE WITH SCHEMABINDING AS RETURN

    WITH

    L1(N) AS

    (

    SELECT 1

    FROM (VALUES

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),

    (NULL),(NULL),(NULL)) t(N) -- 465 values

    ),

    iTally AS

    (

    SELECT rn = 0 WHERE @row1 = 0

    UNION ALL

    SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM L1 a CROSS APPLY L1 b CROSS APPLY L1 c

    ) --Up to 100,544,625 numbers (select 465*465*465 = 100,544,625)

    SELECT TOP (ABS((@high-@low)/ISNULL(NULLIF(@gap,0),1)+1))

    rn,

    n1 = ((rn-( @row1 ))*@gap+@low),

    n2 = ((rn-(@row1-1))*@gap+@low)

    FROM iTally

    WHERE @high >= @low and @gap > 0

    ORDER BY rn;

    dbo.NSplit2B

    CREATE FUNCTION dbo.NSplit2B

    (

    @String varchar(max),

    @TokenLen bigint,

    @DelimLen bigint

    )

    /****************************************************************************************

    Purpose:

    Identical to NSplit8K but handles an input string of data type varchar(max). NSplit2B

    splits an input string (@String) into @TokenLen sized tokens with @DelimLen sized gaps

    between each token. Intended to be a better performing alternative to the traditional

    "splitter" for cases where the size of the token (AKA "item") is always the same.

    Compatibility:

    SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse

    Syntax:

    SELECT ItemNumber, Token

    FROM dbo.NSplit2B(@string, @length, @delimiterLength)

    Parameters:

    @String = varchar(max); the input string to "split" into tokens

    @TokenLen = bigint; the size of the output token

    @DelimLen = bigint; the size of the delimiter. Can be set to 0 for when there's

    no delimiter or set to negetive if you desire overlap

    Returns:

    ItemNumber = bigint; represents the order that the token appears in the input string

    Token = varchar(max); the @lengh-sized token returned by the function.

    Developer notes:

    1. Requires GetNumsAB: http://www.sqlservercentral.com/scripts/Set+Based/139370/

    Alternatively you can incorporate your own tally table logic at your own risk.

    2. If the last token is shorter then @TokenLen then it will be truncated.

    3. Tends to perform better with a serial execution plan; it may be adventageous to test

    with OPTION (MAXDOP 1) and compare that performance with a parallel plan. To force a

    parallel execution plan see make_parallel by Adam Machanic:

    sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx

    4. There are no safeguards built into the function to make sure the tokens are all the

    same size. The only safeguard is to test and know your data.

    5. If any of the parameters are NULL the function will return a a single NULL value.

    6. Will perform slower than NSplit8K due to the fact that it's handling varchar(max)

    strings. If you're sure all strings contain 8,000 characters or less use NSPlit8K.

    7. NSplit2B is deterministic. For more about deterministic functions see:

    https://msdn.microsoft.com/en-us/library/ms178091.aspx

    Examples:

    --===== (1) extract the values "ab", "cd", "ef", and "gg"; token-size=2; delimiter=","

    SELECT ItemNumber, Token FROM dbo.NSplit2B('ab,cd,ef,gg',2,1);

    --===== (2) extract the values "xxx", "yyy", "bbb", etc... token-size=3; delimiter=","

    DECLARE @string1 varchar(100) = 'xxx,yyy,bbb,123,zzz';

    SELECT ItemNumber, Token FROM dbo.NSplit2B(@string1,3,1);

    --===== (3) extract the values "client01", "client02", etc...

    DECLARE @string varchar(100) = '<client01>,<client02>,<client03>,<client04>';

    SELECT ItemNumber, Token FROM dbo.NSplit2B(SUBSTRING(@string,2,LEN(@string)-2),8,3);

    --===== (4) If you dont need a delimiter to split the string

    SELECT ItemNumber, Token FROM dbo.NSplit2B('abcdefghi',3,0);

    --===== (5) Turn it into an N-Gram function

    DECLARE @string varchar(max) = 'abcdefghi',

    @N int = 1; -- change this to 2 for bigrams, 3 for trigrams, 4 for 4-grams, etc...

    SELECT TOP (LEN(@string)-(@N-1)) ItemNumber, Token

    FROM dbo.NSplit2B('abcdefghi',@N,-(@N-1));

    --===== (6) Split a string into unigrams then count the vowels

    SELECT VowelCount = COUNT(*)

    FROM dbo.NSplit2B('abcdefghi',1,0)

    WHERE Token LIKE '[aeiou]';

    --===== (7) Behavior for NULL inputs

    SELECT ItemNumber, Token FROM dbo.NSplit2B(NULL,1,0);

    SELECT ItemNumber, Token FROM dbo.NSplit2B('ABC',NULL,0);

    SELECT ItemNumber, Token FROM dbo.NSplit2B('ABC',1,NULL);

    ----------------------------------------------------------------------------------------

    Rev 00 - 20151030 - Alan Burstein - Initial Development

    Rev 01 - 20160407 - Alan Burstein

    - Rewrote using getnumsAB

    - Added logic for handling NULL inputs

    ****************************************************************************************/

    RETURNS TABLE WITH SCHEMABINDING AS RETURN

    SELECT

    ItemNumber = rn,

    Token = SUBSTRING(@string,n1,@TokenLen)

    FROM dbo.GetNumsAB(1,CONVERT(bigint,DATALENGTH(@string),0),@TokenLen+@DelimLen,1)

    WHERE @string IS NOT NULL AND @TokenLen+@DelimLen IS NOT NULL

    UNION ALL

    SELECT 1, NULL -- if any of the parameters are NULL then return a single NULL value

    WHERE REPLICATE(@string,@TokenLen+@DelimLen) IS NULL;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • drew.allen (10/3/2016)


    Alan.B (10/3/2016)


    We have a script for exactly this kind of thing; have a look at nsplit2B which you can find here[/url].

    Here's a couple examples:

    DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';

    SELECT * FROM dbo.nsplit2B(@myv,5,1);

    SELECT * FROM dbo.nsplit2B(@myv,8,1);

    Results:

    ItemNumber Token

    -------------------- ----------

    1 AB,CD

    2 EF,GH

    3 IJ

    ItemNumber Token

    -------------------- ----------

    1 AB,CD,EF

    2 GH,IJ

    This looks like it expects all of the tokens to be the same length. While this is true of the sample data, it wasn't clear that this would be true of the actual data.

    Drew

    I did not fully understand the requirement so I'm guessing a little (waiting on OP feedback). Yes - if tokens are not the same length then what I posted wont work.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • I used the link Luis presented and have it working. The size of each string varies greatly but the needed split is around 2000 characters.

    So, I modified to add the extra column I needed, and use the different delimiter.

    Thank you, everyone.

  • djj (10/3/2016)


    I used the link Luis presented and have it working. The size of each string varies greatly but the needed split is around 2000 characters.

    So, I modified to add the extra column I needed, and use the different delimiter.

    Thank you, everyone.

    EDIT***

    Never mind I realize now where my confusion lay

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply