parse string recursively

  • My table has data like:

    record 1 -->  12 | 34 |

    record 2 --->536|

    In this data, the numbers are separated by pipes "|" and there can be any number of numbers. I need to parse each of these records and separate the numbers to different records. The output from the above data should be:

    12

    34

    536

    How can I get this? I tried a recursive function. It returns a table if i give it an input string of the format "1 | 2 | 3 | 4|". But I have to call the function as " select * from recursive_fn ('1 | 2| 3|4|') ". I can not provide the column of the source table as input to the function. Please advise on a better approach.

    Thanks to all.

  • You should fix INSERT, not SELECT.

    _____________
    Code for TallyGenerator

  • First and foremost, Serqiy is correct... delimited data should not be allowed to be stored in the database.  The inserts to the database should be done via a staging table, normalized, and only then should it be stored in the database.

    Now, if I read you correctly, you want to split a whole table's worth of PSV (pipe seperated values).

    The following method will allow you to either create the normalized data or continue to calculate the data from a delimited column.  Again, I strongly recommend the former rather than the latter...

    Creating a "Helper Table"

    Ok... before we even get started, we need a helper table to do this.  It's a "numbers" table and it consists of nothing more than, you guessed it, a column of well indexed sequential numbers.  I call it a "Tally" table just because it's easier to say and is a unique enough word so as not to confuse anyone in conversation.   Here's how to make one... it should be a permanent table because it has lots of other uses...

    --===== Create and populate the Tally table on the fly

     SELECT TOP 11000 --equates to more than 30 years of dates

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

      ALTER TABLE dbo.Tally

        ADD CONSTRAINT PK_Tally_N

            PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Allow the general public to use it

      GRANT SELECT ON dbo.Tally TO PUBLIC

    Creating Some Test Data

    Ok... You didn't really provide any test data or DDL to speak of, but we need something we test on.  The following code creates a test table called "jbmTest"... read the comments in the code to see what it's doing...

    --===== Create and populate a test table.

         -- Column RowNum contains unique seqential numbers starting at 1

         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

         -- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings

         -- Column "SomeNumber has a range of 0.0000 to 99.9999 non-unique numbers

         -- Column "SomeDate" has a range of  >=01/01/2000 and <01/01/2010 non-unique date/times

         -- Takes about 11 seconds to execute for a million rows and < 1 second for 10 thousand.

     SELECT TOP 10000  --<<< Controls how many rows to create

            RowNum     = IDENTITY(INT,1,1),

            SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),

            SomePSV    = 'Part1|Part2|Part3|Part4|Part5||Part7|Part8|Part9|Part10|' --Pipe separated

       INTO dbo.JBMTest

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

      ALTER TABLE dbo.JBMTest

            ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Allow nulls in the PSV column

      ALTER TABLE dbo.JBMTest

      ALTER COLUMN SomePSV VARCHAR(70) NULL

    --===== Change some of the rows (evenly divisible by 10) to only have 1 parameter

     UPDATE dbo.JBMTest

        SET SomePSV = 'Part1|'

      WHERE RowNum % 10 = 0

    --===== Change one row to not have anything

     UPDATE dbo.JBMTest

        SET SomePSV = NULL

      WHERE RowNum = 1

    --===== Change one row to have an empty string

     UPDATE dbo.JBMTest

        SET SomePSV = ''

      WHERE RowNum = 2

    --===== Change one row to have just a delimiter

     UPDATE dbo.JBMTest

        SET SomePSV = '|'

      WHERE RowNum = 3

    Split a Whole Table's Worth of Delimited Values

    And now, the solution to your problem... this solution uses a cross-join between your table (JBMTest in the code above and below) at the character level of the SomePSV column and the numbers in the Tally table.  It'll expand 100,000 rows of 10 PSV's (Pipe separated values) in about 26 seconds and 10,000 PSV's in about 3 seconds, the results of which are simply displayed on screen (use the grid mode).  The output of the solution could certainly be directed to a table or a temp table for additional use without having to recalculate (hint: could be used to normalize the data you store as a child table).

    --===== Declare and start a time to measure duration with

    DECLARE @StartTime DATETIME

        SET @StartTime = GETDATE()

    --===== Declare a variable to hold the delimiter

    DECLARE @Delim CHAR(1)

        SET @Delim = '|'

    --===== Do the split

         -- Do nulls and pipeless blanks

     SELECT RowNum,SomeInt,SomePsv,1 AS Position

       FROM dbo.JBMTest

      WHERE SomePsv IS NULL

         OR SomePsv = ''

      UNION ALL

         -- Do everything that has a delimiter

     SELECT RowNum,

            SomeInt,

            LTRIM(RTRIM(NULLIF(SUBSTRING(@Delim+h.SomePsv+@Delim, t.N+1,

                 CHARINDEX(@Delim, @Delim+h.SomePsv+@Delim, t.N+1)-t.N-1),''))) AS Val,

            t.N-LEN(REPLACE(LEFT(@Delim+h.SomePsv+@Delim,t.N), @Delim, '')) AS Position

       FROM dbo.Tally t,

            dbo.JBMTest h

      WHERE SUBSTRING(@Delim+h.SomePsv+@Delim, t.N, 1) = @Delim

        AND t.N < LEN(@Delim+h.SomePsv)

    --===== Display the duration

      PRINT DATEDIFF(ms,@StartTime,GETDATE())

    A word of caution... I had to change some pretty well established code to allow for the trailing delimiter (a bit different than normal) and haven't spent a great deal of time testing it.  You should.   It does trim the leading and training blanks found in each element (also, highly unusual)...

    In SQL Server 2005, I imagine some sort of recursive CTE would be in order.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff, Serqiy. I really appreciate the detailed reply that Jeff gave. Also, I realise that delimited data is good in files, not in database tables. I have learnt it the hard way

    However, i was up last night and finally created two functions to do the job. the first is a recursive function which splits the input string until the string ends and returns a table with each element of the input string as a row.

    The second function is a cursor on the source table and passes each record from the source table to the recursive function. Finally, from this second function I get the entire source table parsed as I want it.

    Once again, thankyou to all you guys for putting in the time and effort.

  • Thanks.  It would be nice if you shared your solution with us... both the function and the cursor, please.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sure Jeff...The recursive parsing function followed by the function to count and summarise the entire table based on input dates.

    CREATE FUNCTION dbo.MPM_PARSE_ERROR(@error_number VARCHAR(100))

    RETURNS @ERROR_TABLE TABLE (ERROR VARCHAR(5))

    AS

    BEGIN

    /*------------------------------------------------------------

    Name   : MPM_PARSE_ERROR

    Input  : Error numbers as pipe delimited sring from AUD_ERROR_LOG

    Purpose: Parses pipe delimited error strings, returns a table

             with each error as a record of the output table.

    ------------------------------------------------------------*/

        DECLARE @pos_pipe INT

        DECLARE @current_error VARCHAR(5)

        DECLARE @new_error_str VARCHAR(100)

        SET @pos_pipe = CHARINDEX('|', @error_number)

        --Set Exit criteria for recursive function

        IF (@pos_pipe = 0 AND LEN(LTRIM(RTRIM(@error_number))) = 0) OR @error_number = '|' or @error_number = '||'

        BEGIN

            RETURN

        END

        --Error string may begin with a pipe

        IF @pos_pipe = 1

        BEGIN

            SET @error_number = RIGHT(@error_number, len(@error_number) - 1) + '|'

            SET @pos_pipe = CHARINDEX('|', @error_number)       

        END

        IF @pos_pipe = 0 --last error in string

        BEGIN

           SET @current_error = @error_number

           SET @new_error_str = '|'

        END

        ELSE --may be another error number after current error

        BEGIN

           SET @current_error = substring (@error_number, 1,@pos_pipe - 1)

           SET @new_error_str = replace (@error_number, @current_error + '|', '')

        END

          INSERT INTO @ERROR_TABLE

          SELECT LTRIM(RTRIM(@current_error)) UNION

          SELECT * from dbo.MPM_PARSE_ERROR(@new_error_str)

       RETURN

    END

    --------------------------------------------------------------------------

    CREATE FUNCTION MPM_ERROR_SUMMARY (@start_date DATETIME, @end_date DATETIME)

    RETURNS @ERROR_SUMMARY TABLE (ERROR VARCHAR(5), MASTER_SEQ_ID INT, LOAD_ID INT, ERROR_DATE_TIME DATETIME)

    AS

    BEGIN

    /*------------------------------------------------------------

    Name   : MPM_ERROR_SUMMARY

    Input  : Start and end dates for which errors have to be summarized.

    Purpose: Provide a summary of Errors in the AUD_ERROR_LOG table

             by process and count of errors.

    ------------------------------------------------------------*/

        DECLARE @error_number VARCHAR(100)

        DECLARE @seq INT, @load_id INT, @er_date DATETIME

        DECLARE CUR_ERROR CURSOR FOR

        SELECT ERROR_NUMBER,

            MASTER_SEQ_ID,

            LOAD_ID,

            ERROR_DATE_TIME

        FROM AUD_ERROR_LOG

        WHERE ERROR_DATE_TIME >= @start_date

            AND ERROR_DATE_TIME <= @end_date

       

        OPEN CUR_ERROR

        FETCH NEXT FROM CUR_ERROR INTO @error_number, @seq, @load_id, @er_date

        WHILE @@FETCH_STATUS = 0

        BEGIN

            --Pipe delimited errors have to be parsed. call function for this.

            INSERT INTO @ERROR_SUMMARY

            SELECT ERROR, @seq, @load_id, @er_date FROM DBO.MPM_PARSE_ERROR(@error_number)

            FETCH NEXT FROM CUR_ERROR INTO @error_number, @seq, @load_id, @er_date

        END

        CLOSE CUR

        DEALLOCATE CUR

        RETURN

    END

  • Thank you...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • shailendra;

    You might be more efficient having WHILE loop in the function instead of a recursive function.  Recursive functions certainly have their place, but I'm not sure this is one of them.

    CREATE

    FUNCTION dbo.MPM_PARSE_ERROR(@error_number VARCHAR(100))

    RETURNS

    @ERROR_TABLE TABLE (ERROR VARCHAR(5))

    AS

    BEGIN

    /*------------------------------------------------------------

    Name : MPM_PARSE_ERROR

    Input : Error numbers as pipe delimited sring from AUD_ERROR_LOG

    Purpose: Parses pipe delimited error strings, returns a table

    with each error as a record of the output table.

    ------------------------------------------------------------*/

    DECLARE @pos_pipe INTDECLARE @current_error VARCHAR(5)DECLARE @new_error_str VARCHAR(100)SET @new_error_str = LTRIM(RTRIM(@error_number))SET @pos_pipe = CHARINDEX('|', @new_error_str) WHILE (@pos_pipe > 0) BEGIN --Error string may begin with a pipe IF @pos_pipe = 1 SET @current_error = '' ELSE SET @current_error = LEFT(@new_error_str, @pos_pipe-1) --comment out this line if you do want empty values returned IF(LEN(@current_error) > 0) INSERT INTO @ERROR_TABLE SELECT LTRIM(RTRIM(@current_error)) SET @new_error_str = SUBSTRING(@new_error_str, @pos_pipe + 1, 100) SET @pos_pipe = CHARINDEX('|', @new_error_str) END--last error in string?IF(LEN(@new_error_str) > 0) INSERT INTO @ERROR_TABLE SELECT LTRIM(RTRIM(@new_error_str))RETURN

    END

    Hope this helps



    Mark

  • Thanks Mark,

    You solution also looks good, will try it out. Thanks

  • I'd be interested in seeing the performance characteristics of each of the three solutions.

    --------------------
    Colt 45 - the original point and click interface

  • Yeah, exactly what i had in mind. But i need to wait until the weekend to do that.

  • Or... just use the set based solution I provided to do the whole table at once

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... that's why I asked to see the code... I'll be doing the ol' million row testing on these solutions tonight... Ladies and Gentlemen, place your bets

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply