Extracting Rows from Delimited Strings

  • Gus,

    This is an SQL Server 2000 forum... 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff, I'd missed that completely.:blush:

    Is there a better way than just adding CASE statments to filter these correctly?

    Ade.



    Ade

    A Freudian Slip is when you say one thing and mean your mother.
    For detail-enriched answers, ask detail-enriched questions...[/url]

  • Try this... it's Adrian's original code with some conditions added...

    SELECT LEFT(raw_payload, CHARINDEX('+', raw_payload, 1)-1),

    CASE

    WHEN LEN(raw_payload)-LEN(REPLACE(raw_payload,'+','')) >= 2

    THEN

    CAST(SUBSTRING(raw_payload,

    CHARINDEX('+', raw_payload, 1) + 1,

    CHARINDEX('+', raw_payload, CHARINDEX('+', raw_payload, 1) +1) - (CHARINDEX('+', raw_payload, 1) + 1))

    AS FLOAT)

    ELSE NULL END,

    CASE

    WHEN LEN(raw_payload)-LEN(REPLACE(raw_payload,'+','')) >= 3

    THEN

    CAST(SUBSTRING(raw_payload,

    CHARINDEX('+', raw_payload, CHARINDEX('+', raw_payload, 1) +1) + 1,

    LEN(raw_payload) - CHARINDEX('+', raw_payload, CHARINDEX('+', raw_payload, 1) +1) + 1)

    AS FLOAT)

    ELSE NULL END

    FROM #SplitMe

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CODE PASSED:

    DECLARE @table TABLE (raw_payload VARCHAR(200))

    INSERT INTO @Table SELECT raw_payload FROM raw_data

    SELECT LEFT(raw_payload, CHARINDEX('+', raw_payload, 1)-1),

    CASE

    WHEN LEN(raw_payload)-LEN(REPLACE(raw_payload,'+','')) >= 2

    THEN

    CAST(SUBSTRING(raw_payload,

    CHARINDEX('+', raw_payload, 1) + 1,

    CHARINDEX('+', raw_payload, CHARINDEX('+', raw_payload, 1) +1) - (CHARINDEX('+', raw_payload, 1) + 1))

    AS FLOAT)

    ELSE NULL END,

    CASE

    WHEN LEN(raw_payload)-LEN(REPLACE(raw_payload,'+','')) >= 3

    THEN

    CAST(SUBSTRING(raw_payload,

    CHARINDEX('+', raw_payload, CHARINDEX('+', raw_payload, 1) +1) + 1,

    LEN(raw_payload) - CHARINDEX('+', raw_payload, CHARINDEX('+', raw_payload, 1) +1) + 1)

    AS FLOAT)

    ELSE NULL END

    RESULTS:

    (2551671 row(s) affected)

    (624 row(s) affected)

    Server: Msg 536, Level 16, State 3, Line 5

    Invalid length parameter passed to the substring function.

    SAMPLE RESULTS:

    S9000NULLNULL

    UV100.0NULL

    UV101.0NULL

    UV102.0NULL

    UV103.0NULL

    UV104.0NULL

    Third column is not null in these cases but that is what is diplayed.

  • Jeff - that last code nulls out the 0's in the 3rd column - is it supposed to?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • All right - one more variation on Adrian's original solution, avoiding the need for the CASE statements....

    [font="Courier New"]SELECT

            LEFT(raw_payload+'+++', CHARINDEX('+', raw_payload+'+++', 1)-1),

            CAST(--NULLIF(

                   SUBSTRING(raw_payload+'+++',

                       CHARINDEX('+', raw_payload+'+++', 1) + 1,

                           CHARINDEX('+', raw_payload+'+++',

                               CHARINDEX('+', raw_payload, 1) +1) -

                                   (CHARINDEX('+', raw_payload+'+++', 1) + 1))

                   --,'')

                   AS FLOAT),

           CAST(--NULLIF(

                   REPLACE(

                       SUBSTRING(raw_payload+'+++',

                           CHARINDEX('+', raw_payload+'+++',

                               CHARINDEX('+', raw_payload+'+++', 1) +1) + 1,

                                   LEN(raw_payload) - CHARINDEX('+', raw_payload,

                                   CHARINDEX('+', raw_payload+'+++', 1) +1) + 1)

                       ,'+','')

                   --,'')

               AS FLOAT)

    FROM #mytable[/font]

    Now - I'm not crazy that the NULL columns are being converted to 0 (so you can't tell what was passed in as 0 versus no input), so you might want the NULLIF's uncommmented, but they're fairly costly (they double processing time).

    Perf-wise - it looks like:

    Jeff's: 5,907ms

    This one (without NULLIF): 7,015ms

    This one (with NULLIF): 11,536ms

    I still haven't quite figured out where the flaw is in Jeff's that ends up nulling out the third column.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff Moden (4/25/2008)


    Gus,

    This is an SQL Server 2000 forum... 😉

    Well, that'll certainly change a few things! (Like maybe I'll start paying attention.... nah ... that'll never happen.)

    For that, you'll need a numbers-based string parser (there's one on the scripts section of SQL Server Central), instead of the XML version.

    Also, would need to use regular derived tables in the From clause, instead of a CTE. Should work with those modifications.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • SELECT raw_payload,

    LEFT(raw_payload+'+++', CHARINDEX('+', raw_payload+'+++', 1)-1),

    CAST(--NULLIF(

    SUBSTRING(raw_payload+'+++',

    CHARINDEX('+', raw_payload+'+++', 1) + 1,

    CHARINDEX('+', raw_payload+'+++',

    CHARINDEX('+', raw_payload, 1) +1) -

    (CHARINDEX('+', raw_payload+'+++', 1) + 1))

    --,'')

    AS FLOAT),

    CAST(--NULLIF(

    REPLACE(

    SUBSTRING(raw_payload+'+++',

    CHARINDEX('+', raw_payload+'+++',

    CHARINDEX('+', raw_payload+'+++', 1) +1) + 1,

    LEN(raw_payload) - CHARINDEX('+', raw_payload,

    CHARINDEX('+', raw_payload+'+++', 1) +1) + 1)

    ,'+','')

    --,'')

    AS FLOAT)

    FROM raw_data

    RESULT:

    Server: Msg 8114, Level 16, State 5, Line 1

    Error converting data type varchar to float.

  • n1pcman (4/25/2008)


    SELECT raw_payload,

    LEFT(raw_payload+'+++', CHARINDEX('+', raw_payload+'+++', 1)-1),

    CAST(--NULLIF(

    SUBSTRING(raw_payload+'+++',

    CHARINDEX('+', raw_payload+'+++', 1) + 1,

    CHARINDEX('+', raw_payload+'+++',

    CHARINDEX('+', raw_payload, 1) +1) -

    (CHARINDEX('+', raw_payload+'+++', 1) + 1))

    --,'')

    AS FLOAT),

    CAST(--NULLIF(

    REPLACE(

    SUBSTRING(raw_payload+'+++',

    CHARINDEX('+', raw_payload+'+++',

    CHARINDEX('+', raw_payload+'+++', 1) +1) + 1,

    LEN(raw_payload) - CHARINDEX('+', raw_payload,

    CHARINDEX('+', raw_payload+'+++', 1) +1) + 1)

    ,'+','')

    --,'')

    AS FLOAT)

    FROM raw_data

    RESULT:

    Server: Msg 8114, Level 16, State 5, Line 1

    Error converting data type varchar to float.

    hmm - it works for me as long as the second and third parameters are numeric or missing. Are you sure you don't have any of those that are NOT numeric?

    Perhaps try commenting out one of the casts and then the other, just to find out which one is causing the headache.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Yes that is the problem I do have some bad data, 1 or 2 lines per every 600 is stepped on data its un unavoidable buffering issue with the device dumping the data. I don't want that data if I do this process manually I delete it. I am just hoping to come up with a stored proc or something that cleans the data for my queries and reports. I want the first row text, second row number and third row number in the finished deal.

  • Matt Miller (4/25/2008)


    Jeff - that last code nulls out the 0's in the 3rd column - is it supposed to?

    Nah... I screwed up.

    The real key to this whole thing is that the data must be "equalized". That is, every row should be updated to have the same number of delimiters before any splitting is attempted. Sure, we can hack all day with CHARINDEX and the like, but if the original data is "equalized", this becomes a simple split and reassemble problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What if we could write a query that says pickup all data that meets this standard

    text(*) '+' NUM(*) '+' NUM(*) copy to new table then parse into a new table with 3 fields

    "Description Field would be Text" & "Usage Field would be a Number" & "Adjustable field would be a Number"

  • n1pcman (4/25/2008)


    Yes that is the problem I do have some bad data, 1 or 2 lines per every 600 is stepped on data its un unavoidable buffering issue with the device dumping the data. I don't want that data if I do this process manually I delete it. I am just hoping to come up with a stored proc or something that cleans the data for my queries and reports. I want the first row text, second row number and third row number in the finished deal.

    Then - try one of the above methods, and dump the data into varchar(20) fields FIRST. Then figure out if what you got in those fields is convertible to FLOAT: if it is - keep it - if not - chunk it. As of now - the parsing is working, it's just that it's parsing non-numbers in a couple of records.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I dont know enough about SQL to any of this I just know what I want, the bad data or non standard data can be trashed with no problems.

  • Select *

    FROM raw_data[raw_payload]

    Where raw_payload not like '%+%+%'

    This finds all the bad data is there some thing I can add to auto delete based on this query like a stored procedure or something and then run the parser on clean data all on a scheduled cycle???

    or all the good data with this

    Select *

    FROM raw_data[raw_payload]

    Where raw_payload like '%+%+%'

Viewing 15 posts - 16 through 30 (of 37 total)

You must be logged in to reply to this topic. Login to reply