Pipe Delimmeted Data To Shred Into A Table

  • Hi,

    Need help to shred pipe delimmited data into a table, cant seem to find anything that will help me do this.

    for example the data looks roughly like this

    234|2015-6-11|234aa|1

    ID1|a|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1543211

    ID2|B|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1234

    I have approximatly 200 rows of these type of lines of data can anyone provide a potential awnser that can help me add each column into the same row of a temp table

  • xxxy2k (11/6/2015)


    Hi,

    Need help to shred pipe delimmited data into a table, cant seem to find anything that will help me do this.

    for example the data looks roughly like this

    234|2015-6-11|234aa|

    ID1|a|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1543211

    ID2|B|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1234

    I have approximatly 200 rows of these type of lines of data can anyone provide a potential awnser that can help me add each column into the same row of a temp table

    Considering that the first row only has 4 elements (one of them is a trailing empty element) and the rest of more than that, what do you want to do about such a disparity in the number of elements?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The sample of data im using is test data, i wouldnt get hung up about the quality of the data. I've amended the data you highlighted to explain the data further first line is a header row, 2nd row of data and onwards is the actual data.

  • There's two parts to this. First is getting it into SQL, which you can do like this: (props to whoever provided this sample. it didn't say)

    DECLARE @FileContents VARCHAR(MAX)

    SELECT @FileContents=BulkColumn

    FROM OPENROWSET(BULK'PathToYourFile.sql',SINGLE_BLOB) x;

    Second part is just use Moden's string splitter and insert the result into a table.

  • Final peice will be something like this

    select

    dbo.GetDomain(t.test,'|',0) as col2,

    dbo.GetDomain(t.test,'|',1) as col3,

    dbo.GetDomain(t.test,'|',2) as col4

    from #test t

    go

  • xxxy2k (11/13/2015)


    Final peice will be something like this

    select

    dbo.GetDomain(t.test,'|',0) as col2,

    dbo.GetDomain(t.test,'|',1) as col3,

    dbo.GetDomain(t.test,'|',2) as col4

    from #test t

    go

    Please post your "GetDomain" function so that I can show you why you shouldn't actually be doing it that way. 😉

    Also, you said...

    The sample of data im using is test data, i wouldn't get hung up about the quality of the data.

    That's totally obvious. I just wanted to know what you wanted to do with that first row in the data because you never explicitly stated what you want to do with it. Do you just want to ignore it? This is a fairly easy problem. I just need to know the answer to that question.

    And, again, the reason why I want to see your "GetDomain" function is to show you why you don't want to do such a thing in the future especially if you ever have a large amount of data to do this to.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If i could ignore it then great if not i would just filter it out in the where clause once in a table.

    The getdomain is a function see below

    create function dbo.GetDomain(

    @source varchar(1024),

    @delimiter varchar(10),

    @domain int

    ) returns varchar(1024) as begin

    declare @returnValue varchar(1024)

    declare @workingOn int

    declare @length int

    set @workingOn=0

    while @workingOn<@domain begin

    set @source=substring(@source,charindex(@delimiter,@source)+1,1024)

    set @workingOn+=1

    end

    set @length=charindex(@delimiter,@source)

    set @returnValue=substring(@source,1,case when @length=0 then 1024 else @length-1 end)

    return @returnValue

    end

    GO

  • Ok... first things first. Part of the reason why you've gotten no actual code to help you for the split portion of this problem is because you haven't posted readily consumable test data. Folks like to test their stuff before posting it and I'm no exception. I've converted your test data to readily consumable data to demonstrate with because you're new. I strongly recommend that you read the article at the first link in my signature line below under "Helpful Links" for future posts. You get tested, coded replies that way.

    Thank you for posting your GetDomain splitter. As I suspected, it contains a slow While Loop to do the splits, not to mention that it's a Scalar Function rather than a high performance iTVF (Inline Table Valued Function).

    Without further ado, take a trip over to the following article and get yourself a copy of the "DelimitedSplit8K" function from the "Resources" section near the bottom of the article. Once you've setup that function, this problem becomes child's play and will only be beat for performance by a CLR in the pre-2012 world.

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    Here's the code for a solution. Change column names and the table names to suit you. The result is stored in a Temp Table that's created on-the-fly. It also assumes the "IDx" at the beginning of each row of data is going to be unique. If it's not, there's a fix we can apply for that.

    And, yes, I included a test table like you should in the future. You'll get tested answers much more quickly that way.

    --===== Create the test table.

    -- This is NOT a part of the solution. We're just creating test data here.

    IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL

    DROP TABLE #TestTable

    ;

    SELECT *

    INTO #TestTable

    FROM (

    SELECT '234|2015-6-11|234aa|1' UNION ALL

    SELECT 'ID1|a|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1543211' UNION ALL

    SELECT 'ID2|B|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1234'

    ) d (PipeData)

    ;

    --===== This does the split and repivot as well as creating and populating

    -- the desired Temp Table on-the-fly.

    SELECT C01 = SUBSTRING(tt.PipeData,1,CHARINDEX('|',tt.PipeData)-1)

    ,C02 = MAX(CASE WHEN split.ItemNumber = 2 THEN split.Item ELSE '' END)

    ,C03 = MAX(CASE WHEN split.ItemNumber = 3 THEN split.Item ELSE '' END)

    ,C04 = MAX(CASE WHEN split.ItemNumber = 4 THEN split.Item ELSE '' END)

    ,C05 = MAX(CASE WHEN split.ItemNumber = 5 THEN split.Item ELSE '' END)

    ,C06 = MAX(CASE WHEN split.ItemNumber = 6 THEN split.Item ELSE '' END)

    ,C07 = MAX(CASE WHEN split.ItemNumber = 7 THEN split.Item ELSE '' END)

    ,C08 = MAX(CASE WHEN split.ItemNumber = 8 THEN split.Item ELSE '' END)

    ,C09 = MAX(CASE WHEN split.ItemNumber = 9 THEN split.Item ELSE '' END)

    ,C10 = MAX(CASE WHEN split.ItemNumber = 10 THEN split.Item ELSE '' END)

    ,C11 = MAX(CASE WHEN split.ItemNumber = 11 THEN split.Item ELSE '' END)

    ,C12 = MAX(CASE WHEN split.ItemNumber = 12 THEN split.Item ELSE '' END)

    ,C13 = MAX(CASE WHEN split.ItemNumber = 13 THEN split.Item ELSE '' END)

    ,C14 = MAX(CASE WHEN split.ItemNumber = 14 THEN split.Item ELSE '' END)

    ,C15 = MAX(CASE WHEN split.ItemNumber = 15 THEN split.Item ELSE '' END)

    ,C16 = MAX(CASE WHEN split.ItemNumber = 16 THEN split.Item ELSE '' END)

    ,C17 = MAX(CASE WHEN split.ItemNumber = 17 THEN split.Item ELSE '' END)

    ,C18 = MAX(CASE WHEN split.ItemNumber = 18 THEN split.Item ELSE '' END)

    ,C19 = MAX(CASE WHEN split.ItemNumber = 19 THEN split.Item ELSE '' END)

    ,C20 = MAX(CASE WHEN split.ItemNumber = 20 THEN split.Item ELSE '' END)

    INTO #SplitResult

    FROM #TestTable tt

    CROSS APPLY dbo.DelimitedSplit8k(tt.PipeData,'|') split

    WHERE tt.PipeData LIKE 'ID%'

    GROUP BY SUBSTRING(tt.PipeData,1,CHARINDEX('|',tt.PipeData)-1)

    ;

    --===== Show that our data has been correctly split

    SELECT * FROM #SplitResult

    ;

    The "pivot" method that uses MAX(CASE)) that I used is an ancient but very fast method known as a CROSS TAB. You can read about that at the following link.

    http://www.sqlservercentral.com/articles/T-SQL/63681/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply