How to split string without delimiter into multiple rows

  • Good day,

    I have a situation where I have one row of data, that needs to be split into multiple rows, based on a string in one column. This is beyond my scope of knowledge and I have struggled with it long enough to hopefully have someone provide some guidance.

    For example, my data might look like:

    InvoiceID          Amount    Provider1        Provider2       Provider3 
    ABCDEF 100 Acme
    LMNOPQRSTUVW 500 GlobalExports AcmeWidgets FooBar


    Output required into separate table:

    InvoiceID Amount Provider
    ABCDEF 100 Acme
    LMNOPQ 500 GlobalExports
    OPQRST 500 AcmeWidgets
    RSTUVW 500 FooBar
    create table Testdata
    (
    invoiceID INT,
    amount float,
    Provider1 varchar(255),
    Provider2 varchar(255),
    Provider3 varchar(255),
    Provider4 varchar(255),
    Provider5 varchar(255)
    );

    insert into Testdata values
    ('ABCDEF','100','Acme'),
    ('LMNOPQRSTUVW','500','GlobalExports','AcmeWidgets','FooBar');

    Some caveats are that the unique InvoiceID will be a minimum of six characters, and always a multiple of three characters. If the InvoiceID is six characters, there is no need to split it - only when it is greater than six characters. The InvoiceID is 21 characters maximum length.

    As in the above example, the first invoiceID does not get split since it is six characters. The second InvoiceID row, being greater than six characters, gets split. The first split row will contain the first six characters, the second row will contain characters 4-9. The third row contains characters 7-12, etc.

    Any guidance would be greatly appreciated. I hope I formatted this question properly and clearly. I struggled with a CTE, but there might be a more refined approach using substring?

    Thank you for your time!!

     

     

  • Problems like this make me really like normalization. =)

    After failing at this several times, this is as far as I've gotten. I'd love to see the right answer, because this might be close, but it ain't it.

    SELECT p.InvoiceID 
     ,  p.Amount
     ,  p.ProviderName
     ,  invNos.InvNo
    FROM
     (SELECT InvoiceID
      , Amount
      , ca.ProviderName
     FROM dbo.Invoice
     CROSS APPLY (VALUES (Provider1), (Provider2), (Provider3)) ca(ProviderName)) p
    -- inner join the above to (invoiceID, split invoice)
    INNER JOIN
    (SELECT invSplit.InvoiceID
     , invSplit.InvNo
     , invSplit.Amount
    FROM
     (SELECT InvoiceID
      , LEFT(InvoiceID,6) AS InvNo
      , Amount
     FROM dbo.Invoice
     UNION ALL
     SELECT InvoiceID
      , SUBSTRING(InvoiceID,4,6)
      , Amount
     FROM dbo.invoice) invSplit
    -- WHERE LEN(invSplit.InvNo)=6
     ) invNos ON invNos.InvoiceID = p.InvoiceID
    WHERE ProviderName IS NOT NULL
    AND LEN(InvNo)=6;

    Either your expected answer is wrong, or my code is screwy somewhere, because I get 7 records back.

    • This reply was modified 4 years, 9 months ago by  pietlinden. Reason: forgot to include the link the to the "Many uses of CROSS APPLY" article
    • This reply was modified 4 years, 9 months ago by  pietlinden.
    • This reply was modified 4 years, 9 months ago by  pietlinden.
  • Your test data doesn't match what you posted in your question.  In your question, you have 3 provider columns.  In your test data, you have 5 provider columns AND your test data code has an error in it.

    Please provide the test data that actually works and actually matches what you really want to do.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • you could try un pivoting the data, something like this

     

    select invoiceID,prov as provider,amount from (
    SELECT *
    FROM Testdata ) p
    UNPIVOT
    (prov FOR provider IN
    (provider1,provider2,provider3,provider4,provider5)
    )AS unpvt;



    select left (invoiceID,6),prov as provider,amount from (
    SELECT *
    FROM Testdata
    --where len(invoiceid)>6
    ) p
    UNPIVOT
    (prov FOR provider IN
    (provider1,provider2,provider3,provider4,provider5)
    )AS unpvt;

    • This reply was modified 4 years, 9 months ago by  SGT_squeequal.

    ***The first step is always the hardest *******

  • The following code appears to fulfill your requirements

    SELECT  invoiceID      = SUBSTRING(t.invoiceID, sq.StartPos, 6)
    , t.amount
    , [Provider] = x.Val
    , Orig_invoiceID = t.invoiceID
    FROM Testdata AS t
    -- A list of possible start positions for multi-part invoiceID
    CROSS APPLY (SELECT pos.StartPos FROM ( VALUES (1), (4), (7), (10), (13), (16) ) AS pos(StartPos) ) AS sq
    -- A list Provider columns to extract data from
    CROSS APPLY (VALUES ( 1, [Provider1] )
    , ( 4, [Provider2] )
    , ( 7, [Provider3] )
    , ( 10, [Provider4] )
    , ( 13, [Provider5] )
    ) x(StartPos, Val)
    WHERE LEN(SUBSTRING(t.invoiceID, sq.StartPos, 6)) = 6 -- Filter to ensure that the split invoiceID = 6 characters
    AND sq.StartPos = x.StartPos -- Filter to ensure that we get the Provider data that matches the partial invoiceID
    ORDER BY t.invoiceID, sq.StartPos;
  • The code can be re-written with 1 less CROSS APLY

    SELECT  invoiceID      = SUBSTRING(t.invoiceID, x.StartPos, 6)
    , t.amount
    , [Provider] = x.Val
    , Orig_invoiceID = t.invoiceID
    FROM #Testdata AS t
    -- A list of start positions for multi-part invoiceID and Provider columns to extract data from
    CROSS APPLY (VALUES ( 1, t.Provider1 )
    , ( 4, t.Provider2 )
    , ( 7, t.Provider3 )
    , ( 10, t.Provider4 )
    , ( 13, t.Provider5 )
    ) x(StartPos, Val)
    WHERE LEN(SUBSTRING(t.invoiceID, x.StartPos, 6)) = 6 -- Filter to ensure that the split invoiceID = 6 characters
    ORDER BY t.invoiceID, x.StartPos;
  • Thanks Des! (I hope that was a hard query... otherwise, I'm in trouble!)

  • pietlinden wrote:

    Thanks Des! (I hope that was a hard query... otherwise, I'm in trouble!)

    Piet, we all have days where we battle with the task at hand.  Once the solution has been found, it looks so simple and obvious.

    Having seen Dwain Camp's An Alternative (Better?) Method to UNPIVOT, I knew we should be looking at CROSS APPLY.  That said, it took a few attempts before I found a workable solution.  Then some more tweaking to look for a better solution.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply