Compare delimited data - same data but in a different order

  • Hello All,

    I am currently working on comparing two versions of data for a comparison report. I have a delimited string of characters and I am trying to figure out if the data is the same but in a different order, I should not mark the record as different. I eliminated all data with different lengths. Edited...with data creation scripts

    The approach I am using is to split the values using a cross apply to a "Split by delimiter function" into a new table and doing left outer join on part number between the

    CURRENT AND PREVIOUS record sets and looking for where records are missing in the second table/record set.

    For example:

    -- Original Data

    CREATE TABLE #Table_Orig (PartNumber int, FILEVERSION varchar(100), VALUE varchar(100))

    INSERT INTO #Table_Orig VALUES (1234, 'CURRENT', 'A B')

    INSERT INTO #Table_Orig VALUES (1234, 'PREVIOUS', 'B A')

    INSERT INTO #Table_Orig VALUES (456, 'CURRENT', 'A B')

    INSERT INTO #Table_Orig VALUES (456, 'PREVIOUS', 'A D')

    -- Split Table - Once run through a splitter function

    CREATE TABLE #Table_A (PartNumber int, FILEVERSION varchar(100), SplitVALUE varchar(100))

    INSERT INTO #Table_A VALUES (1234, 'CURRENT', 'A')

    INSERT INTO #Table_A VALUES (1234, 'CURRENT', 'B')

    INSERT INTO #Table_A VALUES (1234, 'PREVIOUS', 'B')

    INSERT INTO #Table_A VALUES (1234, 'PREVIOUS', 'A')

    INSERT INTO #Table_A VALUES (456, 'CURRENT', 'A')

    INSERT INTO #Table_A VALUES (456, 'CURRENT', 'B')

    INSERT INTO #Table_A VALUES (456, 'PREVIOUS', 'A')

    INSERT INTO #Table_A VALUES (456, 'PREVIOUS', 'D')

    -- Query

    select *

    from #Table_A prev

    LEFT OUTER JOIN #Table_A curr on prev.PartNumber = curr.PartNumber and curr.FileVersion = 'CURRENT' AND prev.SplitValue = curr.SplitValue

    WHERE prev.FileVersion = 'PREVIOUS'

    AND curr.PartNumber IS NULL

    -- Expected Output

    PartNumber: 456

    SplitValue: D

    This seems to work fine, but is a bit cumbersome. I did some searching on Stackoverflow and here with no real matches. Does anyone have any other "better" suggestions?

    Thanks,

    Anton

  • You have been around long enough to know what to expect from a post with this little detail...

    In order to help we will need a few things:

    1. Sample DDL in the form of CREATE TABLE statements

    2. Sample data in the form of INSERT INTO statements

    3. Expected results based on the sample data

    Please take a few minutes and read the first article in my signature for best practices when posting questions.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • To help you, we would need some consumable DDL, sample data and expected results based on the sample data provided as noted in the article linked on my signature.

    I know you posted something but it becomes confusing for me when it's something that might not be the actual input/output.

    Even if you give us the original delimited strings, we could work on them.

    Be sure to avoid "Split by delimiter functions" that use while loops or are not inline table functions.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • As requested...Sorry about the lousy format...

    Anton

    -- Original Data

    CREATE TABLE #Table_Orig (PartNumber int, FILEVERSION varchar(100), VALUE varchar(100))

    INSERT INTO #Table_Orig VALUES (1234, 'CURRENT', 'A B')

    INSERT INTO #Table_Orig VALUES (1234, 'PREVIOUS', 'B A')

    INSERT INTO #Table_Orig VALUES (456, 'CURRENT', 'A B')

    INSERT INTO #Table_Orig VALUES (456, 'PREVIOUS', 'A D')

    -- Split Table

    CREATE TABLE #Table_A (PartNumber int, FILEVERSION varchar(100), SplitVALUE varchar(100))

    INSERT INTO #Table_A VALUES (1234, 'CURRENT', 'A')

    INSERT INTO #Table_A VALUES (1234, 'CURRENT', 'B')

    INSERT INTO #Table_A VALUES (1234, 'PREVIOUS', 'B')

    INSERT INTO #Table_A VALUES (1234, 'PREVIOUS', 'A')

    INSERT INTO #Table_A VALUES (456, 'CURRENT', 'A')

    INSERT INTO #Table_A VALUES (456, 'CURRENT', 'B')

    INSERT INTO #Table_A VALUES (456, 'PREVIOUS', 'A')

    INSERT INTO #Table_A VALUES (456, 'PREVIOUS', 'D')

    -- Query

    select *

    from #Table_A prev

    LEFT OUTER JOIN #Table_A curr on prev.PartNumber = curr.PartNumber and curr.FileVersion = 'CURRENT' AND prev.SplitValue = curr.SplitValue

    WHERE prev.FileVersion = 'PREVIOUS'

    AND curr.PartNumber IS NULL

    -- Expected Output

    PartNumber: 456

    SplitValue: D

  • So, you have a current solution but would like a better approach?

    You could try with EXCEPT or NOT EXISTS.

    Here are the examples.

    select PArtNumber,

    Item

    from #Table_Orig prev

    CROSS APPLY dbo.DelimitedSplit8K(VALUE, ' ')

    WHERE prev.FileVersion = 'PREVIOUS'

    EXCEPT

    select PArtNumber,

    Item

    from #Table_Orig prev

    CROSS APPLY dbo.DelimitedSplit8K(VALUE, ' ')

    WHERE prev.FileVersion = 'CURRENT'

    select *

    from #Table_Orig prev

    CROSS APPLY dbo.DelimitedSplit8K(VALUE, ' ') splitprev

    WHERE prev.FileVersion = 'PREVIOUS'

    AND NOT EXISTS( select 1

    from #Table_Orig cur

    CROSS APPLY dbo.DelimitedSplit8K(VALUE, ' ') splitcur

    WHERE cur.FileVersion = 'CURRENT'

    AND cur.PArtNumber = prev.PArtNumber

    AND splitcur.Item = splitprev.Item)

    I'm using a splitter function that will be very fast compared with most methods. You can read about it and get the code from this article. http://www.sqlservercentral.com/articles/Tally+Table/72993/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Here's my version to do only a single pass of the table.

    But I'm not sure why you want to list only "D" in the result, since both "B" and "D" did not have a match. I list both -- if you can explain the reasoning behind how to select only "D", I'd be happy to adjust the query accordingly:

    SELECT PartNumber, SplitVALUE

    FROM #Table_A

    --WHERE FILEVERSION IN ('CURRENT', 'PREVIOUS') --uncomment if needed

    GROUP BY PartNumber, SplitVALUE

    HAVING COUNT(*) <> 2

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Luis Cazares (9/30/2013)


    So, you have a current solution but would like a better approach?

    You could try with EXCEPT or NOT EXISTS.

    Luis,

    Yes, I already have the splitter and comparisons working using a LEFT OUTER JOIN. The EXCEPT works better than the left outer join and is a lot easier to comprehend.

    I was making sure that I haven't missed anything that might be new in the 2008R2 or 2012 or totally didn't think outside of the box enough.

    Thanks,

    Anton

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply