Comparing Tables (Match, Exists, and Not Exists)

  • Hello,

    I have the following table with sample data:

    create table ##test_table (LP varchar(100), Original varchar (250), New varchar (250))

    insert into ##test_table

    select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New

    insert into ##test_table

    select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New

    What I'd like to do is have a query with 3 separate columns which displays the results, separated by commas:

    Matching Words

    Words In Original Not in New

    Words in New Not in Original

    The Results for

    Matching Words should be: the, jumped, over, lazy, dog

    Words In Original Not in New should be: quick, brown, fox

    Words In NEW Not in Original should be: fat, pig, couldnt

    Output Columns Should Be:

    LP Original New Words In Original Not Words In NEW Not in Original

    Thank You

  • Split the sentence into words, store each word in a new row. Then you can use INTERSECT and EXCEPT to get the results you want.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks GilaMonster. So it's not possible as is for a query to answer this problem? As I will have many data sets, in rows, so not possible to split sentences into words.

  • Err, yes it is possible.

    Query splits the sentences into words (look for a split function, Delimited8ksplit is fastest). If large, insert into temp tables, then use INTERSECT and EXCEPT to get your results.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Such an odd requirement could only come as homework. What Gail has suggested will work.

    Here is another way that gets you the matching words (MW) column only.

    create table #test_table (LP varchar(100), Original varchar (250), New varchar (250));

    insert into #test_table

    select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New

    UNION ALL select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New;

    WITH BothSentences AS

    (

    SELECT LP, Original=MAX(Original), New=MAX(New)

    FROM #test_table

    GROUP BY LP

    ),

    AllWords AS

    (

    SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)

    FROM

    (

    SELECT LP, grp=-1, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(Original, ' ')

    UNION

    SELECT LP, 1, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(New, ' ')

    ) a

    GROUP BY LP, item

    )

    SELECT MW=STUFF(

    (

    SELECT ', ' + item

    FROM AllWords b

    WHERE grp = 0 AND a.LP = b.LP

    ORDER BY itemnumber

    FOR XML PATH('')

    ), 1, 2, '')

    FROM AllWords a

    GROUP BY LP;

    GO

    DROP TABLE #test_table;

    Everything you need to get the other columns is available in the above. You just have to figure out what I'm doing and extract it. If you can do that, you get an A.

    Edit: Post your final query so that we can all see what you learned.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi Dwain,

    I tried you're query but am getting the following error:

    (2 row(s) affected)

    Msg 208, Level 16, State 1, Line 4

    Invalid object name 'DelimitedSplit8K'.

    By the thanks for helping..its appreciate.

  • Ok I read Jeff's article on http://www.sqlservercentral.com/articles/Tally+Table/72993/ and at first put the code in a seperate window and it didn't work, since i suppose the the #test table contained only one # tag. when i put it in same window as code you wrote, dwain it worked. i know there is a difference between sessions on # and ##, so thank you now let me try to understand it.

    Thank You both.

  • CELKO (2/2/2015)


    This is a garbage pile; it has not key and no way to have a key.

    I thought they called that a heap?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Dwain.C,

    I modified the grp=0, to a -1 and + 1, seperately, and the query got me the result expected, even though it wasn't in seperate columns, which is fine.

    SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)

    FROM

    (

    SELECT LP, grp=0, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(Original, ' ')

    UNION

    SELECT LP, 1, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(New, ' ')

    ) a

    GROUP BY LP, item

    )

    I am wanting to add a new column, to display the word count next to the result of the column for matching / non matching...., i added this line to the query in front of the [MW]= STUFF that's all, but didn't work:

    SELECT 1 + len([new] - len(replace ([new]),' ','')) AS WordCount, [MW]= STUFF

    would you know how to add a word count column next to you're result set?

  • Actually this is more like what I was thinking about.

    create table #test_table (LP varchar(100), Original varchar (250), New varchar (250));

    insert into #test_table

    select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New

    UNION ALL select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New;

    WITH BothSentences AS

    (

    SELECT LP, Original=MAX(Original), New=MAX(New)

    FROM #test_table

    GROUP BY LP

    ),

    AllWords AS

    (

    SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)

    FROM

    (

    SELECT LP, grp=-1, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(Original, ' ')

    UNION

    SELECT LP, 1, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(New, ' ')

    ) a

    GROUP BY LP, item

    )

    SELECT MW=STUFF(

    (

    SELECT ', ' + item

    FROM AllWords b

    WHERE grp = 0 AND a.LP = b.LP

    ORDER BY itemnumber

    FOR XML PATH('')

    ), 1, 2, '')

    ,ONotInN=STUFF(

    (

    SELECT ', ' + item

    FROM AllWords b

    WHERE grp = -1 AND a.LP = b.LP

    ORDER BY itemnumber

    FOR XML PATH('')

    ), 1, 2, '')

    ,NNotInO=STUFF(

    (

    SELECT ', ' + item

    FROM AllWords b

    WHERE grp = 1 AND a.LP = b.LP

    ORDER BY itemnumber

    FOR XML PATH('')

    ), 1, 2, '')

    FROM AllWords a

    GROUP BY LP;

    GO

    DROP TABLE #test_table;

    Does that get you everything you want?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • CELKO (2/2/2015)

    --------------------------------------------------------------------------------

    This is a garbage pile; it has not key and no way to have a key.

    I thought they called that a heap?

    A heap is a table without a clustered index. A table with a primary key may also be a heap. A table without a primary key is not good, or to use Celko's words, a garbage pile.

  • Hello Dwain.C,

    Yes, thank you for your last update, it was spot on.

    I modified you're original query and added a column to No_Of_Words, for the column result set of mw, but got 1 for answer, it should be 5, since i'm wanting it to count the result "the, jumped, over, lazy, dog", would you know what's wrong? ty

    I put:

    create table #test_table (LP varchar(100), Original varchar (250), New varchar (250));

    insert into #test_table

    select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New

    UNION ALL select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New;

    WITH BothSentences AS

    (

    SELECT LP, Original=MAX(Original), New=MAX(New)

    FROM #test_table

    GROUP BY LP

    ),

    AllWords AS

    (

    SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)

    FROM

    (

    SELECT LP, grp=-1, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(Original, ' ')

    UNION

    SELECT LP, 1, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(New, ' ')

    ) a

    GROUP BY LP, item

    )

    SELECT len(LP) - len(replace(LP, ' ', '')) + 1 No_Of_Words, MW=STUFF(

    (

    SELECT ', ' + item

    FROM AllWords b

    WHERE grp = 0 AND a.LP = b.LP

    ORDER BY itemnumber

    FOR XML PATH('')

    ), 1, 2, '')

    FROM AllWords a

    GROUP BY LP;

  • VegasL (2/22/2015)


    Hello Dwain.C,

    Yes, thank you for your last update, it was spot on.

    I modified you're original query and added a column to No_Of_Words, for the column result set of mw, but got 1 for answer, it should be 5, since i'm wanting it to count the result "the, jumped, over, lazy, dog", would you know what's wrong? ty

    I put:

    create table #test_table (LP varchar(100), Original varchar (250), New varchar (250));

    insert into #test_table

    select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New

    UNION ALL select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New;

    WITH BothSentences AS

    (

    SELECT LP, Original=MAX(Original), New=MAX(New)

    FROM #test_table

    GROUP BY LP

    ),

    AllWords AS

    (

    SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)

    FROM

    (

    SELECT LP, grp=-1, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(Original, ' ')

    UNION

    SELECT LP, 1, item, itemnumber

    FROM BothSentences

    CROSS APPLY DelimitedSplit8K(New, ' ')

    ) a

    GROUP BY LP, item

    )

    SELECT len(LP) - len(replace(LP, ' ', '')) + 1 No_Of_Words, MW=STUFF(

    (

    SELECT ', ' + item

    FROM AllWords b

    WHERE grp = 0 AND a.LP = b.LP

    ORDER BY itemnumber

    FOR XML PATH('')

    ), 1, 2, '')

    FROM AllWords a

    GROUP BY LP;

    Try using New or Original instead of LP in the LEN functions (not sure which words you are counting). Beware that this method may not work if you have repeating blanks in the data.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply