find missing records only

  • Hi.

    I have a situation where I need to find the missing records exclusively from a table.

    Here is my first table. This contains a set of account IDs that can have multiple expense headers:

    create table #foo (

    accountID int

    , headerID int

    , headerText varchar(40)

    )

    insert into #foo(accountID, headerID, headerText)

    select 1, 1, 'abc' union all

    select 1, 2, 'efg' union all

    select 1, 3, 'iol' union all

    select 2, 4, 'thy' union all

    select 3, 1, 'abc' union all

    select 3, 1, 'def'

    Here are a set of mandatory headers. There can be other headers too, but they are not mandatory:

    create table #headers(headerid int, headerText varchar(40))

    insert into #headers(headerid, headerText)

    select 1, 'abc' union all

    select 1, 'def' union all

    select 2, 'efg' union all

    select 3, 'iol'

    Now my requirement is to retrieve EXACTLY those rows in the account table that DO NOT have the mandatory

    headers. I just need the account id and the missing header info.

    After some attempts, here was some code that I came up with, but its neither fast, not does it return exactly what I want.

    select distinct f.accountid, h.headerid, h.headertext

    from #headers h, #foo f

    where f.headerid not in (h.headerid)

    and f.headertext not in (h.headertext)

    Any ideas, gurus?

    It seems like I am just not cut for T-SQL. I am an otherwise decent coder, who just can't think in T-SQL'ese'... :blink:

    How To Post[/url]

  • I'm only missing one thing to really help you, what is the expected results based on the sample data provided above?

  • But, I also think this is what you are looking for:

    create table #foo (

    accountID int

    , headerID int

    , headerText varchar(40)

    );

    insert into #foo(accountID, headerID, headerText)

    select 1, 1, 'abc' union all

    select 1, 2, 'efg' union all

    select 1, 3, 'iol' union all

    select 2, 4, 'thy' union all

    select 3, 1, 'abc' union all

    select 3, 1, 'def';

    create table #headers(headerID int, headerText varchar(40));

    insert into #headers(headerID, headerText)

    select 1, 'abc' union all

    select 1, 'def' union all

    select 2, 'efg' union all

    select 3, 'iol';

    select distinct

    f.accountID,

    f.headerID,

    f.headerText

    from

    #foo f

    where

    f.headerID not in (select h.headerID from #headers h)

    or f.headerText not in (select h.headerText from #headers h);

    drop table #foo;

    drop table #headers;

  • First, why does your #header table have a duplicate ID? The purpose of an ID field is to uniquely identify a particular record. Your headerID with a value one does not uniquely identify a record.

    Second, it is good coding practice to have a single field as a primary key. Since your headerID does not uniquely specify a record, you are forced to have a compound key.

    Third, because your #header table has a compound key, you are required to have both of those fields in your #foo table to uniquely identify a row. If you had a single field as a primary key, you would only need one field.

    All that being said, the reason that you are having problems is that no single row in your #foo table contains enough information to make a decision about the required headers. You will need to do something to gather all of the necessary information in one place. Here is a start to a solution with a PIVOT, but this may not be practical with a large number of required headers. NOTE: I used the headerText as the "unique" identifier.

    SELECT AccountID, [abc], [def], [efg], [iol]

    FROM (

    SELECT #foo.AccountID, #headers.headerID, #headers.headerText

    FROM #foo

    FULL OUTER JOIN #headers

    ON #foo.headerID = #headers.headerID

    AND #foo.headerText = #headers.headerText

    ) p

    PIVOT (

    Count(HeaderID)

    FOR headerText

    IN ( [abc], [def], [efg], [iol] )

    ) AS Pvt

    and here is the beginning of one using a CTE. Again I used the headerText as a unique identifier.

    WITH ReqHeaders AS (

    SELECT Row_Number() OVER ( ORDER BY headerText ) AS RowNum, headerText

    FROM #headers

    )

    SELECT accountID, #foo.headerText, RowNum

    , Row_Number() OVER (PARTITION BY accountID ORDER BY RowNum ) As AcctHeaderNum

    FROM ReqHeaders

    FULL OUTER JOIN #foo

    ON #foo.HeaderText = ReqHeaders.HeaderText

    Notice that the row number from the header field does not match the row number for the account id when there is a required header missing. The problem is that it won't show the last required header if that is the only one missing, but at least this gives you an idea about a different way to think about the problem.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I realized what I was missing when I first approached this problem. You have to get a result for every single combination of account ID and required header. This either means a pivot (as in my first approach) or a cross join. Here is what I came up with.

    WITH Accounts AS (

    SELECT DISTINCT accountID

    FROM #foo

    )

    , ReqHeaders AS (

    SELECT #headers.headerid, #headers.headerText, Accounts.accountID

    FROM #headers

    CROSS JOIN Accounts

    )

    SELECT ReqHeaders.accountID, ReqHeaders.headerid, ReqHeaders.HeaderText

    FROM ReqHeaders

    LEFT OUTER JOIN #foo

    ON ReqHeaders.headerid = #foo.headerid

    AND ReqHeaders.headerText = #foo.headerText

    AND ReqHeaders.accountID = #foo.accountID

    WHERE #foo.accountID is Null -- Only display the missing headers

    ORDER BY ReqHeaders.AccountID, ReqHeaders.headerid, ReqHeaders.HeaderText

    This will work for any number of required headers.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Ouch .... I thought it was apparent.... but I should have been clearer about what I am expecting.

    I am expecting just the combinations that are missing from the #foo table.

    So I am expecting the following:

    AccountID : headerID: headerText

    1: 1: def < -- There is only 1 mandatory header missing for this account id

    2: 1: abc < -- Account id 2 has none of the mandatory headers

    2: 1: def

    2: 2: efg

    2: 3: iol

    3: 2: efg < -- Account id 3 has 2 missing mandatory headers

    3: 3: iol

    How To Post[/url]

  • Thanks for the reply Drew.

    The data set I have given is artificial - so that we can attack the core quickly. Trust me, I have 'heavily' simplified the

    structure of the tables that i am dealing with.

    You might want to reformulate your query based on the expected result set that I added to this thread.

    I am extremely sorry for not adding it to the original post ....

    How To Post[/url]

  • Bingo - this one works just as I expected it to....

    How To Post[/url]

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply