Help with a tricky query

  • Hi,

    I'm stuck with a following task.

    Here is my problem (simplified):

    I have 3 columns in my table: 2008, 2009, 2010

    Here is what the data looks like:

    2008 2009 2010

    2 2 2

    3 3 5

    5 5 NULL

    I need to compare values in each row, and find those rows where values are the same. The tricky part is that if the value is NULL, it needs to be ignored.

    So based on the example above, my query would return rows 1 and 3.

    Does it make sense?

    Thank you very much in advance.

  • A nasty but successful solution:

    WHERE

    COALESCE( 2008, 2009, 2010) = COALESCE( 2009, 2008, 2010)

    AND COALESCE( 2008, 2009, 2010) = COALESCE( 2010, 2009, 2008)

    ....

    That Null is a problem. If you could guarantee one of the columns as not null, just one, this would be a lot cleaner.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • CREATE TABLE #T (

    ID INT IDENTITY

    PRIMARY KEY,

    [2008] INT,

    [2009] INT,

    [2010] INT) ;

    INSERT INTO #T

    ([2008],

    [2009],

    [2010])

    SELECT 2,

    2,

    2

    UNION ALL

    SELECT 3,

    3,

    5

    UNION ALL

    SELECT 5,

    5,

    NULL ;

    SELECT *

    FROM #T

    WHERE ([2008] = [2009]

    AND [2009] = [2010])

    OR ([2008] = [2009]

    AND [2010] IS NULL)

    OR ([2008] IS NULL

    AND [2009] = [2010])

    OR ([2008] = [2010]

    AND [2009] IS NULL) ;

    There are more complex ways to get this done, which may be better if you have more columns than just three.

    I have to ask, why columns for years (that's what they look like)? Why not rows? That would make much more sense in most relational databases.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (3/3/2011)


    CREATE TABLE #T (

    ID INT IDENTITY

    PRIMARY KEY,

    [2008] INT,

    [2009] INT,

    [2010] INT) ;

    INSERT INTO #T

    ([2008],

    [2009],

    [2010])

    SELECT 2,

    2,

    2

    UNION ALL

    SELECT 3,

    3,

    5

    UNION ALL

    SELECT 5,

    5,

    NULL ;

    SELECT *

    FROM #T

    WHERE ([2008] = [2009]

    AND [2009] = [2010])

    OR ([2008] = [2009]

    AND [2010] IS NULL)

    OR ([2008] IS NULL

    AND [2009] = [2010])

    OR ([2008] = [2010]

    AND [2009] IS NULL) ;

    There are more complex ways to get this done, which may be better if you have more columns than just three.

    I have to ask, why columns for years (that's what they look like)? Why not rows? That would make much more sense in most relational databases.

    Thanks for your help, guys! There are actually 25 columns, not 3.

    Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?

  • Makes sense :-); how many rows are there in the table ?

  • lee rite (3/3/2011)


    Thanks for your help, guys! There are actually 25 columns, not 3.

    Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?

    I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • About 5,000....

  • Craig Farrell (3/3/2011)


    lee rite (3/3/2011)


    Thanks for your help, guys! There are actually 25 columns, not 3.

    Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?

    I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.

    I actually created that table myself from the original table, thinking that this will somehow help me to find the solution. This was the structure of the original table that consists all data:

    GUID, YEAR, CompanyName

    111 2002 Company1

    111 2003 Company1

    111 2004 NULL

    222 2002 Company2

    222 2003 Company2

    222 2004 Company2

    333 2002 Company3

    333 2003 Company3

    333 2004 Company34

    and so on...

    I couldnt think of other ways to deal with it. Maybe you guys have some ideas? Thanks!

  • Try this:

    SELECT ID

    FROM

    ( SELECT * FROM #T ) PIVOT_TABLE

    UNPIVOT

    (

    Vals FOR ColNames IN ([2008],[2009],[2010])

    ) PIVOT_HANDLE

    GROUP BY ID

    HAVING COUNT( DISTINCT PIVOT_HANDLE.Vals) = 1

  • lee rite (3/3/2011)


    Craig Farrell (3/3/2011)


    lee rite (3/3/2011)


    Thanks for your help, guys! There are actually 25 columns, not 3.

    Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?

    I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.

    I actually created that table myself from the original table, thinking that this will somehow help me to find the solution. This was the structure of the original table that consists all data:

    GUID, YEAR, CompanyName

    111 2002 Company1

    111 2003 Company1

    111 2004 NULL

    222 2002 Company2

    222 2003 Company2

    222 2004 Company2

    333 2002 Company3

    333 2003 Company3

    333 2004 Company34

    and so on...

    I couldnt think of other ways to deal with it. Maybe you guys have some ideas? Thanks!

    If this is your data, then

    SELECT GUID

    FROM #T

    GROUP BY GUID

    HAVING COUNT( DISTINCT CompanyName) = 1

    will do...

  • ColdCoffee (3/3/2011)


    Try this:

    SELECT ID

    FROM

    ( SELECT * FROM #T ) PIVOT_TABLE

    UNPIVOT

    (

    Vals FOR ColNames IN ([2008],[2009],[2010])

    ) PIVOT_HANDLE

    GROUP BY ID

    HAVING COUNT( DISTINCT PIVOT_HANDLE.Vals) = 1

    Wow! This is way over my head! Impressive! But I get following error when trying to run this:

    Incorrect syntax near 'UNPIVOT'. You may need to set the compatibility level of the current database to a higher value to enable this feature. See help for the stored procedure sp_dbcmptlevel.

    I have SQL Server 2005.

  • lee rite (3/3/2011)


    Craig Farrell (3/3/2011)


    lee rite (3/3/2011)


    Thanks for your help, guys! There are actually 25 columns, not 3.

    Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?

    I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.

    I actually created that table myself from the original table, thinking that this will somehow help me to find the solution. This was the structure of the original table that consists all data:

    GUID, YEAR, CompanyName

    111 2002 Company1

    111 2003 Company1

    111 2004 NULL

    222 2002 Company2

    222 2003 Company2

    222 2004 Company2

    333 2002 Company3

    333 2003 Company3

    333 2004 Company34

    and so on...

    I couldnt think of other ways to deal with it. Maybe you guys have some ideas? Thanks!

    Is the "GUID" column some sort of identifier for the company, or is it something else? I.e.: Did Company3 become Company34, since both have "GUID" 333, in your example?

    Or is there some other way of identifying which prior-year rows are tied to which later-year rows, other than just CompanyName?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (3/3/2011)


    lee rite (3/3/2011)


    Craig Farrell (3/3/2011)


    lee rite (3/3/2011)


    Thanks for your help, guys! There are actually 25 columns, not 3.

    Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?

    I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.

    I actually created that table myself from the original table, thinking that this will somehow help me to find the solution. This was the structure of the original table that consists all data:

    GUID, YEAR, CompanyName

    111 2002 Company1

    111 2003 Company1

    111 2004 NULL

    222 2002 Company2

    222 2003 Company2

    222 2004 Company2

    333 2002 Company3

    333 2003 Company3

    333 2004 Company34

    and so on...

    I couldnt think of other ways to deal with it. Maybe you guys have some ideas? Thanks!

    Is the "GUID" column some sort of identifier for the company, or is it something else? I.e.: Did Company3 become Company34, since both have "GUID" 333, in your example?

    Or is there some other way of identifying which prior-year rows are tied to which later-year rows, other than just CompanyName?

    GUID is the unique identifier for the company which lets me know that Company3 and Company34 IS in fact the same company, that changed its name.

  • SELECT GUID

    FROM #T

    GROUP BY GUID

    HAVING COUNT( DISTINCT CompanyName) = 1

  • ColdCoffee (3/3/2011)


    SELECT GUID

    FROM #T

    GROUP BY GUID

    HAVING COUNT( DISTINCT CompanyName) = 1

    Thanks ColdCoffee! I think you nailed it. Your query selects all companies that never changed their name. I think I can work with that!

    Thank you very much, guys!

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply