Help with a tricky query

Question

Post reply

Help with a tricky query

lee rite

SSC Enthusiast

Points: 136
More actions
March 3, 2011 at 12:03 pm

#235416

Hi,
I'm stuck with a following task.
Here is my problem (simplified):
I have 3 columns in my table: 2008, 2009, 2010
Here is what the data looks like:
2008 2009 2010
2 2 2
3 3 5
5 5 NULL
I need to compare values in each row, and find those rows where values are the same. The tricky part is that if the value is NULL, it needs to be ignored.
So based on the example above, my query would return rows 1 and 3.
Does it make sense?
Thank you very much in advance.

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

A nasty but successful solution:

WHERE

COALESCE( 2008, 2009, 2010) = COALESCE( 2009, 2008, 2010)

AND COALESCE( 2008, 2009, 2010) = COALESCE( 2010, 2009, 2008)

....

That Null is a problem. If you could guarantee one of the columns as not null, just one, this would be a lot cleaner.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

GSquared SSC Guru Points: 260824 More actions · Answer 2

CREATE TABLE #T (

ID INT IDENTITY

PRIMARY KEY,

[2008] INT,

[2009] INT,

[2010] INT) ;

INSERT INTO #T

([2008],

[2009],

[2010])

SELECT 2,

2,

2

UNION ALL

SELECT 3,

3,

5

UNION ALL

SELECT 5,

5,

NULL ;

SELECT *

FROM #T

WHERE ([2008] = [2009]

AND [2009] = [2010])

OR ([2008] = [2009]

AND [2010] IS NULL)

OR ([2008] IS NULL

AND [2009] = [2010])

OR ([2008] = [2010]

AND [2009] IS NULL) ;

There are more complex ways to get this done, which may be better if you have more columns than just three.

I have to ask, why columns for years (that's what they look like)? Why not rows? That would make much more sense in most relational databases.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

lee rite SSC Enthusiast Points: 136 More actions · Answer 3

GSquared (3/3/2011)
CREATE TABLE #T ( ID INT IDENTITY PRIMARY KEY, [2008] INT, [2009] INT, [2010] INT) ; INSERT INTO #T ([2008], [2009], [2010]) SELECT 2, 2, 2 UNION ALL SELECT 3, 3, 5 UNION ALL SELECT 5, 5, NULL ; SELECT * FROM #T WHERE ([2008] = [2009] AND [2009] = [2010]) OR ([2008] = [2009] AND [2010] IS NULL) OR ([2008] IS NULL AND [2009] = [2010]) OR ([2008] = [2010]
AND [2009] IS NULL) ;
There are more complex ways to get this done, which may be better if you have more columns than just three.
I have to ask, why columns for years (that's what they look like)? Why not rows? That would make much more sense in most relational databases.

Thanks for your help, guys! There are actually 25 columns, not 3.

Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 4

Makes sense :-); how many rows are there in the table ?

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 5

lee rite (3/3/2011)
Thanks for your help, guys! There are actually 25 columns, not 3.
Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?

I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

lee rite SSC Enthusiast Points: 136 More actions · Answer 6

lee rite

SSC Enthusiast

Points: 136

March 3, 2011 at 12:32 pm

#1294249

About 5,000....

lee rite SSC Enthusiast Points: 136 More actions · Answer 7

Craig Farrell (3/3/2011)
lee rite (3/3/2011)
Thanks for your help, guys! There are actually 25 columns, not 3.
Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?
I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.

I actually created that table myself from the original table, thinking that this will somehow help me to find the solution. This was the structure of the original table that consists all data:

GUID, YEAR, CompanyName

111 2002 Company1

111 2003 Company1

111 2004 NULL

222 2002 Company2

222 2003 Company2

222 2004 Company2

333 2002 Company3

333 2003 Company3

333 2004 Company34

and so on...

I couldnt think of other ways to deal with it. Maybe you guys have some ideas? Thanks!

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 8

Try this:

SELECT ID

FROM

( SELECT * FROM #T ) PIVOT_TABLE

UNPIVOT

(

Vals FOR ColNames IN ([2008],[2009],[2010])

) PIVOT_HANDLE

GROUP BY ID

HAVING COUNT( DISTINCT PIVOT_HANDLE.Vals) = 1

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 9

lee rite (3/3/2011)
Craig Farrell (3/3/2011)
lee rite (3/3/2011)
Thanks for your help, guys! There are actually 25 columns, not 3.
Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?
I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.
I actually created that table myself from the original table, thinking that this will somehow help me to find the solution. This was the structure of the original table that consists all data:
GUID, YEAR, CompanyName
111 2002 Company1
111 2003 Company1
111 2004 NULL
222 2002 Company2
222 2003 Company2
222 2004 Company2
333 2002 Company3
333 2003 Company3
333 2004 Company34
and so on...
I couldnt think of other ways to deal with it. Maybe you guys have some ideas? Thanks!

If this is your data, then

SELECT GUID

FROM #T

GROUP BY GUID

HAVING COUNT( DISTINCT CompanyName) = 1

will do...

lee rite SSC Enthusiast Points: 136 More actions · Answer 10

ColdCoffee (3/3/2011)
Try this:
SELECT ID
FROM
( SELECT * FROM #T ) PIVOT_TABLE
UNPIVOT
(
Vals FOR ColNames IN ([2008],[2009],[2010])
) PIVOT_HANDLE
GROUP BY ID
HAVING COUNT( DISTINCT PIVOT_HANDLE.Vals) = 1

Wow! This is way over my head! Impressive! But I get following error when trying to run this:

Incorrect syntax near 'UNPIVOT'. You may need to set the compatibility level of the current database to a higher value to enable this feature. See help for the stored procedure sp_dbcmptlevel.

I have SQL Server 2005.

GSquared SSC Guru Points: 260824 More actions · Answer 11

lee rite (3/3/2011)
Craig Farrell (3/3/2011)
lee rite (3/3/2011)
Thanks for your help, guys! There are actually 25 columns, not 3.
Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?
I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.
I actually created that table myself from the original table, thinking that this will somehow help me to find the solution. This was the structure of the original table that consists all data:
GUID, YEAR, CompanyName
111 2002 Company1
111 2003 Company1
111 2004 NULL
222 2002 Company2
222 2003 Company2
222 2004 Company2
333 2002 Company3
333 2003 Company3
333 2004 Company34
and so on...
I couldnt think of other ways to deal with it. Maybe you guys have some ideas? Thanks!

Is the "GUID" column some sort of identifier for the company, or is it something else? I.e.: Did Company3 become Company34, since both have "GUID" 333, in your example?

Or is there some other way of identifying which prior-year rows are tied to which later-year rows, other than just CompanyName?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

lee rite SSC Enthusiast Points: 136 More actions · Answer 12

GSquared (3/3/2011)
lee rite (3/3/2011)
Craig Farrell (3/3/2011)
lee rite (3/3/2011)
Thanks for your help, guys! There are actually 25 columns, not 3.
Let me explain exactly what I'm working with, so you will see why I need such a strange task. I have a list of companies. The timeline is 25 years. Some companies existed all of these 25 years, some were created at any point during these 25 years, some ceased to exist during these 25 years. So the nulls are in the columns where company didn't exist. What I need to find are those companies, that change their names during their existence, no matter what the longevity of that existence was. Does this make more sense now?
I would sincerely follow Gus's advice and look into normalizing this data. An UnPivot would work well here, and would allow for you to use more common conventions in locating this data.
I actually created that table myself from the original table, thinking that this will somehow help me to find the solution. This was the structure of the original table that consists all data:
GUID, YEAR, CompanyName
111 2002 Company1
111 2003 Company1
111 2004 NULL
222 2002 Company2
222 2003 Company2
222 2004 Company2
333 2002 Company3
333 2003 Company3
333 2004 Company34
and so on...
I couldnt think of other ways to deal with it. Maybe you guys have some ideas? Thanks!
Is the "GUID" column some sort of identifier for the company, or is it something else? I.e.: Did Company3 become Company34, since both have "GUID" 333, in your example?
Or is there some other way of identifying which prior-year rows are tied to which later-year rows, other than just CompanyName?

GUID is the unique identifier for the company which lets me know that Company3 and Company34 IS in fact the same company, that changed its name.

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 13

SELECT GUID

FROM #T

GROUP BY GUID

HAVING COUNT( DISTINCT CompanyName) = 1

lee rite SSC Enthusiast Points: 136 More actions · Answer 14

ColdCoffee (3/3/2011)
SELECT GUID FROM #T GROUP BY GUID
HAVING COUNT( DISTINCT CompanyName) = 1

Thanks ColdCoffee! I think you nailed it. Your query selects all companies that never changed their name. I think I can work with that!

Thank you very much, guys!