Most efficient way to get a single unique combined record for distributed information corresponding to same ID

Question

Post reply

Most efficient way to get a single unique combined record for distributed information corresponding to same ID

ksatpute123

Hall of Fame

Points: 3325
More actions
November 14, 2013 at 7:32 am

#286240

I have a scenario like this:
create table #temp(id int, name varchar(50), age int, contact_number varchar(50))
insert into #temp(id,name)
select 1,'John'
go
insert into #temp(id,age)
select 1,34
go
insert into #temp(id,contact_number)
select 1,'222-444-5555'
If we query this temporary table we get following output:
id,name,age,contact_number
1,John,NULL,NULL
1,NULL,34,NULL
1,NULL,NULL,222-444-5555
What is the most efficient way of getting the output like this:
id,name,age,contact_number
1,John,34,222-444-5555

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286573 More actions · Answer 1

Thanks for posting ddl and sample data. This goes a LONG way to making it easy on us.

This should work for you.

select ID, MAX(name), MAX(age), MAX(contact_number)

from #temp

group by ID

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

ksatpute123 Hall of Fame Points: 3325 More actions · Answer 2

Thank you for the quick reply.

I have already tried using this approach. It is based on the assumption that there will be only one value per column corresponding to a primary key.

Suppose if I insert one more record in same table like this:

insert into #temp(id,contact_number)

select 1,'565-888-9654'

Then select * from #temp will be interpreted as ID=1 has 2 contact numbers.

Actual output in this case should be

id,name,age,contact_number

1,John,34,222-444-5555

1,John,34,565-888-9654

Aggregation will give me only 1 record.

I know this is not the proper way of storing data but right now I am stuck with something like this.

Sean Lange SSC Guru Points: 286573 More actions · Answer 3

I know this is not the proper way of storing data

Phew!!! That is an understatement!!!

So do you have duplicates only on contact_number? If so, you could do something like this:

select ID, MAX(name), MAX(age), x.contact_number

from #temp

cross apply (select contact_number from #temp) x

where x.contact_number is not null

group by ID, x.contact_number

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Jason Wolfkill SSCrazy Eights Points: 9832 More actions · Answer 4

ksatpute123 (11/14/2013)
Thank you for the quick reply.
I have already tried using this approach. It is based on the assumption that there will be only one value per column corresponding to a primary key.
Suppose if I insert one more record in same table like this:
insert into #temp(id,contact_number)
select 1,'565-888-9654'
Then select * from #temp will be interpreted as ID=1 has 2 contact numbers.
Actual output in this case should be
id,name,age,contact_number
1,John,34,222-444-5555
1,John,34,565-888-9654
Aggregation will give me only 1 record.
I know this is not the proper way of storing data but right now I am stuck with something like this.

This sounds like a nightmare.

Given the insanity you'd have to confront every time you try to query this data, could you possible do something like this? It's cumbersome, but with some indexing, you could probably get it to perform decently for queries that aren't *too* intensive.

create table data (id int, name varchar(10), age int, phone varchar(8) )

go

insert into data

values (1, 'jason', null, '987-4522'),

(1, 'jason', 12, '887-5563'),

(2, 'harvey', null, '777-9999'),

(2, null, 37, null),

(2, null, null, '444-5555'),

(3, 'harold', 26, null),

(3, null, 34, '558-8745'),

(3, null, null, '333-7777')

go

create view names as

select id, name from data where name is not null

go

create view age as

select id, age from data where age is not null

go

create view phone as

select id, phone from data where phone is not null

go

create view nameAgePhone as

select distinct coalesce(n.id, a.id, p.id) as id, n.name, a.age, p.phone

from names n

full join age a

on a.id = n.id

full join phone p

on p.id = n.id

go

select * from nameAgePhone

go

Jason Wolfkill

Dwain Camps SSC Guru Points: 86908 More actions · Answer 5

If contact number is the only column that may have duplicates, you can also do it like this.

DECLARE @temp TABLE(id int, name varchar(50), age int, contact_number varchar(50));

insert into @temp(id,name) select 1,'John';

insert into @temp(id,age) select 1,34;

insert into @temp(id,contact_number) select 1,'222-444-5555';

insert into @temp(id,contact_number) select 1,'333-444-5555';

SELECT id, name, age, contact_number

FROM

(

SELECT id

,name=MAX(name) OVER (PARTITION BY id)

,age=MAX(age) OVER (PARTITION BY id)

,contact_number

FROM @temp

) a

WHERE contact_number IS NOT NULL;

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ksatpute123 Hall of Fame Points: 3325 More actions · Answer 6

dwain.c (11/17/2013)
If contact number is the only column that may have duplicates, you can also do it like this.
DECLARE @temp TABLE(id int, name varchar(50), age int, contact_number varchar(50));
insert into @temp(id,name) select 1,'John';
insert into @temp(id,age) select 1,34;
insert into @temp(id,contact_number) select 1,'222-444-5555';
insert into @temp(id,contact_number) select 1,'333-444-5555';
SELECT id, name, age, contact_number
FROM
(
SELECT id
,name=MAX(name) OVER (PARTITION BY id)
,age=MAX(age) OVER (PARTITION BY id)
,contact_number
FROM @temp
) a
WHERE contact_number IS NOT NULL;

This is a very good example but what will happen if I have thousands of IDs with such records. This approach is clearly not feasible in a transnational system where these records may keep increasing or updating.

ksatpute123 Hall of Fame Points: 3325 More actions · Answer 7

wolfkillj (11/15/2013)
ksatpute123 (11/14/2013)
Thank you for the quick reply.
I have already tried using this approach. It is based on the assumption that there will be only one value per column corresponding to a primary key.
Suppose if I insert one more record in same table like this:
insert into #temp(id,contact_number)
select 1,'565-888-9654'
Then select * from #temp will be interpreted as ID=1 has 2 contact numbers.
Actual output in this case should be
id,name,age,contact_number
1,John,34,222-444-5555
1,John,34,565-888-9654
Aggregation will give me only 1 record.
I know this is not the proper way of storing data but right now I am stuck with something like this.
This sounds like a nightmare.
Given the insanity you'd have to confront every time you try to query this data, could you possible do something like this? It's cumbersome, but with some indexing, you could probably get it to perform decently for queries that aren't *too* intensive.
create table data (id int, name varchar(10), age int, phone varchar(8) )
go
insert into data
values (1, 'jason', null, '987-4522'),
(1, 'jason', 12, '887-5563'),
(2, 'harvey', null, '777-9999'),
(2, null, 37, null),
(2, null, null, '444-5555'),
(3, 'harold', 26, null),
(3, null, 34, '558-8745'),
(3, null, null, '333-7777')
go
create view names as
select id, name from data where name is not null
go
create view age as
select id, age from data where age is not null
go
create view phone as
select id, phone from data where phone is not null
go
create view nameAgePhone as
select distinct coalesce(n.id, a.id, p.id) as id, n.name, a.age, p.phone
from names n
full join age a
on a.id = n.id
full join phone p
on p.id = n.id
go
select * from nameAgePhone
go

I am right now doing something on the similar lines. Truly a nightmare. I have put in a request to revamp the structure and redesign the architecture but it will take time. Till then stuck with this.

Dwain Camps SSC Guru Points: 86908 More actions · Answer 8

Hold on... rethinking my answer.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 9

ksatpute123 (11/18/2013)
dwain.c (11/17/2013)
If contact number is the only column that may have duplicates, you can also do it like this.
DECLARE @temp TABLE(id int, name varchar(50), age int, contact_number varchar(50));
insert into @temp(id,name) select 1,'John';
insert into @temp(id,age) select 1,34;
insert into @temp(id,contact_number) select 1,'222-444-5555';
insert into @temp(id,contact_number) select 1,'333-444-5555';
SELECT id, name, age, contact_number
FROM
(
SELECT id
,name=MAX(name) OVER (PARTITION BY id)
,age=MAX(age) OVER (PARTITION BY id)
,contact_number
FROM @temp
) a
WHERE contact_number IS NOT NULL;
This is a very good example but what will happen if I have thousands of IDs with such records. This approach is clearly not feasible in a transnational system where these records may keep increasing or updating.

Maybe this...

DECLARE @temp TABLE(id int, name varchar(50), age int, contact_number varchar(50));

insert into @temp(id,name) select 1,'John';

insert into @temp(id,age) select 1,34;

insert into @temp(id,contact_number) select 1,'222-444-5555';

insert into @temp(id,contact_number) select 1,'333-444-5555';

SELECT id, name=MAX(name), age=MAX(age), contact_number

FROM

(

SELECT id

,name=MAX(name) OVER (PARTITION BY id)

,age=MAX(age) OVER (PARTITION BY id)

,contact_number

FROM @temp

) a

WHERE contact_number IS NOT NULL

GROUP BY ID, contact_number;

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ScottPletcher SSC Guru Points: 100948 More actions · Answer 10

MAX(age) seems safe enough -- no one gets any younger :-).

Combining MIN(name) and MAX(name) would allow one name change ... that should handle everyone, right!?

/*

insert into #temp(id,name)

select 2,'Jane Single'

insert into #temp(id,age)

select 2,28

insert into #temp(id,contact_number)

select 2,'123-456-7890'

insert into #temp(id,contact_number)

select 2,'987-654-3210'

insert into #temp(id,name)

select 2,'Jane Married'

*/

SELECT

tGroup.id, ca1.name, tGroup.age, t.contact_number

FROM (

SELECT id, MIN(name) AS name_min, MAX(name) AS name_max, MAX(age) AS age

FROM #temp

GROUP BY id

) AS tGroup

LEFT OUTER JOIN #temp t ON

t.id = tGroup.id

CROSS APPLY (

SELECT DISTINCT name

FROM (

SELECT name_min AS name UNION ALL

SELECT name_max

) AS derived

) AS ca1

WHERE

t.contact_number IS NOT NULL

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.