Most efficient way to get a single unique combined record for distributed information corresponding to same ID

  • I have a scenario like this:

    create table #temp(id int, name varchar(50), age int, contact_number varchar(50))

    insert into #temp(id,name)

    select 1,'John'

    go

    insert into #temp(id,age)

    select 1,34

    go

    insert into #temp(id,contact_number)

    select 1,'222-444-5555'

    If we query this temporary table we get following output:

    id,name,age,contact_number

    1,John,NULL,NULL

    1,NULL,34,NULL

    1,NULL,NULL,222-444-5555

    What is the most efficient way of getting the output like this:

    id,name,age,contact_number

    1,John,34,222-444-5555

  • Thanks for posting ddl and sample data. This goes a LONG way to making it easy on us.

    This should work for you.

    select ID, MAX(name), MAX(age), MAX(contact_number)

    from #temp

    group by ID

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thank you for the quick reply.

    I have already tried using this approach. It is based on the assumption that there will be only one value per column corresponding to a primary key.

    Suppose if I insert one more record in same table like this:

    insert into #temp(id,contact_number)

    select 1,'565-888-9654'

    Then select * from #temp will be interpreted as ID=1 has 2 contact numbers.

    Actual output in this case should be

    id,name,age,contact_number

    1,John,34,222-444-5555

    1,John,34,565-888-9654

    Aggregation will give me only 1 record.

    I know this is not the proper way of storing data but right now I am stuck with something like this.

  • I know this is not the proper way of storing data

    Phew!!! That is an understatement!!!

    So do you have duplicates only on contact_number? If so, you could do something like this:

    select ID, MAX(name), MAX(age), x.contact_number

    from #temp

    cross apply (select contact_number from #temp) x

    where x.contact_number is not null

    group by ID, x.contact_number

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • ksatpute123 (11/14/2013)


    Thank you for the quick reply.

    I have already tried using this approach. It is based on the assumption that there will be only one value per column corresponding to a primary key.

    Suppose if I insert one more record in same table like this:

    insert into #temp(id,contact_number)

    select 1,'565-888-9654'

    Then select * from #temp will be interpreted as ID=1 has 2 contact numbers.

    Actual output in this case should be

    id,name,age,contact_number

    1,John,34,222-444-5555

    1,John,34,565-888-9654

    Aggregation will give me only 1 record.

    I know this is not the proper way of storing data but right now I am stuck with something like this.

    This sounds like a nightmare.

    Given the insanity you'd have to confront every time you try to query this data, could you possible do something like this? It's cumbersome, but with some indexing, you could probably get it to perform decently for queries that aren't *too* intensive.

    create table data (id int, name varchar(10), age int, phone varchar(8) )

    go

    insert into data

    values (1, 'jason', null, '987-4522'),

    (1, 'jason', 12, '887-5563'),

    (2, 'harvey', null, '777-9999'),

    (2, null, 37, null),

    (2, null, null, '444-5555'),

    (3, 'harold', 26, null),

    (3, null, 34, '558-8745'),

    (3, null, null, '333-7777')

    go

    create view names as

    select id, name from data where name is not null

    go

    create view age as

    select id, age from data where age is not null

    go

    create view phone as

    select id, phone from data where phone is not null

    go

    create view nameAgePhone as

    select distinct coalesce(n.id, a.id, p.id) as id, n.name, a.age, p.phone

    from names n

    full join age a

    on a.id = n.id

    full join phone p

    on p.id = n.id

    go

    select * from nameAgePhone

    go

    Jason Wolfkill

  • If contact number is the only column that may have duplicates, you can also do it like this.

    DECLARE @temp TABLE(id int, name varchar(50), age int, contact_number varchar(50));

    insert into @temp(id,name) select 1,'John';

    insert into @temp(id,age) select 1,34;

    insert into @temp(id,contact_number) select 1,'222-444-5555';

    insert into @temp(id,contact_number) select 1,'333-444-5555';

    SELECT id, name, age, contact_number

    FROM

    (

    SELECT id

    ,name=MAX(name) OVER (PARTITION BY id)

    ,age=MAX(age) OVER (PARTITION BY id)

    ,contact_number

    FROM @temp

    ) a

    WHERE contact_number IS NOT NULL;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (11/17/2013)


    If contact number is the only column that may have duplicates, you can also do it like this.

    DECLARE @temp TABLE(id int, name varchar(50), age int, contact_number varchar(50));

    insert into @temp(id,name) select 1,'John';

    insert into @temp(id,age) select 1,34;

    insert into @temp(id,contact_number) select 1,'222-444-5555';

    insert into @temp(id,contact_number) select 1,'333-444-5555';

    SELECT id, name, age, contact_number

    FROM

    (

    SELECT id

    ,name=MAX(name) OVER (PARTITION BY id)

    ,age=MAX(age) OVER (PARTITION BY id)

    ,contact_number

    FROM @temp

    ) a

    WHERE contact_number IS NOT NULL;

    This is a very good example but what will happen if I have thousands of IDs with such records. This approach is clearly not feasible in a transnational system where these records may keep increasing or updating.

  • wolfkillj (11/15/2013)


    ksatpute123 (11/14/2013)


    Thank you for the quick reply.

    I have already tried using this approach. It is based on the assumption that there will be only one value per column corresponding to a primary key.

    Suppose if I insert one more record in same table like this:

    insert into #temp(id,contact_number)

    select 1,'565-888-9654'

    Then select * from #temp will be interpreted as ID=1 has 2 contact numbers.

    Actual output in this case should be

    id,name,age,contact_number

    1,John,34,222-444-5555

    1,John,34,565-888-9654

    Aggregation will give me only 1 record.

    I know this is not the proper way of storing data but right now I am stuck with something like this.

    This sounds like a nightmare.

    Given the insanity you'd have to confront every time you try to query this data, could you possible do something like this? It's cumbersome, but with some indexing, you could probably get it to perform decently for queries that aren't *too* intensive.

    create table data (id int, name varchar(10), age int, phone varchar(8) )

    go

    insert into data

    values (1, 'jason', null, '987-4522'),

    (1, 'jason', 12, '887-5563'),

    (2, 'harvey', null, '777-9999'),

    (2, null, 37, null),

    (2, null, null, '444-5555'),

    (3, 'harold', 26, null),

    (3, null, 34, '558-8745'),

    (3, null, null, '333-7777')

    go

    create view names as

    select id, name from data where name is not null

    go

    create view age as

    select id, age from data where age is not null

    go

    create view phone as

    select id, phone from data where phone is not null

    go

    create view nameAgePhone as

    select distinct coalesce(n.id, a.id, p.id) as id, n.name, a.age, p.phone

    from names n

    full join age a

    on a.id = n.id

    full join phone p

    on p.id = n.id

    go

    select * from nameAgePhone

    go

    I am right now doing something on the similar lines. Truly a nightmare. I have put in a request to revamp the structure and redesign the architecture but it will take time. Till then stuck with this.

  • Hold on... rethinking my answer.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • ksatpute123 (11/18/2013)


    dwain.c (11/17/2013)


    If contact number is the only column that may have duplicates, you can also do it like this.

    DECLARE @temp TABLE(id int, name varchar(50), age int, contact_number varchar(50));

    insert into @temp(id,name) select 1,'John';

    insert into @temp(id,age) select 1,34;

    insert into @temp(id,contact_number) select 1,'222-444-5555';

    insert into @temp(id,contact_number) select 1,'333-444-5555';

    SELECT id, name, age, contact_number

    FROM

    (

    SELECT id

    ,name=MAX(name) OVER (PARTITION BY id)

    ,age=MAX(age) OVER (PARTITION BY id)

    ,contact_number

    FROM @temp

    ) a

    WHERE contact_number IS NOT NULL;

    This is a very good example but what will happen if I have thousands of IDs with such records. This approach is clearly not feasible in a transnational system where these records may keep increasing or updating.

    Maybe this...

    DECLARE @temp TABLE(id int, name varchar(50), age int, contact_number varchar(50));

    insert into @temp(id,name) select 1,'John';

    insert into @temp(id,age) select 1,34;

    insert into @temp(id,contact_number) select 1,'222-444-5555';

    insert into @temp(id,contact_number) select 1,'333-444-5555';

    SELECT id, name=MAX(name), age=MAX(age), contact_number

    FROM

    (

    SELECT id

    ,name=MAX(name) OVER (PARTITION BY id)

    ,age=MAX(age) OVER (PARTITION BY id)

    ,contact_number

    FROM @temp

    ) a

    WHERE contact_number IS NOT NULL

    GROUP BY ID, contact_number;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • MAX(age) seems safe enough -- no one gets any younger :-).

    Combining MIN(name) and MAX(name) would allow one name change ... that should handle everyone, right!?

    /*

    insert into #temp(id,name)

    select 2,'Jane Single'

    insert into #temp(id,age)

    select 2,28

    insert into #temp(id,contact_number)

    select 2,'123-456-7890'

    insert into #temp(id,contact_number)

    select 2,'987-654-3210'

    insert into #temp(id,name)

    select 2,'Jane Married'

    */

    SELECT

    tGroup.id, ca1.name, tGroup.age, t.contact_number

    FROM (

    SELECT id, MIN(name) AS name_min, MAX(name) AS name_max, MAX(age) AS age

    FROM #temp

    GROUP BY id

    ) AS tGroup

    LEFT OUTER JOIN #temp t ON

    t.id = tGroup.id

    CROSS APPLY (

    SELECT DISTINCT name

    FROM (

    SELECT name_min AS name UNION ALL

    SELECT name_max

    ) AS derived

    ) AS ca1

    WHERE

    t.contact_number IS NOT NULL

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply