Deduplicating data question

  • Hi all - I've been tasked with de-duplicating some customer data and am having difficulty getting the query right. I'm hoping someone can help me out. I have a table that looks something like this:

    DECLARE @data TABLE (CustomerID INT IDENTITY(1,1), LastName NVARCHAR(100), FirstName NVARCHAR(100))

    INSERT INTO @data (LastName,FirstName)
    SELECT 'Smith','John'
    UNION ALL
    SELECT 'Smith','John'
    UNION ALL
    SELECT 'Ford','Sam'
    UNION ALL
    SELECT 'Knox','Katie'
    UNION ALL
    SELECT 'Knox','Katie'

    The output I'm trying to achieve would look like this where "CustomerString" would be the concatenation of the CustomerID values:

    LastName  FirstName  RecordCount  MinCustomerID  CustomerString
    Ford      Sam        1            3              3
    Knox      Katie      2            4              4,5
    Smith     John       2            1              1,2

    I can get the first four columns easily enough and have been trying variations of STUFF ... FOR XML PATH, but grouping by CustomerID (which the correlated subquery seems to require) expands the result set to one row per customer.

    SELECT T1.LastName,T1.FirstName,
        COUNT(T1.CustomerID) AS RecordCount,MIN(T1.CustomerID) AS MinCustomerID,
        STUFF((SELECT DISTINCT ', ' + CONVERT(NVARCHAR,T2.CustomerID) FROM @data T2 WHERE T1.CustomerID = T2.CustomerID FOR XML PATH('')),1,2,'') AS CustomerString
    FROM @data T1
    GROUP BY T1.CustomerID,T1.LastName,T1.FirstName
    ORDER BY T1.LastName,T1.FirstName

    Any help would be greatly appreciated!

  • Try this article for some assistance

    http://jasonbrimhall.info/2011/03/21/dedupe-data-cte/
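
    For anyone who wants to actually remove the duplicates rather than just report on them, a common pattern is a CTE with ROW_NUMBER(). This is only a minimal sketch against the sample data from the first post, not a summary of the linked article. It assumes that "duplicate" means the same LastName/FirstName pair and that the lowest CustomerID in each group is the row to keep.

    ```sql
    DECLARE @data TABLE (CustomerID INT IDENTITY(1,1), LastName NVARCHAR(100), FirstName NVARCHAR(100));

    INSERT INTO @data (LastName, FirstName)
    VALUES ('Smith','John'), ('Smith','John'), ('Ford','Sam'), ('Knox','Katie'), ('Knox','Katie');

    -- Number the rows inside each name group, lowest CustomerID first
    WITH Ranked AS
    (
        SELECT CustomerID,
               ROW_NUMBER() OVER (PARTITION BY LastName, FirstName
                                  ORDER BY CustomerID) AS rn
        FROM @data
    )
    -- Deleting through the CTE removes every row after the first in its group
    DELETE FROM Ranked
    WHERE rn > 1;

    -- What is left: one row per LastName/FirstName pair
    SELECT CustomerID, LastName, FirstName
    FROM @data
    ORDER BY CustomerID;
    ```

    Running the CTE with a SELECT in place of the DELETE first is a cheap way to check which rows would be removed.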

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Coupla things. You need to take CustomerID out of the GROUP BY, and the REPLACE/STUFF subquery needs to correlate on the FirstName/LastName combination instead of CustomerID so it picks up every ID in the group.

    Try this:

    SELECT T1.LastName, T1.FirstName,
        COUNT(T1.CustomerID) AS RecordCount,
        MIN(T1.CustomerID) AS MinCustomerID,
        -- Build 'id,id,' for the name pair, append '$', then strip the trailing ',$'
        REPLACE((SELECT DISTINCT CONVERT(NVARCHAR,T2.CustomerID) + ','
                 FROM @Data T2
                 WHERE T1.FirstName = T2.FirstName
                   AND T1.LastName = T2.LastName
                 FOR XML PATH('')) + '$', ',$', '') AS CustomerString
    FROM @Data T1
    GROUP BY T1.LastName,T1.FirstName
    ORDER BY T1.LastName,T1.FirstName
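
    The same two changes also work with the STUFF form from the original question. The sketch below assumes the sample table from the first post. The explicit NVARCHAR(11) length and the ORDER BY inside the subquery are additions for illustration, not part of either query above.

    ```sql
    DECLARE @data TABLE (CustomerID INT IDENTITY(1,1), LastName NVARCHAR(100), FirstName NVARCHAR(100));

    INSERT INTO @data (LastName, FirstName)
    VALUES ('Smith','John'), ('Smith','John'), ('Ford','Sam'), ('Knox','Katie'), ('Knox','Katie');

    SELECT T1.LastName, T1.FirstName,
           COUNT(T1.CustomerID) AS RecordCount,
           MIN(T1.CustomerID) AS MinCustomerID,
           -- Build ',id,id' for the name pair, then STUFF removes the leading comma
           STUFF((SELECT ',' + CONVERT(NVARCHAR(11), T2.CustomerID)
                  FROM @data AS T2
                  WHERE T2.LastName = T1.LastName
                    AND T2.FirstName = T1.FirstName
                  ORDER BY T2.CustomerID
                  FOR XML PATH('')), 1, 1, '') AS CustomerString
    FROM @data AS T1
    GROUP BY T1.LastName, T1.FirstName   -- CustomerID is no longer in the GROUP BY
    ORDER BY T1.LastName, T1.FirstName;
    ```

    The subquery only references T1.LastName and T1.FirstName, both of which are grouping columns, so nothing forces CustomerID back into the GROUP BY.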


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Perfect, thanks so much for the quick response!

  • TheRealDJ (1/20/2012)


    Perfect, thanks so much for the quick response!

    My pleasure, thanks for setting up sample data and structure and showing your work so we could help you quickly and efficiently. 🙂 See ya next time!


    - Craig Farrell


  • An alternative that I find easier to read (and write!):

    DECLARE @Data TABLE
    (
        CustomerID integer IDENTITY(1,1) PRIMARY KEY,
        LastName nvarchar(100) NOT NULL,
        FirstName nvarchar(100) NOT NULL
    );

    INSERT @Data
        (LastName, FirstName)
    VALUES
        ('Smith', 'John'),
        ('Smith', 'John'),
        ('Ford', 'Sam'),
        ('Knox', 'Katie'),
        ('Knox', 'Katie');

    SELECT
        d.LastName,
        d.FirstName,
        RecordCount = COUNT_BIG(*),
        MinCustomerID = MIN(d.CustomerID),
        CustomerString = dbo.Concatenate(d.CustomerID)
    FROM @Data AS d
    GROUP BY
        d.LastName, d.FirstName
    ORDER BY
        d.LastName, d.FirstName;

    Uses:

    CREATE ASSEMBLY [Concatenation]
    AUTHORIZATION [dbo]
    FROM 
    WITH PERMISSION_SET = SAFE;

    CREATE AGGREGATE [dbo].[Concatenate] (@Value [int])
    RETURNS [nvarchar](4000)
    EXTERNAL NAME [Concatenation].[Concatenate];

    Source:

    using System.Collections.Generic;
    using System.Data.SqlTypes;
    using System.IO;
    using System.Text;
    using Microsoft.SqlServer.Server;

    [SqlUserDefinedAggregate
        (
        Format.UserDefined,
        IsInvariantToDuplicates = false,
        IsInvariantToNulls = true,
        IsInvariantToOrder = true,
        IsNullIfEmpty = true,
        MaxByteSize = -1
        )
    ]
    public struct Concatenate : IBinarySerialize
    {
        private List<int> contents;

        public void Init()
        {
            contents = new List<int>();
        }

        public void Accumulate(SqlInt32 Value)
        {
            if (!Value.IsNull)
            {
                contents.Add(Value.Value);
            }
        }

        public void Merge(Concatenate Group)
        {
            contents.AddRange(Group.contents);
        }

        public SqlString Terminate()
        {
            if (contents.Count == 0)
            {
                return new SqlString();
            }
            else
            {
                contents.Sort();
                var sb = new StringBuilder();
                contents.ForEach(element => sb.Append(',').Append(element));
                return new SqlString(sb.ToString(1, sb.Length - 1));
            }
        }

        void IBinarySerialize.Write(BinaryWriter w)
        {
            w.Write(contents.Count);
            contents.ForEach(element => w.Write(element));
        }

        void IBinarySerialize.Read(BinaryReader r)
        {
            int count = r.ReadInt32();
            contents = new List<int>(count);
            for (int i = 0; i < count; i++)
            {
                contents.Add(r.ReadInt32());
            }
        }
    }

  • Here is another option in the SQLCLR-space for concatenating grouped strings:

    GROUP_CONCAT string aggregate for SQL Server
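
    If a CLR assembly is not an option, SQL Server 2017 and later include a built-in STRING_AGG aggregate that covers this case. It is a different technique from the CLR aggregates mentioned in this thread and is not available on older versions. A minimal sketch, assuming the same sample table:

    ```sql
    DECLARE @Data TABLE
    (
        CustomerID integer IDENTITY(1,1) PRIMARY KEY,
        LastName nvarchar(100) NOT NULL,
        FirstName nvarchar(100) NOT NULL
    );

    INSERT @Data (LastName, FirstName)
    VALUES ('Smith', 'John'), ('Smith', 'John'), ('Ford', 'Sam'), ('Knox', 'Katie'), ('Knox', 'Katie');

    SELECT
        d.LastName,
        d.FirstName,
        RecordCount = COUNT_BIG(*),
        MinCustomerID = MIN(d.CustomerID),
        -- WITHIN GROUP fixes the order of the IDs inside each concatenated string
        CustomerString = STRING_AGG(CONVERT(nvarchar(11), d.CustomerID), ',')
                             WITHIN GROUP (ORDER BY d.CustomerID)
    FROM @Data AS d
    GROUP BY
        d.LastName, d.FirstName
    ORDER BY
        d.LastName, d.FirstName;
    ```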

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato
