January 20, 2012 at 2:17 pm
Hi all - I've been tasked with de-duplicating some customer data and am having difficulty getting the query right. I'm hoping someone can help me out. I have a table that looks something like this:
DECLARE @data TABLE (CustomerID INT IDENTITY(1,1), LastName NVARCHAR(100), FirstName NVARCHAR(100))
INSERT INTO @data (LastName,FirstName)
SELECT 'Smith','John'
UNION ALL
SELECT 'Smith','John'
UNION ALL
SELECT 'Ford','Sam'
UNION ALL
SELECT 'Knox','Katie'
UNION ALL
SELECT 'Knox','Katie'
The output I'm trying to achieve would look like this where "CustomerString" would be the concatenation of the CustomerID values:
LastName  FirstName  RecordCount  MinCustomerID  CustomerString
Ford      Sam        1            3              3
Knox      Katie      2            4              4,5
Smith     John       2            1              1,2
I can get the first four columns easily enough and have been trying variations of STUFF with FOR XML PATH, but having to GROUP BY CustomerID expands the result set to one row per customer.
SELECT T1.LastName,T1.FirstName,
COUNT(T1.CustomerID) AS RecordCount,MIN(T1.CustomerID) AS MinCustomerID,
STUFF((SELECT DISTINCT ', ' + CONVERT(NVARCHAR,T2.CustomerID) FROM @data T2 WHERE T1.CustomerID = T2.CustomerID FOR XML PATH('')),1,2,'') AS CustomerString
FROM @data T1
GROUP BY T1.CustomerID,T1.LastName,T1.FirstName
ORDER BY T1.LastName,T1.FirstName
Any help would be greatly appreciated!
January 20, 2012 at 2:21 pm
Try this article for some assistance
http://jasonbrimhall.info/2011/03/21/dedupe-data-cte/
Jason...AKA CirqueDeSQLeil
January 20, 2012 at 2:26 pm
Coupla things. You need to stop grouping by CustomerID in the GROUP BY, and the REPLACE/STUFF subquery needs to correlate on the FirstName/LastName combination instead of CustomerID so it groups properly.
Try this:
SELECT T1.LastName, T1.FirstName,
    COUNT(T1.CustomerID) AS RecordCount,
    MIN(T1.CustomerID) AS MinCustomerID,
    -- Build "id,id,...,$" for each name pair, then strip the trailing ",$"
    REPLACE((SELECT DISTINCT CONVERT(NVARCHAR,T2.CustomerID) + ','
             FROM @Data T2
             WHERE T1.FirstName = T2.FirstName
               AND T1.LastName = T2.LastName
             FOR XML PATH(''))+ '$' , ',$', '') AS CustomerString
FROM @Data T1
GROUP BY T1.LastName,T1.FirstName
ORDER BY T1.LastName,T1.FirstName
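For comparison, here is a sketch of the same idea using STUFF, which is what the original post was attempting. It is not the query posted above, just a variant under a couple of assumptions: the explicit ORDER BY inside the subquery is added so the ID order does not depend on how DISTINCT happens to be evaluated, and the TYPE/.value() wrapper is a common way to avoid XML escaping (for example & becoming &amp;) if non-numeric values are ever concatenated. With integer IDs the wrapper is not strictly needed.

```sql
-- Sample data as in the original post (multi-row VALUES needs SQL Server 2008+).
DECLARE @data TABLE (CustomerID INT IDENTITY(1,1), LastName NVARCHAR(100), FirstName NVARCHAR(100));

INSERT INTO @data (LastName, FirstName)
VALUES (N'Smith', N'John'), (N'Smith', N'John'), (N'Ford', N'Sam'),
       (N'Knox', N'Katie'), (N'Knox', N'Katie');

SELECT T1.LastName, T1.FirstName,
       COUNT(*) AS RecordCount,
       MIN(T1.CustomerID) AS MinCustomerID,
       -- Correlate on the name pair (not CustomerID), prefix each ID with a comma,
       -- then STUFF removes the single leading comma.
       STUFF((SELECT ',' + CONVERT(NVARCHAR(11), T2.CustomerID)
              FROM @data T2
              WHERE T2.LastName = T1.LastName
                AND T2.FirstName = T1.FirstName
              ORDER BY T2.CustomerID
              FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '') AS CustomerString
FROM @data T1
GROUP BY T1.LastName, T1.FirstName
ORDER BY T1.LastName, T1.FirstName;
```

Against the sample data this should produce the three rows asked for in the original post, one per LastName/FirstName pair.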
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
January 20, 2012 at 2:31 pm
Perfect, thanks so much for the quick response!
January 20, 2012 at 2:38 pm
TheRealDJ (1/20/2012)
Perfect, thanks so much for the quick response!
My pleasure, thanks for setting up sample data and structure and showing your work so we could help you quickly and efficiently. 🙂 See ya next time!
January 22, 2012 at 8:01 pm
An alternative that I find easier to read (and write!):
DECLARE @Data TABLE
(
    CustomerID integer IDENTITY(1,1) PRIMARY KEY,
    LastName nvarchar(100) NOT NULL,
    FirstName nvarchar(100) NOT NULL
);

INSERT @Data
    (LastName, FirstName)
VALUES
    ('Smith', 'John'),
    ('Smith', 'John'),
    ('Ford', 'Sam'),
    ('Knox', 'Katie'),
    ('Knox', 'Katie');

SELECT
    d.LastName,
    d.FirstName,
    RecordCount = COUNT_BIG(*),
    MinCustomerID = MIN(d.CustomerID),
    CustomerString = dbo.Concatenate(d.CustomerID)
FROM @Data AS d
GROUP BY
    d.LastName, d.FirstName
ORDER BY
    d.LastName, d.FirstName;
Uses:
CREATE ASSEMBLY [Concatenation]
AUTHORIZATION [dbo]
FROM 
WITH PERMISSION_SET = SAFE;
CREATE AGGREGATE [dbo].[Concatenate] (@Value [int])
RETURNS [nvarchar](4000)
EXTERNAL NAME [Concatenation].[Concatenate];
Source:
using System.Collections.Generic;
using System.Data.SqlTypes;
using System.IO;
using System.Text;
using Microsoft.SqlServer.Server;
[SqlUserDefinedAggregate
    (
        Format.UserDefined,
        IsInvariantToDuplicates = false,
        IsInvariantToNulls = true,
        IsInvariantToOrder = true,
        IsNullIfEmpty = true,
        MaxByteSize = -1
    )
]
public struct Concatenate : IBinarySerialize
{
    private List<int> contents;

    public void Init()
    {
        contents = new List<int>();
    }

    public void Accumulate(SqlInt32 Value)
    {
        if (!Value.IsNull)
        {
            contents.Add(Value.Value);
        }
    }

    public void Merge(Concatenate Group)
    {
        contents.AddRange(Group.contents);
    }

    public SqlString Terminate()
    {
        if (contents.Count == 0)
        {
            return new SqlString();
        }
        else
        {
            contents.Sort();
            var sb = new StringBuilder();
            contents.ForEach(element => sb.Append(',').Append(element));
            return new SqlString(sb.ToString(1, sb.Length - 1));
        }
    }

    void IBinarySerialize.Write(BinaryWriter w)
    {
        w.Write(contents.Count);
        contents.ForEach(element => w.Write(element));
    }

    void IBinarySerialize.Read(BinaryReader r)
    {
        int count = r.ReadInt32();
        contents = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            contents.Add(r.ReadInt32());
        }
    }
}
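A note for anyone reading this on a newer version: SQL Server 2017 and later ship a built-in STRING_AGG aggregate, which should cover this case without a CLR assembly. The sketch below is not from the posts above. It reuses the same sample table and column names, and assumes comma-separated IDs in ascending order are what is wanted.

```sql
-- Requires SQL Server 2017 or later (STRING_AGG is not available in earlier versions).
DECLARE @Data TABLE
(
    CustomerID integer IDENTITY(1,1) PRIMARY KEY,
    LastName nvarchar(100) NOT NULL,
    FirstName nvarchar(100) NOT NULL
);

INSERT @Data (LastName, FirstName)
VALUES ('Smith', 'John'), ('Smith', 'John'), ('Ford', 'Sam'),
       ('Knox', 'Katie'), ('Knox', 'Katie');

SELECT
    d.LastName,
    d.FirstName,
    RecordCount = COUNT_BIG(*),
    MinCustomerID = MIN(d.CustomerID),
    -- WITHIN GROUP fixes the order of the concatenated IDs inside each group.
    CustomerString = STRING_AGG(CONVERT(nvarchar(11), d.CustomerID), ',')
                     WITHIN GROUP (ORDER BY d.CustomerID)
FROM @Data AS d
GROUP BY
    d.LastName, d.FirstName
ORDER BY
    d.LastName, d.FirstName;
```

On older versions (this thread dates from 2012) the FOR XML PATH or CLR aggregate approaches above remain the options.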
April 20, 2012 at 1:32 pm