November 7, 2024 at 6:06 pm
Hello T-SQL experts
I have a table containing team codes and descriptions. Unfortunately, many of the team codes are duplicated, often with different descriptions.
I want to deduplicate the table (in T-SQL) by selecting the shortest description (in number of characters) for each team code. How do I do this in a non-manual fashion?
I could obviously group in the team code, but there is no suitable aggregate function that I can think of for pulling out the shortest description in each group. MIN() won’t work as it sorts (e.g.) ‘aa’ before ‘b’. Also, T-SQL has no FIRST() or LAST() aggregate function.
Any ideas anyone?
Yours hopefully
Mark Dalley
November 7, 2024 at 6:33 pm
If your version of SQL has use of FIRST_VALUE, you could try this:
SELECT DISTINCT team_code, /* , ...*/
FIRST_VALUE(description) OVER(PARTITION BY team_code ORDER BY LEN(description)) AS description
FROM dbo.table_name
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 7, 2024 at 7:02 pm
Maybe something like this?
declare @Teams table (
TeamCode varchar(10),
Description varchar(30)
)
INSERT INTO @Teams (TeamCode, Description)
VALUES
('AC', 'Action Committee'),
('AC', 'Active Cowards'),
('AC', 'Attentive Collegeboys'),
('AC', 'Act. Com.'),
('WC', 'World-class Citizens'),
('WC', 'World Citizens'),
('WC', 'Wretched Cowards'),
('WC', 'Wicked Consultants')
;
WITH TeamData AS (
SELECT
TeamCode,
Description,
MIN(RIGHT(SPACE(10)+CAST(LEN(Description) AS VARCHAR(10)),10)+Description) OVER(PARTITION BY TeamCode) AS NewDescription
FROM @Teams)
select
TeamCode,
Description,
substring(NewDescription,11,255) AS NewDescription
from TeamData
TeamCode Description NewDescription
---------- ------------------------------ ----------------------------------------
AC Action Committee Act. Com.
AC Active Cowards Act. Com.
AC Attentive Collegeboys Act. Com.
AC Act. Com. Act. Com.
WC World-class Citizens World Citizens
WC World Citizens World Citizens
WC Wretched Cowards World Citizens
WC Wicked Consultants World Citizens
(8 rows affected)
November 7, 2024 at 9:19 pm
I love a good reason to use any of the widow functions. Thank you to Kaj for some sample data. I can't tell you how many times I've had to dedupe and have leveraged this.
DECLARE @Teams TABLE (
TeamCode VARCHAR(10),
Description VARCHAR(30)
)
INSERT INTO @Teams (TeamCode, Description)
VALUES
('AC', 'Action Committee'),
('AC', 'Active Cowards'),
('AC', 'Attentive Collegeboys'),
('AC', 'Act. Com.'),
('WC', 'World-class Citizens'),
('WC', 'World Citizens'),
('WC', 'Wretched Cowards'),
('WC', 'Wicked Consultants')
;
-- Show how the data is grouped and numbered
SELECT *,
RowNum = ROW_NUMBER() OVER (PARTITION BY TeamCode ORDER BY LEN([Description]))
FROM @Teams
-- Remove anything with RowNum > 1
; WITH Dupes AS
(SELECT *,
RowNum = ROW_NUMBER() OVER (PARTITION BY TeamCode ORDER BY LEN([Description]))
FROM @Teams
)
DELETE FROM Dupes
WHERE RowNum > 1
-- Final result
SELECT *
FROM @Teams
November 7, 2024 at 10:22 pm
If you have any empty strings you need to ensure they are not chosen as the shortest description.
SELECT TeamCode, Description,
RowNum = ROW_NUMBER() OVER (PARTITION BY TeamCode
ORDER BY IIF(TRIM(ISNULL(TeamCode,'')) = '', 1000,LEN(TeamCode)))
FROM @Teams
November 8, 2024 at 1:37 am
Ha, I thought that the answer might involve window functions. I have seen them used but hardly used them myself.
After posting this question I came across Kathi Kellenberger's article here. Following her lead I arrived at a bit of code exactly analogous to the approach used by SoCal_DBD of deleting duplicates via a CTE. (Though I think you meant to call them window functions).
Delightful bit of test data there, kaj! And thanks Scott for flagging up FIRST_VALUE. I have a recent SQL Server version so it works fine. Also good shout Ed for your comment about empty strings. Fortunately I didn't have any, but next time, who knows.
Here's to the great community on SSC -- cheers!
MarkD
November 11, 2024 at 2:05 pm
As it happens, a lookup table is precisely what I am creating.
The descriptions are supposed to be unique. If the originating organisation puts in duplicates because they cannot decide on a single description for their team, and I choose the wrong one, that is on them.
Fortunately I am in touch with said organisation, albeit indirectly. I just needed an interim fix for what I hope is an interim issue.
I have a vision of perfect data sometime in the indefinite future.
MarkD
November 11, 2024 at 2:18 pm
As it happens, a lookup table is precisely what I am creating.
The descriptions are supposed to be unique. If the originating organisation puts in duplicates because they cannot decide on a single description for their team, and I choose the wrong one, that is on them.
Fortunately I am in touch with said organisation, albeit indirectly. I just needed an interim fix for what I hope is an interim issue.
I have a vision of perfect data sometime in the indefinite future.
MarkD
Good luck mate, still think that there will be some manual input required.
Viewing 9 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy