December 18, 2016 at 7:45 pm
I have a table in my database that has some product dimension information, however some of it is duplicated in a way and I want to eliminate the duplicates. The problem is that the duplication is not exact, so I've not been able to figure out a way to do this.
As an example, I queried the following results:
In this example, using only one ProductId, you can see Depth=9.00 for both ObjectId 17555 and 37575. However, that same Depth=9.00 is also included in the entry where Width=20.00 for the same ObjectId. I would like to figure out some sort of script that can check to see if the single field of the Depth is included already in another row with more dimensional information (Width=20 & Depth=9). If so, then delete the line that has only Depth=9.
In other words, the last column in the example is a CheckNumber that shows if there is only 1 dimension in that row or not. If there is only 1 dimension in that ObjectId row & that same dimension is listed in a row with the same ObjectId and MORE, i.e., CheckNumber>1, kill the row with the single dimension.
My SQL for this query so far is:
select m.ProductId,m.ObjectId,Height,Width,Depth,Length,[Height (Min)],[Height (Max)],[Width (Min)],[Width (Max)],[Depth (Min)],[Depth (Max)],[Length (Min)],[Length (Max)], CASE WHEN height IS NULL THEN 0 ELSE 1 END+CASE WHEN width IS NULL THEN 0 ELSE 1 END+CASE WHEN depth IS NULL THEN 0 ELSE 1 END+CASE WHEN length IS NULL THEN 0 ELSE 1 END+CASE WHEN [height (min)] IS NULL THEN 0 ELSE 1 END
+CASE WHEN [height (max)] IS NULL THEN 0 ELSE 1 END+CASE WHEN [width (min)] IS NULL THEN 0 ELSE 1 END+CASE WHEN [width (max)] IS NULL THEN 0 ELSE 1 END+CASE WHEN [depth (min)] IS NULL THEN 0 ELSE 1 END+CASE WHEN [depth (max)] IS NULL THEN 0 ELSE 1 END
+CASE WHEN [length (min)] IS NULL THEN 0 ELSE 1 END+CASE WHEN [length (max)] IS NULL THEN 0 ELSE 1 END as CheckNumber
from EuroLuxProductBE.dbo.pdt_multidimensions m
inner join
(select productid,objectid
from EuroLuxProductBE.dbo.pdt_MultiDimensions
group by productid, objectid
having count(*)>1) multidups
on m.productid=multidups.ProductId
and m.ObjectId=multidups.ObjectId
where m.ProductId=125319
Any help would be appreciated. I just can't get my head around this one, and I have 1000s of records like this and don't want to have to spend tons of time deleting them manually.
Thanks!
December 19, 2016 at 2:06 am
Try ROW_NUMBER(). Comment out the outer WHERE clause to see how all rows are numbered:
SELECT *
FROM (
SELECT
*,
rn = ROW_NUMBER() OVER(PARTITION BY ProductId, ObjectId ORDER BY CheckNumber DESC)
FROM EuroLuxProductBE.dbo.pdt_multidimensions m
CROSS APPLY (
SELECT CheckNumber =
CASE WHEN height IS NULL THEN 0 ELSE 1 END +
CASE WHEN width IS NULL THEN 0 ELSE 1 END +
CASE WHEN depth IS NULL THEN 0 ELSE 1 END +
CASE WHEN [length] IS NULL THEN 0 ELSE 1 END +
CASE WHEN [height (min)] IS NULL THEN 0 ELSE 1 END +
CASE WHEN [height (max)] IS NULL THEN 0 ELSE 1 END +
CASE WHEN [width (min)] IS NULL THEN 0 ELSE 1 END +
CASE WHEN [width (max)] IS NULL THEN 0 ELSE 1 END +
CASE WHEN [depth (min)] IS NULL THEN 0 ELSE 1 END +
CASE WHEN [depth (max)] IS NULL THEN 0 ELSE 1 END +
CASE WHEN [length (min)] IS NULL THEN 0 ELSE 1 END +
CASE WHEN [length (max)] IS NULL THEN 0 ELSE 1 END
) x
WHERE m.ProductId = 125319
) d
WHERE rn > 1
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 19, 2016 at 8:53 am
Aggregate functions automatically exclude NULL values, so this can be simplified.
SELECT *
FROM (
SELECT
*,
rn = ROW_NUMBER() OVER(PARTITION BY ProductId, ObjectId ORDER BY CheckNumber DESC)
FROM EuroLuxProductBE.dbo.pdt_multidimensions m
CROSS APPLY (
SELECT COUNT(*) AS CheckNumber
FROM (
VALUES
(height),
(width),
(depth),
([length]),
([height (min)]),
([height (max)]),
([width (min)]),
([width (max)]),
([depth (min)]),
([depth (max)]),
([length (min)]),
([length (max)]
) x
WHERE m.ProductId = 125319
) d
WHERE rn > 1
Although I prefer using CTEs rather than derived tables.
Drew
Edit: forgot to include column alias.
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 27, 2016 at 3:13 pm
Thanks to both of you! I'll check it out tomorrow. I had to run out of town unexpectedly for a family funeral, so sorry for the late reply.
December 28, 2016 at 2:49 am
drew.allen (12/19/2016)
Aggregate functions automatically exclude NULL values, so this can be simplified.
SELECT *
FROM (
SELECT
*,
rn = ROW_NUMBER() OVER(PARTITION BY ProductId, ObjectId ORDER BY CheckNumber DESC)
FROM EuroLuxProductBE.dbo.pdt_multidimensions m
CROSS APPLY (
SELECT COUNT(*) AS CheckNumber
FROM (
VALUES
(height),
(width),
(depth),
([length]),
([height (min)]),
([height (max)]),
([width (min)]),
([width (max)]),
([depth (min)]),
([depth (max)]),
([length (min)]),
([length (max)]
) x
WHERE m.ProductId = 125319
) d
WHERE rn > 1
Although I prefer using CTEs rather than derived tables.
Drew
Edit: forgot to include column alias.
As it stands Drew this won't work - from BOL:
"COUNT(*) does not require an expression parameter because, by definition, it does not use information about any particular column. COUNT(*) returns the number of rows in a specified table without getting rid of duplicates. It counts each row separately. This includes rows that contain null values."
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 28, 2016 at 6:12 am
Note also that
CASE WHEN height IS NULL THEN 0 ELSE 1 END
may be shortened to
IIF(Height IS NULL, 0, 1)
If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.
December 28, 2016 at 6:15 am
Phil Parkin (12/28/2016)
Note also thatCASE WHEN height IS NULL THEN 0 ELSE 1 END
may be shortened to
IIF(Height IS NULL, 0, 1)
It's shortened...but is it fastened too? :-))
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 28, 2016 at 8:23 am
ChrisM@Work (12/28/2016)
drew.allen (12/19/2016)
Aggregate functions automatically exclude NULL values, so this can be simplified.
SELECT *
FROM (
SELECT
*,
rn = ROW_NUMBER() OVER(PARTITION BY ProductId, ObjectId ORDER BY CheckNumber DESC)
FROM EuroLuxProductBE.dbo.pdt_multidimensions m
CROSS APPLY (
SELECT COUNT(*) AS CheckNumber
FROM (
VALUES
(height),
(width),
(depth),
([length]),
([height (min)]),
([height (max)]),
([width (min)]),
([width (max)]),
([depth (min)]),
([depth (max)]),
([length (min)]),
([length (max)]
) x
WHERE m.ProductId = 125319
) d
WHERE rn > 1
Although I prefer using CTEs rather than derived tables.
Drew
Edit: forgot to include column alias.
As it stands Drew this won't work - from BOL:
"COUNT(*) does not require an expression parameter because, by definition, it does not use information about any particular column. COUNT(*) returns the number of rows in a specified table without getting rid of duplicates. It counts each row separately. This includes rows that contain null values."
Of course, you're right. If the OP had provided consumable data, I would have been able to test. COUNT will still work if you specify the column rather than using *.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply