How To DISTINCT COUNT with Windows Functions (i.e. OVER and PARTITION BY)

Question

How To DISTINCT COUNT with Windows Functions (i.e. OVER and PARTITION BY)

YSLGuru

SSC-Insane

Points: 21632
More actions
May 21, 2013 at 11:09 am

#274098

SQL 2012 Reference = http://msdn.microsoft.com/en-us/library/ms175997(v=sql.110).aspx
I'm trying to get a unique count of rows in a child table in a multi-table join just like the example shown below from the above Microsoft article on COUNT(). I'm using the Windows functions (as opposed to old fashion aggregation with GROUP BY) because I need both aggregate data and non-aggregated data to be included in the results.
In the below Example (from example C in the above Microsoft Document on T-SQL) it shows that you can do the following
USE AdventureWorks2012;
GO
SELECT
DISTINCT Name,
COUNT(edh.BusinessEntityID) OVER (PARTITION BY edh.DepartmentID) AS EmployeesPerDept
FROM HumanResources.EmployeePayHistory AS eph
JOIN HumanResources.EmployeeDepartmentHistory AS edh
ON eph.BusinessEntityID = edh.BusinessEntityID
JOIN HumanResources.Department AS d
ON d.DepartmentID = edh.DepartmentID
WHERE edh.EndDate IS NULL
ORDER BY Name;
This is using the DISTINCT keyword so as to get a distinct count where as without it the Count() coould be over stated by the number of rows retruend by the other tables in the join.
Is there a way to do the equivelant in T-SQL for SQL Server 2008R2? I've tried the above and it errors out under SQL 2008R2 and examplce C above is NOT in the same Microsoft article when you change the platform to SQL 2008R2 so I'm assuming this is a feature nehancement in SQL 2012. How then can you do this under SQL 2008R2?
Thanks
Kindest Regards,
Just say No to Facebook!

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323504 More actions · Answer 1

YSL I think i'd tackle it by changing one of the tables into a simple GROUP BY, that returns the distinct count inside it;

something like this is syntactically correct, but i didn't have the database aviaalble for testing.

SELECT

DISTINCT eph.Name,

edh.CNT AS EmployeesPerDept

FROM HumanResources.EmployeePayHistory AS eph

JOIN (SELECT DepartmentID, COUNT(DISTINCT BusinessEntityID) AS CNT FROM HumanResources.EmployeeDepartmentHistory) AS edh

ON eph.BusinessEntityID = edh.BusinessEntityID

JOIN HumanResources.Department AS d

ON d.DepartmentID = edh.DepartmentID

WHERE edh.EndDate IS NULL

ORDER BY eph.Name;

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

YSLGuru SSC-Insane Points: 21632 More actions · Answer 2

Thanks for the suggesting Lowell. Thats what I've been working on as a work-a-round but I was hoping there might be a better answer. I must admit I was suprised that I could not find any other posts in this on the forums. Since Microsoft added this capability to SQL 2012 I can only imagine it must have been something people have been asking for.

Thanks again.

Kindest Regards,

Just say No to Facebook!

dkschill Hall of Fame Points: 3375 More actions · Answer 3

What if you used a DENSE_RANK instead of a count (ordering by the columns you want a distinct count on) then wrapping that query with a select that grabs the MAX value of the DENSE_RANK.

I know this worked for me, but is it a valid solution?

DECLARE @DumbTable TABLE (cat CHAR(1), id TINYINT, rw TINYINT);

INSERT INTO @DumbTable SELECT 'A' as cat, 3 as id, 1 as rw

INSERT INTO @DumbTable SELECT 'A' as cat, 3 as id, 5 as rw

INSERT INTO @DumbTable SELECT 'A' as cat, 4 as id, 4 as rw

INSERT INTO @DumbTable SELECT 'B' as cat, 2 as id, 4 as rw

INSERT INTO @DumbTable SELECT 'C' as cat, 4 as id, 1 as rw

INSERT INTO @DumbTable SELECT 'C' as cat, 3 as id, 2 as rw

INSERT INTO @DumbTable SELECT 'C' as cat, 5 as id, 3 as rw

SELECT DISTINCT cat

, MAX(catDenseRank) OVER (PARTITION BY cat) AS idCount

, maxRw

FROM(

SELECT cat

, DENSE_RANK() OVER (PARTITION BY cat ORDER BY id) as catDenseRank

, MAX(rw) OVER (PARTITION BY cat) as maxRw

FROM @DumbTable

) AS dumbDataWrapper

On a side note....why can't you post DR*P TABLE or UNION [ALL] SELECT anymore...or is that just me.

Erland Sommarskog SSC-Insane Points: 24038 More actions · Answer 4

I'm not really sure what your problem is. The query you posted would run on SQL 2008 R2, if you only have the tables. When I try I get:

Msg 208, Level 16, State 1, Line 1

Invalid object name 'HumanResources.EmployeePayHistory'.

But I have long ago given up trying to keep up with all the AdventureWorks databases. However, the error message tells me that the syntax is good.

I'm not really sure that the query mirrors what you actually want to achieve. Rather than using AdventureWorks, maybe you could post a CREATE TABLE statement, an INSERT statement with sample data and the desired result to highlight what you are looking for?

[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 5

I don't have adventure works installed either, so will make a guess. Does this work?

USE AdventureWorks2012;

GO

SELECT

Name

,(SELECT DISTINCT

COUNT(edh.BusinessEntityID) OVER (PARTITION BY edh.DepartmentID) AS TempCount

FROM

HumanResources.EmployeeDepartmentHistory)

AS EmployeesPerDept

,*

FROM HumanResources.EmployeePayHistory AS eph

JOIN HumanResources.EmployeeDepartmentHistory AS edh

ON eph.BusinessEntityID = edh.BusinessEntityID

JOIN HumanResources.Department AS d

ON d.DepartmentID = edh.DepartmentID

WHERE edh.EndDate IS NULL

ORDER BY Name;

Also, in a worse case scenario, you should be able to use the ROW_NUMBER windowing function in a nested SELECT and then add a WHERE clause to remove any duplicates after the fact.

Erland Sommarskog SSC-Insane Points: 24038 More actions · Answer 6

Steven Willis (7/31/2013)
SELECT
Name
,(SELECT DISTINCT
COUNT(edh.BusinessEntityID) OVER (PARTITION BY edh.DepartmentID) AS TempCount
FROM
HumanResources.EmployeeDepartmentHistory)
AS EmployeesPerDept

Looks like that would die with "subquery returned more than one row"...

[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

ben.brugman SSChampion Points: 13369 More actions · Answer 7

Bit late 😀

I was looking for a similar question and found:DECLARE @DumbTable TABLE (cat CHAR(1), id TINYINT, rw TINYINT);

INSERT INTO @DumbTable SELECT 'A' as cat, 3 as id, 1 as rw

INSERT INTO @DumbTable SELECT 'A' as cat, 3 as id, 5 as rw

INSERT INTO @DumbTable SELECT 'A' as cat, 4 as id, 4 as rw

INSERT INTO @DumbTable SELECT 'B' as cat, 2 as id, 4 as rw

INSERT INTO @DumbTable SELECT 'C' as cat, 4 as id, 1 as rw

INSERT INTO @DumbTable SELECT 'C' as cat, 3 as id, 2 as rw

INSERT INTO @DumbTable SELECT 'C' as cat, 5 as id, 3 as rw

Code copied from previous anwser.

Solution:

--

-- Get a count of distinct id's partitioned by cat.

--

SELECT --DISTINCT -- Distinct can be switched on / off.

cat

, dense_rank() over (partition by cat order by id)

+ dense_rank() over (partition by cat order by id desc)

- 1 as count_distinct_id

FROM @DumbTable

dense_rank + dense_rank desc gives the position from the 'start' plus the position from the end, for a given set this gives a 'constant'. This 'constant' is one more than the number of distinct values. (For one value this is 1 + 1 - 1) For three rows with two values this gives (1 + 2 - 1 or 2 + 1 - 1)

Ben

Although the anwer is late, others searching and finding this thread might find this solution usefull.