December 17, 2015 at 9:18 pm
I am looking for an efficient way of getting a count of unique values for many fields from a table, hopefully in one EFFICIENT query.
I am successfully getting the data I need right now in my prototype; however, it is very inefficient. I am processing millions of rows, and it takes about 1 minute to run all of the individual queries. I am sure there is a better way to accomplish what I am doing, but I am not sure how best to tackle this.
Here is what I am doing now.
I have a table that contains a list of approx. 20 different fields that I need to get counts of unique values. Let’s just look at 3 of them.
So my “Driver” table has the list of fields, let’s say they are:
Region
State
City
I extract those fields one by one from the “Driver” table using a cursor (I know, performance stinks) and then gather the totals, inserting them into a temp table using a dynamic SQL statement within a stored procedure. So essentially it looks like this (pseudocode):
Select FieldName from DriverTable using a cursor, then for each FieldName:
    Select '<Field from Driver Table>' as 'Field Name', <Field from Driver Table>, count(*)
    From Customer
    Where <Field from Driver Table> is not null and status = 'Active'
      and date = curdate()
    Group by <Field from Driver Table>
Get next FieldName from the Driver Table
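A simplified, runnable sketch of that loop (the #Results temp table, QUOTENAME usage, and GETDATE() in place of curdate() are stand-ins here, not the actual procedure):
-- Sketch of the current cursor + dynamic SQL approach (the slow one).
-- Assumes a DriverTable(FieldName) list, a Customer(status, date) table,
-- and a #Results temp table created beforehand.
DECLARE @FieldName SYSNAME,
        @sql       NVARCHAR(MAX);

DECLARE FieldCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT FieldName FROM dbo.DriverTable;

OPEN FieldCursor;
FETCH NEXT FROM FieldCursor INTO @FieldName;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'
        INSERT INTO #Results (FieldName, FieldValue, ValueCount)
        SELECT ''' + @FieldName + N''', ' + QUOTENAME(@FieldName) + N', COUNT(*)
        FROM dbo.Customer
        WHERE ' + QUOTENAME(@FieldName) + N' IS NOT NULL
          AND status = ''Active''
          AND [date] = CAST(GETDATE() AS DATE)
        GROUP BY ' + QUOTENAME(@FieldName) + N';';

    EXEC (@sql);  -- one full pass over Customer per field

    FETCH NEXT FROM FieldCursor INTO @FieldName;
END;

CLOSE FieldCursor;
DEALLOCATE FieldCursor;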
So I end up with a temp table that has all of the unique values for:
Region and their counts
State and their counts
City and their counts
Etc.
As you can imagine, looping through the cursor and running each of these individual queries is slow.
Is there a better way?
December 17, 2015 at 10:09 pm
Tim Kovacich (12/17/2015)
I am looking for an efficient way of getting a count of unique values for many fields from a table hopefully in one EFFICIENT query. [...] As you can imagine, looping through the cursor and running each of these individual queries is slow. Is there a better way?
Does your "driver table" change on a regular basis? If not, you're making it WAY harder than it need to be. Something as simple as the following will give you what you're after...
SELECT
    Region = COUNT(DISTINCT c.Region),
    State  = COUNT(DISTINCT c.State),
    City   = COUNT(DISTINCT c.City)
FROM
    Customer c
If the required columns are changing on a regular basis and this is something that needs to be done on an automated basis, then the above query can be built on the fly using dynamic SQL.
Just note that you can build the entire query and simply execute it once, rather than one column at a time...
SET NOCOUNT ON;

IF OBJECT_ID('tempdb..#DriverTable', 'U') IS NOT NULL
    DROP TABLE #DriverTable;

CREATE TABLE #DriverTable (
    ID INT,
    SchemaName SYSNAME,
    TableName SYSNAME,
    ColumnName SYSNAME
    );

INSERT #DriverTable (ID, SchemaName, TableName, ColumnName) VALUES
    (1, 'dbo', 'Customer', 'Region'), (2, 'dbo', 'Customer', 'State'), (3, 'dbo', 'Customer', 'City');

DECLARE @sql VARCHAR(8000) = 'SELECT
';

SELECT @sql = @sql +
    STUFF (
        (SELECT
            ',' + dt.ColumnName + ' = COUNT(DISTINCT ' + dt.TableName + '.' + dt.ColumnName + ')' + CHAR(10)
        FROM
            #DriverTable dt
        ORDER BY
            dt.ID
        FOR XML PATH ('')), 1, 1, '');

SELECT @sql = @sql + 'FROM ' + MIN(dt.SchemaName) + '.' + MIN(dt.TableName)
FROM #DriverTable dt;

PRINT @sql;
--EXEC (@sql);
Printed dynamic SQL...
SELECT
Region = COUNT(DISTINCT Customer.Region)
,State = COUNT(DISTINCT Customer.State)
,City = COUNT(DISTINCT Customer.City)
FROM dbo.Customer
December 18, 2015 at 9:23 am
Thank you for the quick turnaround. Sorry, I was not clear.
The results I get with this suggestion are:
Region  State  City
6       3      6
What I am after is more like this:
From the Region query, the expected result would look something like:
Field value    Count
SouthWest      8
North          9
South          10
From the State query, the expected result would look something like:
Field value    Count
New York       60
Texas          4
Arizona        10
etc.
Also, to answer your question: yes, the field names are dynamic.
Thanks
December 21, 2015 at 12:23 pm
I'd create indexes on all the columns you need to count that way. Then the COUNT query can use the existing indexes to give you very quick totals.
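For example, something along these lines (the index names are made up; the table and column names are the ones from the earlier posts):
-- Hypothetical supporting indexes, one per counted column.
-- Adjust, or add INCLUDE/filter clauses for the status/date predicates,
-- to match the real table.
CREATE NONCLUSTERED INDEX IX_Customer_Region ON dbo.Customer (Region);
CREATE NONCLUSTERED INDEX IX_Customer_State  ON dbo.Customer ([State]);
CREATE NONCLUSTERED INDEX IX_Customer_City   ON dbo.Customer (City);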
If you don't want to do that, I'd probably just cursor once through every row in the table, getting all the counts at once dynamically based on the control table contents.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 21, 2015 at 10:20 pm
Something like this is pretty easy if you know about a personal friend of mine... the SQL_Variant datatype. It can be a bit of a fair weather friend so read up on it before you use it in other places.
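As a quick aside (not part of the solution below), here's a minimal sketch of why SQL_Variant works for this: one SQL_Variant column can hold values from differently typed columns while still remembering each value's underlying type.
-- Minimal SQL_Variant illustration.
-- A SQL_VARIANT variable can hold an INT, a CHAR, etc., and
-- SQL_VARIANT_PROPERTY reports the underlying type of what's stored.
DECLARE @v SQL_VARIANT;

SET @v = CAST(42 AS INT);
SELECT @v AS StoredValue, SQL_VARIANT_PROPERTY(@v,'BaseType') AS BaseType; -- int

SET @v = CAST('AB' AS CHAR(2));
SELECT @v AS StoredValue, SQL_VARIANT_PROPERTY(@v,'BaseType') AS BaseType; -- char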
Before we start, we need a typical million row table to test with. I only made 6 columns but it'll give you a feel for what's possible. Not to worry... it only takes 3 seconds to build on my humble laptop.
--=======================================================================================
-- Create and populate a million row test table with random data
-- This is NOT a part of the solution. It's just a table to demo with.
--=======================================================================================
--===== Do this in a nice, safe place that everyone has.
USE tempdb
;
--===== If the test table exists, drop it to make reruns in SSMS easier.
IF OBJECT_ID('tempdb.dbo.JBMTest','U') IS NOT NULL
DROP TABLE dbo.JBMTest
;
--===== Create and populate the table on-the-fly.
 SELECT TOP 1000000
         SomeID = IDENTITY(INT,1,1)
        ,ColA = CHAR(ABS(CHECKSUM(NEWID()))%5+65)+CHAR(ABS(CHECKSUM(NEWID()))%5+65)
        ,ColB = CHAR(ABS(CHECKSUM(NEWID()))%10+65)+CHAR(ABS(CHECKSUM(NEWID()))%5+65)
        ,ColC = CHAR(ABS(CHECKSUM(NEWID()))%7+65)+CHAR(ABS(CHECKSUM(NEWID()))%5+65)
        ,ColD = ABS(CHECKSUM(NEWID()))%6+1
        ,ColE = CAST(ABS(CHECKSUM(NEWID()))%30+10500 AS CHAR(5)) --ZipCode Simulation
   INTO dbo.JBMTest
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
--===== Let's see the datatypes for the table.
-- It'll handle a lot more than just CHAR and INT, though.
EXEC sp_help 'dbo.JBMTest'
;
Now comes the fun part. Basically, we'll create a dynamic "unpivot" using the column names from the table (sans those columns with the wrong datatypes and attributes for this) using a CROSS APPLY to form a nice, healthy, NVP (Name Value Pair) table and then simply do a normal count by ColumnName (attribute) and value.
Most of the details are in the comments in the code below. This only takes about 6 seconds on my laptop.
--=======================================================================================
-- Now we get to a solution.
--=======================================================================================
--===== Declare some obviously named variables.
-- The first one could be a parameter for a stored procedure.
DECLARE @TableName SYSNAME = N'[dbo].[JBMTest]'
,@SQL VARCHAR(MAX)
,@SQLCA VARCHAR(MAX)
;
--===== Create the CROSS APPLY code from the non-Identity, non-Blob columns.
-- The CROSS APPLY will be used to "unpivot" the data.
-- Order is NOT important here.
SELECT @SQLCA = ISNULL(@SQLCA + ' UNION ALL','')
+ REPLACE(REPLACE('
SELECT "<<name>>",CAST(<<name>> AS SQL_VARIANT)'
,'"','''')
,'<<name>>',QUOTENAME(name))
FROM sys.columns
WHERE object_id = (OBJECT_ID(@TableName))
AND is_identity = 0 --No IDENTITY columns allowed. Makes no sense.
AND is_rowguidcol = 0 --No ROWGUID columns either.
AND max_length BETWEEN 1 AND 8000 --Rejects blobs and SQL_Variant
AND system_type_id NOT IN ( --=== Reject other incompatible types
SELECT system_type_id
FROM sys.types
WHERE name IN ('hierarchyid','geometry','geography','timestamp')
)
;
--===== Add the CROSS APPLY code to the rest of the code
SELECT @SQL = '
SELECT ca.Attribute, ca.Value, Occurrences = COUNT(*)
FROM ' + @TableName + '
CROSS APPLY ('
+ @SQLCA + '
) ca (Attribute, Value)
GROUP BY ca.Attribute,ca.Value
ORDER BY ca.Attribute,ca.Value
;'
;
--===== Print the code and then execute it
PRINT @SQL;
EXEC (@SQL)
;
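For reference, with the JBMTest columns above, the printed dynamic SQL should come out looking roughly like this (SomeID drops out because it's the IDENTITY column; exact whitespace and column order may vary):
SELECT ca.Attribute, ca.Value, Occurrences = COUNT(*)
FROM [dbo].[JBMTest]
CROSS APPLY (
SELECT '[ColA]',CAST([ColA] AS SQL_VARIANT) UNION ALL
SELECT '[ColB]',CAST([ColB] AS SQL_VARIANT) UNION ALL
SELECT '[ColC]',CAST([ColC] AS SQL_VARIANT) UNION ALL
SELECT '[ColD]',CAST([ColD] AS SQL_VARIANT) UNION ALL
SELECT '[ColE]',CAST([ColE] AS SQL_VARIANT)
) ca (Attribute, Value)
GROUP BY ca.Attribute,ca.Value
ORDER BY ca.Attribute,ca.Value
;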
Let me know if that's even close to what you need.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 7, 2016 at 11:23 am
Hi Jeff,
I just wanted to let you know I love your personal friend. The idea you sent was an excellent suggestion. I went from my original post which was taking approx. 1 minute to run, to your post which takes approx. 9 seconds. In addition, my one minute version was tested using approx. 10 fields and your model was for the majority of the table (minus the inappropriate data types) so approx. 150 fields. Just a huge performance increase!! Thank you very much!!
January 8, 2016 at 7:38 pm
Tim Kovacich (1/7/2016)
I just wanted to let you know I love your personal friend. [...] I went from my original post, which was taking approx. 1 minute to run, to your post, which takes approx. 9 seconds. [...] Just a huge performance increase!! Thank you very much!!
Very glad I could help, and thank you very much for the feedback. I love it when people post back performance stats, whether they're good or bad, because I ran a test on test data and you ran a test on a real-life situation. Your feedback has helped ME tremendously and that, of course, will be used to help others.
Thanks again, Tim.
--Jeff Moden
Change is inevitable... Change for the better is not.