September 22, 2017 at 7:22 am
Hi to all sql lovers.
Is there any way to get for each table and each column for example this:SELECT COUNT(DISTINCT ColumnName) AS UniqueValues FROM Table
?
I don't want to do it in some loop/cursor, because this will destroy the server. Is there any sys.stats similar table? Thank you...
September 22, 2017 at 8:29 am
You could do something like this and then copy and paste the SQL_TEXT column into SSMS for execution, and you could modify the WHERE clause to suit your objectives:SELECT C.TABLE_NAME, C.ORDINAL_POSITION, C.COLUMN_NAME, T.SQL_TEXT
FROM INFORMATION_SCHEMA.COLUMNS AS C
CROSS APPLY (
VALUES ('SELECT COUNT(DISTINCT ' + C.COLUMN_NAME + ' AS ' + C.COLUMN_NAME + ' FROM ' + C.TABLE_NAME + ';')
) AS T (SQL_TEXT)
WHERE C.TABLE_SCHEMA = 'dbo'
ORDER BY C.TABLE_NAME, C.ORDINAL_POSITION;
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 22, 2017 at 8:54 am
I made the same solution.
1st load all my necessary tables and columns
WITH
DimTables AS (
SELECT
t.[name] AS TableName
,s.[name] AS SchemaName
,t.[Object_ID] AS TableID
FROM
DWH_Staging.sys.tables t
JOIN DWH_Staging.sys.schemas s
ON s.[schema_id] = t.[schema_id]
WHERE
t.[schema_id] = 6 -- Only for dim. now
AND t.[temporal_type] = 2 -- Only non-temporal tables
)
,DimColumns AS (
SELECT
t.SchemaName AS SchemaName
,t.TableName AS TableName
,c.[name] AS ColumnName
FROM
DWH_Staging.sys.columns c
JOIN DimTables t ON t.TableID = c.[object_id]
WHERE c.[name] NOT IN ('BeginDate','EndDate','Created','Modified','CountryID','ActiveID','SourceID','ValidFromID','ValidToID','Domain')
)
-- Load to the temp table for evaluating
SELECT
c.SchemaName AS SchemaName
,c.TableName AS TableName
,c.ColumnName AS ColumnName
INTO #Columns
FROM DimColumns c
ORDER BY c.TableName
Then simple while and.... SET @sql = 'SELECT DISTINCT ' + @columnName + ' INTO #DistinctCounts FROM [DWH_Staging].[' + @schemaName + '].[' + @tableName + ']'
EXEC (@sql)
But it takes 20second.....works but slow 🙂
September 22, 2017 at 10:39 am
20 Seconds to count ALL the table's columns number of distinct values ? You might not have all that much data, or just not that many overall columns. I would have been happy if it ran in half an hour. That could be a LOT of work for the server to do. That said, however, I'm pretty sure it's not going to go any faster.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 22, 2017 at 10:43 am
povamartin - Friday, September 22, 2017 8:54 AMThen simple while and....SET @sql = 'SELECT DISTINCT ' + @columnName + ' INTO #DistinctCounts FROM [DWH_Staging].[' + @schemaName + '].[' + @tableName + ']'
EXEC (@sql)But it takes 20second.....works but slow 🙂
Just realized that code will run only for the first column and then every other column it will fail because the temp table already exists so SELECT INTO doesn't work. You could create the temp table as the first part of the @sql variable and then add the remaining statements, possibly including UNION ALL, as a single insert.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 22, 2017 at 11:14 am
How about we try the following:WITH DimTables AS (
SELECT t.[name] AS TableName
, s.[name] AS SchemaName
, t.[Object_ID] AS TableID
FROM DWH_Staging.sys.tables AS t
INNER JOIN DWH_Staging.sys.schemas AS s
ON s.[schema_id] = t.[schema_id]
WHERE t.[schema_id] = 6 -- Only for dim. now
AND t.[temporal_type] = 2 -- Only non-temporal tables
),
DimColumns AS (
SELECT t.SchemaName AS SchemaName
, t.TableName AS TableName
, c.[name] AS ColumnName
FROM DWH_Staging.sys.columns AS c
INNER JOIN DimTables AS t
ON t.TableID = c.[object_id]
WHERE c.[name] NOT IN ('BeginDate','EndDate','Created','Modified','CountryID','ActiveID','SourceID','ValidFromID','ValidToID','Domain')
)
-- Load to the temp table for evaluating
SELECT c.SchemaName AS SchemaName
, c.TableName AS TableName
, c.ColumnName AS ColumnName
INTO #Columns
FROM DimColumns AS c
ORDER BY c.TableName;
DECLARE @SQL AS varchar(max) =
'CREATE TABLE ##Results (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, TableName varchar(128), ColumnName varchar(128), UniqueValues bigint);
INSERT INTO ##Results (TableName, ColumnName, UniqueValues)
';
SELECT @SQL = @SQL + 'SELECT ''' + C.TableName + ''' AS TableName, ''' + C.ColumnName + ''' COUNT(DISTINCT ' + C.ColumnName + ') AS UniqueValues
FROM ' + C.SchemaName + '.' + C.TableName +
CASE WHEN LEAD(C.TableName, 1, NULL) OVER (ORDER BY C.TableName, C.ColumnName) IS NOT NULL THEN ' UNION ALL ' ELSE '; SELECT * FROM ##Results;' END
FROM #Columns AS C
ORDER BY C.TableName, C.ColumnName;
--PRINT @SQL;
EXEC (@SQL);
Let me know what results that produces.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 25, 2017 at 6:06 am
sgmunson - Friday, September 22, 2017 10:43 AMpovamartin - Friday, September 22, 2017 8:54 AMThen simple while and....SET @sql = 'SELECT DISTINCT ' + @columnName + ' INTO #DistinctCounts FROM [DWH_Staging].[' + @schemaName + '].[' + @tableName + ']'
EXEC (@sql)But it takes 20second.....works but slow 🙂
Just realized that code will run only for the first column and then every other column it will fail because the temp table already exists so SELECT INTO doesn't work. You could create the temp table as the first part of the @sql variable and then add the remaining statements, possibly including UNION ALL, as a single insert.
I truncate the temp table at the start of cycle and that UPDATE number of unique values for specific Table and Column in other temp table, so the result work into #DistinctCount table always. We have a few tables now, cca 30 fact tables and 40 dimensions, about milion of rows of data in the fact tables, the whole DB is now about 70GB. But it will grow up lately by adding new countries (now is this DWH made only for Czech Republic).
September 25, 2017 at 6:21 am
Complete procedure below.
But I will try to concat the SQL statement a run it in one EXEC, we will see.
ALTER PROCEDURE [ctr].[ColumnVariabilityCheck]
@withOutput INT = 1
,@onlyErrors INT = 1
AS
BEGIN
--------------------------------------------------------------------------
-- Temp tables cleaning
--------------------------------------------------------------------------
IF OBJECT_ID ('tempdb..#Columns') IS NOT NULL
DROP TABLE #Columns
--------------------------------------------------------------------------
-- Variables
--------------------------------------------------------------------------
DECLARE @tableName VARCHAR(50)
DECLARE @columnName VARCHAR(50)
DECLARE @schemaName VARCHAR(3)
DECLARE @sql VARCHAR(1000)
DECLARE @MinimalVariabilityLimit INT = 2 -- (-1) for Unknown and 1 for other value
--------------------------------------------------------------------------
-- Load all columns and tables in specific schema
--------------------------------------------------------------------------
;
WITH
DimTables AS (
SELECT
t.[name] AS TableName
,s.[name] AS SchemaName
,t.[Object_ID] AS TableID
FROM
DWH_Staging.sys.tables t
JOIN DWH_Staging.sys.schemas s
ON s.[schema_id] = t.[schema_id]
WHERE
t.[schema_id] = 6 -- Only for dim. now
AND t.[temporal_type] = 2 -- Only non-temporal tables
)
,DimColumns AS (
SELECT
t.SchemaName AS SchemaName
,t.TableName AS TableName
,c.[name] AS ColumnName
FROM
DWH_Staging.sys.columns c
JOIN DimTables t
ON t.TableID = c.[object_id]
WHERE c.[name] NOT IN (
'BeginDate','EndDate'
,'Created','Modified'
,'CountryID','ActiveID','SourceID'
,'ValidFromID','ValidToID'
,'Domain'
)
)
-- Load to the temp table for evaluating
SELECT
c.SchemaName AS SchemaName
,c.TableName AS TableName
,c.ColumnName AS ColumnName
,NULL AS Executed -- Filled when EXEC is made
,NULL AS UniqueRows -- COUNT(DISTINCT)
INTO #Columns
FROM DimColumns c
ORDER BY c.TableName
--------------------------------------------------------------------------
-- While cycle to check all columns
--------------------------------------------------------------------------
-- Duration 20 second, not bad but needs to improve that
WHILE (SELECT COUNT(*) FROM #Columns WHERE Executed IS NULL) > 0
BEGIN
-- Clear previous results
IF OBJECT_ID ('tempdb..#DistinctCounts') IS NOT NULL
DROP TABLE #DistinctCounts
-- Load column
SELECT TOP 1
@tableName = TableName
,@columnName = ColumnName
,@schemaName = SchemaName
FROM #Columns
WHERE Executed IS NULL
-- SQL statement prepare and execution
-- It is inserted to temp table because for no results on screen during all control scripts
SET @sql = 'SELECT DISTINCT ' + @columnName + ' INTO #DistinctCounts FROM [DWH_Staging].[' + @schemaName + '].[' + @tableName + ']'
EXEC (@sql)
-- Actualize executions
UPDATE #Columns
SET
Executed = 1
,UniqueRows = @@ROWCOUNT
WHERE
TableName = @tableName
AND ColumnName = @columnName
AND SchemaName = @schemaName
END
--------------------------------------------------------------------------
-- Output results
--------------------------------------------------------------------------
IF (@withOutput = 1)
BEGIN
SELECT
c.SchemaName
,c.TableName
,c.ColumnName
,'SELECT ' + c.ColumnName + ' FROM [DWH_Staging].[' + c.SchemaName + '].[' + c.TableName + ']' AS SelectStatement
,c.UniqueRows
FROM #Columns c
WHERE
((@onlyErrors = 1) AND (c.UniqueRows < @MinimalVariabilityLimit))
OR (@onlyErrors = 0)
END
RETURN
END
September 25, 2017 at 6:52 am
Did you even try my posted code? Given what's being done, I'm not sure the presence of the loop vs. my set-based code will have a difference in execution time or not, but it seems unnecessary to invoke a WHILE loop.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 25, 2017 at 7:20 am
sgmunson - Monday, September 25, 2017 6:52 AMDid you even try my posted code? Given what's being done, I'm not sure the presence of the loop vs. my set-based code will have a difference in execution time or not, but it seems unnecessary to invoke a WHILE loop.
I need to rewrite a little bit your code, it ends with error
Msg 120, Level 15, State 1, Line 2
The select list for the INSERT statement contains fewer items than the insert list. The number of SELECT values must match the number of INSERT columns.
But I agree with you, the LOOP is very bad solution.
September 25, 2017 at 7:37 am
povamartin - Monday, September 25, 2017 7:20 AMsgmunson - Monday, September 25, 2017 6:52 AMDid you even try my posted code? Given what's being done, I'm not sure the presence of the loop vs. my set-based code will have a difference in execution time or not, but it seems unnecessary to invoke a WHILE loop.I need to rewrite a little bit your code, it ends with error
Msg 120, Level 15, State 1, Line 2
The select list for the INSERT statement contains fewer items than the insert list. The number of SELECT values must match the number of INSERT columns.
But I agree with you, the LOOP is very bad solution.
Yep, I see the error... one stinkin' lousy comma...
Here's the fix:WITH DimTables AS (
SELECT t.[name] AS TableName
, s.[name] AS SchemaName
, t.[Object_ID] AS TableID
FROM DWH_Staging.sys.tables AS t
INNER JOIN DWH_Staging.sys.schemas AS s
ON s.[schema_id] = t.[schema_id]
WHERE t.[schema_id] = 6 -- Only for dim. now
AND t.[temporal_type] = 2 -- Only non-temporal tables
),
DimColumns AS (
SELECT t.SchemaName AS SchemaName
, t.TableName AS TableName
, c.[name] AS ColumnName
FROM DWH_Staging.sys.columns AS c
INNER JOIN DimTables AS t
ON t.TableID = c.[object_id]
WHERE c.[name] NOT IN ('BeginDate','EndDate','Created','Modified','CountryID','ActiveID','SourceID','ValidFromID','ValidToID','Domain')
)
-- Load to the temp table for evaluating
SELECT c.SchemaName AS SchemaName
, c.TableName AS TableName
, c.ColumnName AS ColumnName
INTO #Columns
FROM DimColumns AS c
ORDER BY c.TableName;
DECLARE @SQL AS varchar(max) =
'CREATE TABLE ##Results (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, TableName varchar(128), ColumnName varchar(128), UniqueValues bigint);
INSERT INTO ##Results (TableName, ColumnName, UniqueValues)
';
SELECT @SQL = @SQL + 'SELECT ''' + C.TableName + ''' AS TableName, ''' + C.ColumnName + ''' AS ColumnName, COUNT(DISTINCT ' + C.ColumnName + ') AS UniqueValues
FROM ' + C.SchemaName + '.' + C.TableName +
CASE WHEN LEAD(C.TableName, 1, NULL) OVER (ORDER BY C.TableName, C.ColumnName) IS NOT NULL THEN ' UNION ALL ' ELSE '; SELECT * FROM ##Results;' END
FROM #Columns AS C
ORDER BY C.TableName, C.ColumnName;
--PRINT @SQL;
EXEC (@SQL);
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 25, 2017 at 8:51 am
There was also problem, that some columns were names as USER so I need to do it like this [ + ColumnName ]
And the results are: 23s,17s, 16s, 16s, 20s, 17s
probably your solution is quickier, co I will do it this way. We will see after some time and data increasing. Thank you a lot 🙂
Wish you a good day from Czech Republic.
September 25, 2017 at 9:15 am
povamartin - Monday, September 25, 2017 8:51 AMThere was also problem, that some columns were names as USER so I need to do it like this [ + ColumnName ]
And the results are: 23s,17s, 16s, 16s, 20s, 17s
probably your solution is quickier, co I will do it this way. We will see after some time and data increasing. Thank you a lot 🙂
Wish you a good day from Czech Republic.
Okay, sounds good. Glad I could help.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
Viewing 13 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply