Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Table-Column Data Profiler


Table-Column Data Profiler

Author
Message
Todd M. Owens
Todd M. Owens
Valued Member
Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)

Group: General Forum Members
Points: 62 Visits: 471
Comments posted to this topic are about the item Table-Column Data Profiler
Joe Kelly-376311
Joe Kelly-376311
SSC-Enthusiastic
SSC-Enthusiastic (131 reputation)SSC-Enthusiastic (131 reputation)SSC-Enthusiastic (131 reputation)SSC-Enthusiastic (131 reputation)SSC-Enthusiastic (131 reputation)SSC-Enthusiastic (131 reputation)SSC-Enthusiastic (131 reputation)SSC-Enthusiastic (131 reputation)

Group: General Forum Members
Points: 131 Visits: 84
Many thanks for this really useful piece of code. Have been doing this with SSIS Profile task, but this is much more handy. JK


Tks,

JK

Todd M. Owens
Todd M. Owens
Valued Member
Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)

Group: General Forum Members
Points: 62 Visits: 471
Glad someone found it helpful. Here is an update that includes (Distinct) Value Counts:

DECLARE @Cat_Sch_TableName varchar(128) = 'HRIS.NTC.D20130201'
-- Initialize fully qualified object name; script can be used like a stored procedure
DECLARE @sqlstring varchar(8000), @ColumnName varchar(128), @ColumnID smallint,
@SequenceNumber smallint

-- cleanup temp tables
IF OBJECT_ID('Tempdb..#RawColumns') IS NOT NULL DROP TABLE #RawColumns
IF OBJECT_ID('Tempdb..#ProfiledColumns') IS NOT NULL DROP TABLE #ProfiledColumns

-- Populate working list of columns
SELECT ColumnID = IDENTITY(int, 1,1),
COLUMN_NAME AS ColumnName,
ORDINAL_POSITION AS SequenceNumber
INTO #RawColumns
FROM information_schema.COLUMNS
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName

-- Initialize output table
CREATE TABLE #ProfiledColumns(
ColumnName varchar(128) NOT NULL,
SequenceNumber varchar(10),
NullCount int ,
ModalValue varchar(255),
ModalCount int,
ValueCount int,
MinValue varchar(255),
MaxValue varchar(255),
MaxLength smallint )

-- Loop through columns
WHILE (EXISTS(
SELECT *
FROM #RawColumns
WHERE ColumnName IS NOT NULL ))

BEGIN

SELECT @ColumnID = ColumnID,
@ColumnName = ColumnName,
@SequenceNumber = SequenceNumber
FROM #RawColumns -- assign column data to variables

SET @sqlstring=
'INSERT INTO #ProfiledColumns ' +
'SELECT '''+@ColumnName+''',' +
CAST(@SequenceNumber AS varchar(10)) + ' , ' +
'SUM(CASE WHEN ['+@ColumnName+'] IS NULL THEN 1 ELSE 0 END) ,'+
'(SELECT TOP 1 ['+@ColumnName+'] FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'(SELECT TOP 1 COUNT(*) FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'COUNT(DISTINCT ['+@ColumnName+']),' +
'CAST(MIN(['+@ColumnName+']) AS VARCHAR(255)),'+
'CAST(MAX(['+@ColumnName+']) AS VARCHAR(255)),'+
'MAX(LEN(CAST(['+@ColumnName+'] AS VARCHAR(255)))) '+
'FROM ' + @Cat_Sch_TableName -- build query string

DELETE FROM #RawColumns WHERE ColumnID=@ColumnID -- one column down, next...
EXECUTE (@sqlstring) -- execute built query
END -- End loop block

-- Display data profile with condensed data type
SELECT
ColumnName,
DATA_TYPE +
CASE WHEN DATA_TYPE LIKE '%char' THEN '('+LTRIM(CHARACTER_MAXIMUM_LENGTH)+')'
WHEN COALESCE(NUMERIC_SCALE,0) <> 0 THEN '('+LTRIM(NUMERIC_PRECISION)+','+LTRIM(NUMERIC_SCALE)+')' ELSE ''
END AS DataTypeName,
NullCount,
ModalValue,
ModalCount,
ValueCount,
MinValue,
MaxValue,
MaxLength
FROM #ProfiledColumns
JOIN information_schema.COLUMNS
ON SequenceNumber=ORDINAL_POSITION
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName
ORDER BY CAST(SequenceNumber AS SMALLINT)


Michael L John
Michael L John
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2258 Visits: 7573
Handy, but it throws an error on a bit field because you cannot perform a min or max on them

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
Todd M. Owens
Todd M. Owens
Valued Member
Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)

Group: General Forum Members
Points: 62 Visits: 471
Thanks forgot about bit. This should work on bit (other data types?) as well...

DECLARE @Cat_Sch_TableName varchar(128) = 'HRIS.dbo.E20130201'
-- Initialize fully qualified object name; script can be used like a stored procedure
DECLARE @sqlstring varchar(8000), @ColumnName varchar(128), @ColumnID smallint,
@SequenceNumber smallint

-- cleanup temp tables
IF OBJECT_ID('Tempdb..#RawColumns') IS NOT NULL DROP TABLE #RawColumns
IF OBJECT_ID('Tempdb..#ProfiledColumns') IS NOT NULL DROP TABLE #ProfiledColumns

-- Populate working list of columns
SELECT ColumnID = IDENTITY(int, 1,1),
COLUMN_NAME AS ColumnName,
ORDINAL_POSITION AS SequenceNumber
INTO #RawColumns
FROM information_schema.COLUMNS
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName

-- Initialize output table
CREATE TABLE #ProfiledColumns(
ColumnName varchar(128) NOT NULL,
SequenceNumber varchar(10),
NullCount int ,
ModalValue varchar(255),
ModalCount int,
ValueCount int,
MinValue varchar(255),
MaxValue varchar(255),
MaxLength smallint )

-- Loop through columns
WHILE (EXISTS(
SELECT *
FROM #RawColumns
WHERE ColumnName IS NOT NULL ))

BEGIN

SELECT @ColumnID = ColumnID,
@ColumnName = ColumnName,
@SequenceNumber = SequenceNumber
FROM #RawColumns -- assign column data to variables

SET @sqlstring=
'INSERT INTO #ProfiledColumns ' +
'SELECT '''+@ColumnName+''',' +
CAST(@SequenceNumber AS varchar(10)) + ' , ' +
'SUM(CASE WHEN ['+@ColumnName+'] IS NULL THEN 1 ELSE 0 END) ,'+
'(SELECT TOP 1 ['+@ColumnName+'] FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'(SELECT TOP 1 COUNT(*) FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'COUNT(DISTINCT ['+@ColumnName+']),' +
'CAST(MIN(CAST(['+@ColumnName+'] AS sql_variant)) AS VARCHAR(8000)),'+
'CAST(MAX(CAST(['+@ColumnName+'] AS sql_variant)) AS VARCHAR(8000)),'+
'MAX(LEN(CAST(['+@ColumnName+'] AS VARCHAR(255)))) '+
'FROM ' + @Cat_Sch_TableName -- build query string

DELETE FROM #RawColumns WHERE ColumnID=@ColumnID -- one column down, next...
EXECUTE (@sqlstring) -- execute built query
END -- End loop block

-- Display data profile with condensed data type
SELECT
ColumnName,
DATA_TYPE +
CASE WHEN DATA_TYPE LIKE '%char' THEN '('+LTRIM(CHARACTER_MAXIMUM_LENGTH)+')'
WHEN COALESCE(NUMERIC_SCALE,0) <> 0 THEN '('+LTRIM(NUMERIC_PRECISION)+','+LTRIM(NUMERIC_SCALE)+')' ELSE ''
END AS DataTypeName,
NullCount,
ModalValue,
ModalCount,
ValueCount,
MinValue,
MaxValue,
MaxLength
FROM #ProfiledColumns
JOIN information_schema.COLUMNS
ON SequenceNumber=ORDINAL_POSITION
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName
ORDER BY CAST(SequenceNumber AS SMALLINT)


SQL Fool
SQL Fool
Grasshopper
Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)

Group: General Forum Members
Points: 14 Visits: 407
Handy script, Thanks!! Many for the DB's I run into have nvarchar(max) columns. Any feel for how to address these?

Steve Pirazzi
ONEWARE, Inc.
http://www.ONEWARE.com
SteveBell
SteveBell
SSC Veteran
SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)

Group: General Forum Members
Points: 224 Visits: 1537
Hi Todd,

I just tried your script and I like it!

A couple of questions:
1. Can it be modified to loop through all tables in a database automatically and show the results?
2. For the datatypes that it can't handle (uniqueidentifier, image, text) it generates an exception and stops processing. Is it possible that it could simply skip the columns with these datatypes and produce an output of the remaining columns?

I ask because I'm not a strong coder and not sure how to get at this.

Thanks again for an excellent script Todd

Steve
Todd M. Owens
Todd M. Owens
Valued Member
Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)

Group: General Forum Members
Points: 62 Visits: 471
Thanks Steve

I am not guaranteeing anything because I never allow use of varchar(max) and have no way to test it, BUT
for Varchar(max) types to work, change this statement:
CREATE TABLE #ProfiledColumns(
ColumnName varchar(128) NOT NULL,
SequenceNumber varchar(10),
NullCount int ,
ModalValue varchar(max),
ModalCount int,
ValueCount int,
MinValue varchar(max),
MaxValue varchar(max),

MaxLength smallint



And this statement:
SET @sqlstring=
'INSERT INTO #ProfiledColumns ' +
'SELECT '''+@ColumnName+''',' +
CAST(@SequenceNumber AS varchar(10)) + ' , ' +
'SUM(CASE WHEN ['+@ColumnName+'] IS NULL THEN 1 ELSE 0 END) ,'+
'(SELECT TOP 1 ['+@ColumnName+'] FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'(SELECT TOP 1 COUNT(*) FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'COUNT(DISTINCT ['+@ColumnName+']),' +
'CAST(MIN(CAST(['+@ColumnName+'] AS sql_variant)) AS VARCHAR(max)),'+
'CAST(MAX(CAST(['+@ColumnName+'] AS sql_variant)) AS VARCHAR(max)),'+
'MAX(LEN(CAST(['+@ColumnName+'] AS VARCHAR(255)))) '+
'FROM ' + @Cat_Sch_TableName -- build query string


Todd M. Owens
Todd M. Owens
Valued Member
Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)Valued Member (62 reputation)

Group: General Forum Members
Points: 62 Visits: 471
SteveBell (2/20/2013)
Hi Todd,

I just tried your script and I like it!
Thanks!

SteveBell (2/20/2013)
A couple of questions:
Uhoh..

SteveBell (2/20/2013)
1. Can it be modified to loop through all tables in a database automatically and show the results?
Yes but not by me yet; you will need to set up an outer loop using a Cursor or a WHILE statement or somesuch. Maybe later...

SteveBell (2/20/2013)
2. For the datatypes that it can't handle (uniqueidentifier, image, text) it generates an exception and stops processing. Is it possible that it could simply skip the columns with these datatypes and produce an output of the remaining columns?

Try adding this bolded line:
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName 
AND DATA_TYPE NOT IN ('uniqueidentifier','image','text') -- excluded types


SteveBell
SteveBell
SSC Veteran
SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)SSC Veteran (224 reputation)

Group: General Forum Members
Points: 224 Visits: 1537
Thanks again Todd,

This is very helpful and much appreciated.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search