Data Profiling with T-SQL

  • Adam Aspin

    SSCarpal Tunnel

    Points: 4845

    Comments posted to this topic are about the item Data Profiling with T-SQL

  • KJKKPSI

    SSCommitted

    Points: 1514

    Thanks for the excellent article, and the wonderfully useful script. I can already think of a number of instances where something like this could have been huge time savers.

    One thing of note, there appears to be a character that got lost in translation along the way.

    On Line 114: There appears to be a single quote (') that was incorrectly converted to value '#39;'

    WHERE DATA_TYPE IN ('Date', 'DateTime', 'SmallDateTime', #39;DateTime2', 'time')

  • Adam Aspin

    SSCarpal Tunnel

    Points: 4845

    Thanks Ken, I will see if I can get this corrected.

    Glad that you find this script useful.

    Adam

  • Ira Warren Whiteside

    SSCrazy

    Points: 2146

    Adam, excellent article and my favorite topic.

    I have several articles and scripts also on Data Profiling via SSIS(DataFlow script component) and also TSQL.

    [/url]

    [font="Comic Sans MS"]Ira Warren Whiteside[/font]

  • David Stein Jr

    Valued Member

    Points: 53

    Nice script Adam, but I found another typo. About line 257, you'll find the following:

    + QUOTENAME(@COLUMN_NAME) + ')

    ,NULL

    ,NLL

    ,NULL

    NLL should be NULL, right?

    Thanks for the script though. It works great after I cleaned up the 2 typos. 🙂

  • Adam Aspin

    SSCarpal Tunnel

    Points: 4845

    Thanks David!

  • Abu Dina

    SSChampion

    Points: 14155

    This is great stuff! Thanks very much!

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • macthaya

    Valued Member

    Points: 52

    when I execute the attached script for my table and for all columns, I am getting the following error:

    ErrorNumbe ErrorMessage

    207 Invalid column name 'NLL'.

    Here are my values:

    DECLARE @TABLE_SCHEMA NVARCHAR(128) = 'dbo' -- Your schema here

    DECLARE @TABLE_NAME NVARCHAR(128) = 'Customer' -- Your table here

    DECLARE @ColumnListIN NVARCHAR(4000) = '' -- Enter a comma-separated list of specific columns

    -- to profile, or leave blank for all

    Please help me on this error.

  • Stanislav Novoseletskiy

    SSC Journeyman

    Points: 87

    Note to Author. This is a GREAT script but I also got the NLL error. Appears there's typo in line 257 ... just replace NLL with NULL and it works like a charm

  • jason_bougas

    SSC Enthusiast

    Points: 111

    This is a really useful script, thanks.

    I was using it on a 160,000,000 row table and noticed that I got an arithmetic overflow error. Traced this down to the following part

    , AVG(LEN(' + QUOTENAME(@COLUMN_NAME) + '))

    I cast the result of the LEN as bigint before calculating the AVG to overcome this:

    , AVG(CAST(LEN(' + QUOTENAME(@COLUMN_NAME) + ') AS BIGINT))

  • Adam Aspin

    SSCarpal Tunnel

    Points: 4845

    Hi there,

    Thanks for the comment and I am glad that this was useful.

    160 M records - wow!

    All the best,

    Adam

  • kkffjj

    Grasshopper

    Points: 22

    Excellent script... thank you. 

    I have two issues to report...

    1 - DECLARE @ColumnList VARCHAR(4000) = ' ' - this is too short for tables with lots of columns... consider using a varchar(max) - or even a dynamic value like sum of the len() of all columns from a table etc... 
    2 - for columns which are BIT data type... I noticed that when all rows have value of "0", it doesn't get included in the results output... not sure why.

  • kkffjj

    Grasshopper

    Points: 22

    kkffjj - Saturday, April 15, 2017 3:08 PM

    Excellent script... thank you. 

    I have two issues to report...

    1 - DECLARE @ColumnList VARCHAR(4000) = ' ' - this is too short for tables with lots of columns... consider using a varchar(max) - or even a dynamic value like sum of the len() of all columns from a table etc... 
    2 - for columns which are BIT data type... I noticed that when all rows have value of "0", it doesn't get included in the results output... not sure why.

    ...and another:

    3 - you need to have a condition before using ISDATE()... as this function fails on values > 4000 characters in length.

  • johnmcp

    SSCommitted

    Points: 1969

    The foreign key check does not work correctly. 
    On Line 404 there is an update with a join between the #ProfileData table and a CTE with the check for FK. The #ProfileData COLUMN_NAME field contains square brackets on the field names and the CTE does not. 
    I changed the UPDATE to this:
    UPDATE    P
    SET        P.IsFK = 1
    FROM    #ProfileData P
            INNER JOIN FK_CTE CTE
            ON P.COLUMN_NAME = '[' + CTE.FKColumnName + ']'

  • mschluper

    Newbie

    Points: 6

    The ISDATE function does not work for columns with long texts. 

    When I replaced 

            WHERE ISDATE(' + QUOTENAME(@COLUMN_NAME) + ') = 1)

    by

            WHERE TRY_CAST(' + QUOTENAME(@COLUMN_NAME) + ' AS Date) IS NOT NULL)

    it worked fine.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply