Find columns with no data

  • I am in the process of re-developing my application for my customers and I want to loop through all of the columns of specific tables that contain 150 fields or more and find which columns have no data in them. My goal is to get rid of fields that users are not using. I am not well versed on t-sql enough to loop through columns and create I am assuming dynamic sql to see if any data is returned for each column. Any advice or help would be greatly appreciated.

    Owen White

  • You could generate the needed code fairly easily from INFORMATION_SCHEMA.columns.

    You have to decide, though, what specifically you mean by "no data".

    Must the column be NULL?

    If it's a non-nullable column, does '' / 0 mean the column has "no data"?

    Do any columns have "flag" values that mean "no data"? (like 'n/a' or 99 or 1900/01/01 or whatever).

    Scott Pletcher, SQL Server MVP 2008-2010

  • The columns are null values so it is truly "No Data." I do have some columns that have default values that might come into play but the columns I am interested in do not have a default value and are nullable. I have not used the INFORMATION_SCHEMA. I think I used this one time in the past to return the number of rows per table but I found that this was not as accurate as I had wished. When I went to validate the number of rows, the numbers did not match. That is neither here nor there.

    How can I use the information_schema to find what I need?

    Owen White

  • Fill in your table name and uncomment the EXEC() when ready to go.

    DECLARE @tablename sysname

    SET @tablename = N'info' --<<-- put table name here :-)

    DECLARE @sql varchar(max)

    SET @sql = NULL

    SELECT @sql = ISNULL(@sql + ', ' + CHAR(13) + CHAR(10), '') + SPACE(4) +

    'SUM(CASE WHEN [' + CAST(column_name AS varchar(128)) + '] IS NULL THEN 0 ELSE 1 END) ' +

    'AS [' + CAST(column_name AS varchar(128)) + ']'

    FROM INFORMATION_SCHEMA.columns

    WHERE table_name = @tablename

    AND IS_NULLABLE = 'YES'

    AND COLUMNPROPERTY(OBJECT_ID(@tablename), column_name, 'IsComputed') = 0

    ORDER BY ordinal_position

    SET @sql = 'SELECT' + CHAR(13) + CHAR(10) + @sql + CHAR(13) + CHAR(10) +

    'FROM [' + CAST(@tablename AS varchar(128)) + ']'

    PRINT @sql

    --EXEC(@sql)

    Any column that shows a total of "0" has only NULL values in it.

    Scott Pletcher, SQL Server MVP 2008-2010

  • I cannot tell you how much I appreciate it. I tend to learn from examples and this will give me great insight into how to write more powerful queries. Thank you again.

    Owen White

  • You're welcome. Glad it helped!

    Scott Pletcher, SQL Server MVP 2008-2010

  • Hello Scott,

    Do you have a solution for larger tables? I have a table with 207 columns, and very long column names, the average being approx 26 characters. The code you posted truncates the @sql at just over 22k chars even tho it is declared as varchar(max).

  • Hello,

    in the same way as Scott I tried to do it with a cursor for all tables, try this code:

    select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, Null AS HasNotNullValue

    INTO #A

    FROM information_schema.columns

    DECLARE @TABLE_CATALOG AS VARCHAR(100)

    DECLARE @TABLE_SCHEMA AS VARCHAR(100)

    DECLARE @TABLE_NAME AS VARCHAR(100)

    DECLARE @COLUMN_NAME AS VARCHAR(100)

    DECLARE @FullTableName AS VARCHAR(300)

    DECLARE @ColumnName AS VARCHAR(100)

    DECLARE @SqlInst as VARCHAR(1000)

    DECLARE C CURSOR FOR

    SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,

    '[' + TABLE_CATALOG + '].[' + TABLE_SCHEMA + '].[' + TABLE_NAME + ']' AS FullTableName,

    '[' + COLUMN_NAME + ']' AS ColumnName

    FROM #A

    OPEN C

    FETCH C INTO @TABLE_CATALOG, @TABLE_SCHEMA, @TABLE_NAME, @COLUMN_NAME, @FullTableName, @ColumnName

    WHILE @@Fetch_Status = 0

    BEGIN

    SET @SqlInst = 'UPDATE #A SET HasNotNullValue = ('

    SET @SqlInst = @SqlInst + 'SELECT MAX(CASE WHEN ' + @ColumnName + 'IS NULL THEN 0 ELSE 1 END)'

    SET @SqlInst = @SqlInst + ' FROM ' + @FullTableName

    SET @SqlInst = @SqlInst + ') WHERE TABLE_CATALOG = ''' + @TABLE_CATALOG

    SET @SqlInst = @SqlInst + ''' AND TABLE_SCHEMA = ''' + @TABLE_SCHEMA

    SET @SqlInst = @SqlInst + ''' AND TABLE_NAME = ''' + @TABLE_NAME

    SET @SqlInst = @SqlInst + ''' AND COLUMN_NAME = ''' + @COLUMN_NAME + ''''

    execute (@SqlInst)

    FETCH C INTO @TABLE_CATALOG, @TABLE_SCHEMA, @TABLE_NAME, @COLUMN_NAME, @FullTableName, @ColumnName

    END

    CLOSE C

    DEALLOCATE C

    SELECT * FROM #A

    DROP TABLE #A

    Regards,

    Francesc

  • I have a table with 207 columns, and very long column names, the average being approx 26 characters. The code you posted truncates the @sql at just over 22k chars even tho it is declared as varchar(max).

    That might be just a display issue, i.e. the PRINT result is being truncated but the actual query is OK. Check your settings in Management Studio (under "Query", "Query Options", "Results").

    Did you just get a truncated display or did you also get an error when you ran the command, which looked like it was caused by truncating?

    [207 columns...impressive! Perhaps not normalized, but impressive :-)]

    Scott Pletcher, SQL Server MVP 2008-2010

  • Scott, it is definitely cutting off. I inserted a PRINT LEN(@sql) right after the statement that begins with

    "SELECT @sql = ISNULL(@sql " etc.

    and it returns 22492 characters.

    PS this is an inherited design, 200+ columns was not my idea 🙂 but luckily it's the only one like that.

  • Hmm, no reason it should arbitrarily stop, altho that method of concatening values can supposedly be flaky.

    Did you check the Query settings, to make sure it's not just a display issue?

    Does the EXEC(@sql) get an error? Does that error look like it's caused by truncation?

    For now, I'd run the statement that builds the @sql statement twice, doing roughly 1/2 the columns in each.

    For example,

    SELECT @sql = ...

    WHERE ...

    AND ordinal_position <= 100

    Then another statement with.

    SELECT @sql2 = ...

    WHERE ...

    AND ordinal_position > 100

    Then something like this:

    EXEC('SELECT ...' + @sql + @sql2 + '...')

    Scott Pletcher, SQL Server MVP 2008-2010

  • I found this very interesting so I did some more research. . .

    Scott, your code works perfectly fine, the EXEC does work as expected. Apparently the PRINT statement has a size limitation; if I do a SELECT @sql it returns the entire string, all 22492 characters.

    frfernan, your code works fine & covers all tables, but because of the cursor it takes 2 minutes 15 seconds to run on my database. Also, because information_schema includes not only tables but views, and has no column to distinguish between them, you have to inspect the results to find which are actual tables vs. views. Finally, I would add this to display the results, and allow you to run this more than once:

    DEALLOCATE C

    SELECT * FROM #A WHERE HasNotNullValue = 0 ORDER BY 1,2,3,4

    DROP TABLE #A

    Your code does has the advantage that the HasNotNullValue returns NULL if the table / view has no records.

    I went ahead & wrote the following which covers only tables & runs in approx 15 seconds on my database:

    DECLARE

    @ColumnName sysname,

    @NotNulls int,

    @sql nvarchar(1000),

    @TableName sysname,

    @this int,

    @rows int

    IF OBJECT_ID('dbo.my_nulls') IS NOT NULL

    DROP TABLE dbo.my_nulls

    CREATE TABLE dbo.my_nulls

    (

    ID int IDENTITY(1,1) NOT NULL,

    TableName sysname,

    ColumnName sysname,

    Rows_NotNull int

    )

    SET NOCOUNT ON

    INSERT my_nulls ( TableName, ColumnName )

    SELECT t.name, c.name

    FROM sys.tables t INNER JOIN sys.columns c ON t.object_id = c.object_id

    SELECT @rows = COUNT(*) FROM my_nulls

    SELECT @this = 1

    WHILE @this < @rows

    BEGIN

    SELECT @sql = 'SELECT @NotNullsOUT = COUNT(*) FROM [' + TableName + '] WHERE [' + ColumnName + '] IS NOT NULL'

    FROM my_nulls

    WHERE ID = @this

    EXEC sp_executesql @sql, N'@NotNullsOUT int OUTPUT', @NotNullsOUT = @NotNulls OUTPUT

    UPDATE my_nulls SET Rows_NotNull = @NotNulls WHERE ID = @this

    SET @this = @this + 1

    END

    SELECT

    TableName,

    ColumnName

    FROM

    my_nulls

    WHERE

    Rows_NotNull = 0

    ORDER BY

    TableName,

    ColumnName

  • WILLIAM MITCHELL (8/4/2010)


    Also, because information_schema includes not only tables but views, and has no column to distinguish between them, you have to inspect the results to find which are actual tables vs. views.

    Will, INFORMATION_SCEHMA.TABLES will have information whether an object is a table or view. You can do something like

    SELECT COLS.*

    FROM information_schema.columns COLS

    JOIN information_schema.tables TABS

    ON COLS.table_name = TABS.table_name

    WHERE

    TABS.table_type = 'BASE TABLE'

    ORDER BY COLS.table_name , COLS.ordinal_position

    From that you can retrieve the column names pertaining only to Tables.

    Hope that helps!

  • Sorry, yes, I forgot to mention the table type.

    Scanning the table once for every column is huge extra overhead, of course. To do it that quickly, you have a fairly small table and / or a very powerful box.

    That's one reason I wrote the code I did. All the examples I've seen posted online were one column at a time.

    Scott Pletcher, SQL Server MVP 2008-2010

  • Hi all,

    yes, scanning trough all views is an excess of work, and it could be expensive in the case of views with some hard calculation. And, as Scott points, run a query for each column is overloading when the Scott's suggestion runs a query for each table. Anyway, my main disatisfaction is running a MAX or COUNT sentence when an EXISTS sentence could be good enough, and it will do the job faster.

    So I corrected my first script in two ways:

    - Adding the filter for tables that coldCoffee suggested.

    - Changing the MAX sentence for a EXISTS sentence.

    After these changes, my tests passed from 20 min (and cancelling the process) to 19 sec. Time shouldn't be a critical question because this script shouldn't run continuously, but better if faster. The resulting script is:

    SELECT C.TABLE_CATALOG, C.TABLE_SCHEMA, C.TABLE_NAME, C.COLUMN_NAME, Null AS HasNotNullValue

    INTO #A

    FROM information_schema.columns C

    INNER JOIN information_schema.tables T

    ON C.TABLE_CATALOG = T.TABLE_CATALOG AND C.TABLE_SCHEMA = T.TABLE_SCHEMA AND C.TABLE_NAME = T.TABLE_NAME

    WHERE T.TABLE_TYPE = 'BASE TABLE' AND IS_NULLABLE = 'YES'

    DECLARE @TABLE_CATALOG AS VARCHAR(100)

    DECLARE @TABLE_SCHEMA AS VARCHAR(100)

    DECLARE @TABLE_NAME AS VARCHAR(100)

    DECLARE @COLUMN_NAME AS VARCHAR(100)

    DECLARE @FullTableName AS VARCHAR(300)

    DECLARE @ColumnName AS VARCHAR(100)

    DECLARE @SqlInst as VARCHAR(1000)

    DECLARE C CURSOR FOR

    SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,

    '[' + TABLE_CATALOG + '].[' + TABLE_SCHEMA + '].[' + TABLE_NAME + ']' AS FullTableName,

    '[' + COLUMN_NAME + ']' AS ColumnName

    FROM #A

    OPEN C

    FETCH C INTO @TABLE_CATALOG, @TABLE_SCHEMA, @TABLE_NAME, @COLUMN_NAME, @FullTableName, @ColumnName

    WHILE @@Fetch_Status = 0

    BEGIN

    SET @SqlInst = 'UPDATE #A SET HasNotNullValue = CASE'

    SET @SqlInst = @SqlInst + ' WHEN EXISTS (SELECT ' + @ColumnName

    SET @SqlInst = @SqlInst + ' FROM ' + @FullTableName

    SET @SqlInst = @SqlInst + ' WHERE ' + @ColumnName + ' IS NOT NULL) THEN 1 ELSE 0 END'

    SET @SqlInst = @SqlInst + ' WHERE TABLE_CATALOG = ''' + @TABLE_CATALOG

    SET @SqlInst = @SqlInst + ''' AND TABLE_SCHEMA = ''' + @TABLE_SCHEMA

    SET @SqlInst = @SqlInst + ''' AND TABLE_NAME = ''' + @TABLE_NAME

    SET @SqlInst = @SqlInst + ''' AND COLUMN_NAME = ''' + @COLUMN_NAME + ''''

    execute (@SqlInst)

    FETCH C INTO @TABLE_CATALOG, @TABLE_SCHEMA, @TABLE_NAME, @COLUMN_NAME, @FullTableName, @ColumnName

    END

    CLOSE C

    DEALLOCATE C

    SELECT * FROM #A ORDER BY 1, 2, 3, 4

    DROP TABLE #A

    I know some people hate cursors, sorry, I will burn forever in the hell 🙂

    Regards,

    Francesc

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply