Spring Clean Your Database Schema

  • Comments posted to this topic are about the item Spring Clean Your Database Schema

  • Since this only works in SQL 2005 (due to the "OVER" clause on the "COUNT(*)" statement), you could replace the temporary table with a CTE.

    Alternatively, for SQL 2000, you could use a nested query, and order by "COUNT(COLUMN_NAME) DESC" instead.

  • Half the problem here is that SQL isn't very good at relational 'domains' i.e. a standard definition of a 'type' of column, with predefinied rules (constraints), datatype, valid values etc.

    Also missing is the ability to abstract types i.e. when defining parameters in t/sql you can't define them based on a domain, or even an existing column (in ORACLE this is done with the '%' operator e.g. title book.title%TYPE )

    Finally, the lack of a 'record' datatype also makes life difficult, as you can't easily define a full set of local variables that map the types of a row in a table, resulting ineven MORE duplicate definitions e.g:

    --define a PL/SQL table containing entries of type book_rec:

    Type book_rec_tab IS TABLE OF book_rec%TYPE

    INDEX BY BINARY_INTEGER;

    Personally it's about time this stuff arrived! It would do more for consistency in typing than any amount of tools.

    However, this is a useful stopgap tool - many thanks to the author!

  • To make this work in SQL 2000

    Change the first Query to:

    [Code]

    -- Calculate prevalence of column name

    SELECT COLUMN_NAME, CONVERT(DECIMAL(12,2), Count(*)* 100.0/t.total) as [%]

    INTO #Prevalence

    FROM INFORMATION_SCHEMA.COLUMNS

    CROSS JOIN

    (SELECT Count(*) as total FROM INFORMATION_SCHEMA.COLUMNS) t

    GROUP BY COLUMN_NAME, t.total

    -- Do the columns differ on datatype across the schemas and tables?

    [/code]

    --

    JimFive

  • Ha! Just last Friday, I discovered I could not define a foreign key because the datatype was different between two tables. The first thing on my To-Do list for today was to check all the tables to see what other tables might be using the incorrect datatype.

    Thanks for writing the query I need for me!

  • I decided I wanted the Prevalence to be the percentage of all tables not of all columns, so I used James Goodwin's code and modified it to get the total table count.

    SELECT COLUMN_NAME, CONVERT(DECIMAL(12,2), Count(*)* 100.0/t.totalTables) as [%]

    INTO #Prevalence

    FROM INFORMATION_SCHEMA.COLUMNS

    CROSS JOIN

    (SELECT Count(*) as totalTables FROM INFORMATION_SCHEMA.TABLES ) t

    GROUP BY COLUMN_NAME, t.totalTables

    Select * from #Prevalence gives me useful information, all by itself.

  • In SQL Server 2008 in the first statement (SET TRANSACTION)

    ISOLATIONLEVEL should be

    ISOLATION LEVEL

    Herman

  • Ian,

    I only wanted to see the column names that appear more than once, so that I exclude all the columns that are ok.

    So I re-wrote your query to:

    -- From http://www.sqlservercentral.com/articles/Admin/65138/

    /*----------------------------------------------------------------------

    Purpose: Identify columns having different datatypes, for the same column name.

    Sorted by the prevalence of the mismatched column.

    ------------------------------------------------------------------------

    Revision History:

    06/01/2008 Ian_Stirk@yahoo.com Initial version.

    2009-01-28 hsp@stovi.com; only show the column names that appear more than once

    -----------------------------------------------------------------------*/

    -- Do not lock anything, and do not get held up by any locks.

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    BEGIN TRY

    DROP TABLE #Prevalence

    END TRY

    BEGIN CATCH

    END CATCH

    BEGIN TRY

    DROP TABLE #FieldList

    END TRY

    BEGIN CATCH

    END CATCH

    -- Calculate prevalence of column name

    SELECT

    COLUMN_NAME

    ,[%] = CONVERT(DECIMAL(12,2),COUNT(COLUMN_NAME)* 100.0 / COUNT(*)OVER())

    INTO #Prevalence

    FROM INFORMATION_SCHEMA.COLUMNS

    GROUP BY COLUMN_NAME

    -- Do the columns differ on datatype across the schemas and tables?

    SELECT DISTINCT

    C1.COLUMN_NAME

    , C1.TABLE_SCHEMA + '.' + C1.TABLE_NAME AS TableName

    , C1.DATA_TYPE

    , C1.CHARACTER_MAXIMUM_LENGTH

    , C1.NUMERIC_PRECISION

    , C1.NUMERIC_SCALE

    , [%]

    INTO #FieldList

    FROM INFORMATION_SCHEMA.COLUMNS C1

    INNER JOIN INFORMATION_SCHEMA.COLUMNS C2 ON C1.COLUMN_NAME = C2.COLUMN_NAME

    INNER JOIN #Prevalence p ON p.COLUMN_NAME = C1.COLUMN_NAME

    WHERE ((C1.DATA_TYPE != C2.DATA_TYPE)

    OR (C1.CHARACTER_MAXIMUM_LENGTH != C2.CHARACTER_MAXIMUM_LENGTH)

    OR (C1.NUMERIC_PRECISION != C2.NUMERIC_PRECISION)

    OR (C1.NUMERIC_SCALE != C2.NUMERIC_SCALE))

    ORDER BY [%] DESC, C1.COLUMN_NAME, C1.TABLE_SCHEMA + '.' + C1.TABLE_NAME

    -- Tidy up.

    DROP TABLE #Prevalence

    --SELECT * FROM #fieldList ORDER BY column_name

    ;

    WITH cte (column_name, Data_type, Character_maximum_length, Numeric_precision, Numeric_Scale, UsedInTables, rownum) as

    (

    SELECT column_name

    , MAX(data_type) AS Data_type, MAX(Character_maximum_length) AS Character_maximum_length, MAX(Numeric_precision) AS Numeric_precision

    , MAX(Numeric_Scale) AS Numeric_Scale

    , (SELECT TableName + ', ' FROM #fieldList F1

    WHERE f1.Column_name= F.Column_name

    AND (f1.Data_type = F.Data_Type OR F.Data_Type IS NULL)

    AND (f1.Character_maximum_length = f.Character_maximum_length OR f.Character_maximum_length IS NULL)

    AND (f1.Numeric_precision = f.Numeric_precision OR f.Numeric_precision IS NULL)

    AND (f1.Numeric_Scale = f.Numeric_Scale OR f.Numeric_Scale IS NULL)

    for xml path('')) AS UsedInTables

    , ROW_NUMBER() OVER ( PARTITION BY column_name ORDER BY data_type, Character_maximum_length) AS rownum

    FROM #fieldList F

    WHERE

    f.tablename NOT LIKE 'tmp%'

    GROUP BY column_name, Data_type, Character_maximum_length, Numeric_precision, Numeric_Scale

    )

    SELECT c1.* FROM cte c1

    INNER JOIN cte c2 ON c1.column_name = c2.Column_name

    WHERE c2.rownum=2

    Thank you very much for your initial script.

    Best regards,

    Henrik Staun Poulsen

    Stovi Software

  • A bit late but I made the following updates which were useful to me -- flag table versus view and identify if a field is an identity field. Maybe someone else will as well...

    /*----------------------------------------------------------------------

    Purpose: Identify columns having different datatypes, for the same column name.

    Sorted by the prevalence of the mismatched column.

    ------------------------------------------------------------------------

    Revision History:

    06/01/2008 Ian_Stirk@yahoo.com Initial version.

    03/11/2014Martin McDonald -- Modified to add is_table and is_identity information

    -----------------------------------------------------------------------*/

    -- Do not lock anything, and do not get held up by any locks.

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    -- Calculate prevalence of column name

    SELECT

    COLUMN_NAME

    ,[%] = CONVERT(DECIMAL(12,2),COUNT(COLUMN_NAME)* 100.0 / COUNT(*)OVER())

    INTO #Prevalence

    FROM INFORMATION_SCHEMA.COLUMNS

    GROUP BY COLUMN_NAME

    -- Do the columns differ on datatype across the schemas and tables?

    SELECT DISTINCT

    C1.COLUMN_NAME

    , C1.TABLE_SCHEMA

    , C1.TABLE_NAME

    , case

    when t.name is null then 'View'

    else 'Table'

    end as is_table

    , c.is_identity

    , C1.DATA_TYPE

    , C1.CHARACTER_MAXIMUM_LENGTH

    , C1.NUMERIC_PRECISION

    , C1.NUMERIC_SCALE

    , [%]

    FROM INFORMATION_SCHEMA.COLUMNS C1

    INNER JOIN INFORMATION_SCHEMA.COLUMNS C2 ON C1.COLUMN_NAME = C2.COLUMN_NAME

    INNER JOIN #Prevalence p ON p.COLUMN_NAME = C1.COLUMN_NAME

    left outer join sys.tables t on c1.TABLE_NAME = t.name

    left outer join sys.columns c on t.object_id = c.object_id

    and c1.COLUMN_NAME = c.name

    WHERE ((C1.DATA_TYPE != C2.DATA_TYPE)

    OR (C1.CHARACTER_MAXIMUM_LENGTH != C2.CHARACTER_MAXIMUM_LENGTH)

    OR (C1.NUMERIC_PRECISION != C2.NUMERIC_PRECISION)

    OR (C1.NUMERIC_SCALE != C2.NUMERIC_SCALE))

    ORDER BY [%] DESC, C1.COLUMN_NAME, C1.TABLE_SCHEMA, C1.TABLE_NAME

    -- Tidy up.

    DROP TABLE #Prevalence

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply