Spring Clean Your Database Schema

Question

Spring Clean Your Database Schema

ianstirk

Ten Centuries

Points: 1310
More actions
January 17, 2009 at 12:33 pm

#196291

Comments posted to this topic are about the item Spring Clean Your Database Schema

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

richardd Hall of Fame Points: 3963 More actions · Answer 1

Since this only works in SQL 2005 (due to the "OVER" clause on the "COUNT(*)" statement), you could replace the temporary table with a CTE.

Alternatively, for SQL 2000, you could use a nested query, and order by "COUNT(COLUMN_NAME) DESC" instead.

Phil Morris-454316 SSC Veteran Points: 252 More actions · Answer 2

Half the problem here is that SQL isn't very good at relational 'domains' i.e. a standard definition of a 'type' of column, with predefinied rules (constraints), datatype, valid values etc.

Also missing is the ability to abstract types i.e. when defining parameters in t/sql you can't define them based on a domain, or even an existing column (in ORACLE this is done with the '%' operator e.g. title book.title%TYPE )

Finally, the lack of a 'record' datatype also makes life difficult, as you can't easily define a full set of local variables that map the types of a row in a table, resulting ineven MORE duplicate definitions e.g:

--define a PL/SQL table containing entries of type book_rec:

Type book_rec_tab IS TABLE OF book_rec%TYPE

INDEX BY BINARY_INTEGER;

Personally it's about time this stuff arrived! It would do more for consistency in typing than any amount of tools.

However, this is a useful stopgap tool - many thanks to the author!

James Goodwin Hall of Fame Points: 3691 More actions · Answer 3

To make this work in SQL 2000

Change the first Query to:

[Code]

-- Calculate prevalence of column name

SELECT COLUMN_NAME, CONVERT(DECIMAL(12,2), Count(*)* 100.0/t.total) as [%]

INTO #Prevalence

FROM INFORMATION_SCHEMA.COLUMNS

CROSS JOIN

(SELECT Count(*) as total FROM INFORMATION_SCHEMA.COLUMNS) t

GROUP BY COLUMN_NAME, t.total

-- Do the columns differ on datatype across the schemas and tables?

[/code]

--

JimFive

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 4

Ha! Just last Friday, I discovered I could not define a foreign key because the datatype was different between two tables. The first thing on my To-Do list for today was to check all the tables to see what other tables might be using the incorrect datatype.

Thanks for writing the query I need for me!

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 5

I decided I wanted the Prevalence to be the percentage of all tables not of all columns, so I used James Goodwin's code and modified it to get the total table count.

SELECT COLUMN_NAME, CONVERT(DECIMAL(12,2), Count(*)* 100.0/t.totalTables) as [%]

INTO #Prevalence

FROM INFORMATION_SCHEMA.COLUMNS

CROSS JOIN

(SELECT Count(*) as totalTables FROM INFORMATION_SCHEMA.TABLES ) t

GROUP BY COLUMN_NAME, t.totalTables

Select * from #Prevalence gives me useful information, all by itself.

herman-991596 SSC Rookie Points: 30 More actions · Answer 6

In SQL Server 2008 in the first statement (SET TRANSACTION)

ISOLATIONLEVEL should be

ISOLATION LEVEL

Herman

Henrik Staun Poulsen SSCertifiable Points: 6551 More actions · Answer 7

Ian,

I only wanted to see the column names that appear more than once, so that I exclude all the columns that are ok.

So I re-wrote your query to:

-- From http://www.sqlservercentral.com/articles/Admin/65138/

/*----------------------------------------------------------------------

Purpose: Identify columns having different datatypes, for the same column name.

Sorted by the prevalence of the mismatched column.

------------------------------------------------------------------------

Revision History:

06/01/2008 Ian_Stirk@yahoo.com Initial version.

2009-01-28 hsp@stovi.com; only show the column names that appear more than once

-----------------------------------------------------------------------*/

-- Do not lock anything, and do not get held up by any locks.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

BEGIN TRY

DROP TABLE #Prevalence

END TRY

BEGIN CATCH

END CATCH

BEGIN TRY

DROP TABLE #FieldList

END TRY

BEGIN CATCH

END CATCH

-- Calculate prevalence of column name

SELECT

COLUMN_NAME

,[%] = CONVERT(DECIMAL(12,2),COUNT(COLUMN_NAME)* 100.0 / COUNT(*)OVER())

INTO #Prevalence

FROM INFORMATION_SCHEMA.COLUMNS

GROUP BY COLUMN_NAME

-- Do the columns differ on datatype across the schemas and tables?

SELECT DISTINCT

C1.COLUMN_NAME

, C1.TABLE_SCHEMA + '.' + C1.TABLE_NAME AS TableName

, C1.DATA_TYPE

, C1.CHARACTER_MAXIMUM_LENGTH

, C1.NUMERIC_PRECISION

, C1.NUMERIC_SCALE

, [%]

INTO #FieldList

FROM INFORMATION_SCHEMA.COLUMNS C1

INNER JOIN INFORMATION_SCHEMA.COLUMNS C2 ON C1.COLUMN_NAME = C2.COLUMN_NAME

INNER JOIN #Prevalence p ON p.COLUMN_NAME = C1.COLUMN_NAME

WHERE ((C1.DATA_TYPE != C2.DATA_TYPE)

OR (C1.CHARACTER_MAXIMUM_LENGTH != C2.CHARACTER_MAXIMUM_LENGTH)

OR (C1.NUMERIC_PRECISION != C2.NUMERIC_PRECISION)

OR (C1.NUMERIC_SCALE != C2.NUMERIC_SCALE))

ORDER BY [%] DESC, C1.COLUMN_NAME, C1.TABLE_SCHEMA + '.' + C1.TABLE_NAME

-- Tidy up.

DROP TABLE #Prevalence

--SELECT * FROM #fieldList ORDER BY column_name

;

WITH cte (column_name, Data_type, Character_maximum_length, Numeric_precision, Numeric_Scale, UsedInTables, rownum) as

(

SELECT column_name

, MAX(data_type) AS Data_type, MAX(Character_maximum_length) AS Character_maximum_length, MAX(Numeric_precision) AS Numeric_precision

, MAX(Numeric_Scale) AS Numeric_Scale

, (SELECT TableName + ', ' FROM #fieldList F1

WHERE f1.Column_name= F.Column_name

AND (f1.Data_type = F.Data_Type OR F.Data_Type IS NULL)

AND (f1.Character_maximum_length = f.Character_maximum_length OR f.Character_maximum_length IS NULL)

AND (f1.Numeric_precision = f.Numeric_precision OR f.Numeric_precision IS NULL)

AND (f1.Numeric_Scale = f.Numeric_Scale OR f.Numeric_Scale IS NULL)

for xml path('')) AS UsedInTables

, ROW_NUMBER() OVER ( PARTITION BY column_name ORDER BY data_type, Character_maximum_length) AS rownum

FROM #fieldList F

WHERE

f.tablename NOT LIKE 'tmp%'

GROUP BY column_name, Data_type, Character_maximum_length, Numeric_precision, Numeric_Scale

)

SELECT c1.* FROM cte c1

INNER JOIN cte c2 ON c1.column_name = c2.Column_name

WHERE c2.rownum=2

Thank you very much for your initial script.

Best regards,

Henrik Staun Poulsen

Stovi Software

m mcdonald Hall of Fame Points: 3264 More actions · Answer 8

A bit late but I made the following updates which were useful to me -- flag table versus view and identify if a field is an identity field. Maybe someone else will as well...

/*----------------------------------------------------------------------

Purpose: Identify columns having different datatypes, for the same column name.

Sorted by the prevalence of the mismatched column.

------------------------------------------------------------------------

Revision History:

06/01/2008 Ian_Stirk@yahoo.com Initial version.

03/11/2014Martin McDonald -- Modified to add is_table and is_identity information

-----------------------------------------------------------------------*/

-- Do not lock anything, and do not get held up by any locks.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

-- Calculate prevalence of column name

SELECT

COLUMN_NAME

,[%] = CONVERT(DECIMAL(12,2),COUNT(COLUMN_NAME)* 100.0 / COUNT(*)OVER())

INTO #Prevalence

FROM INFORMATION_SCHEMA.COLUMNS

GROUP BY COLUMN_NAME

-- Do the columns differ on datatype across the schemas and tables?

SELECT DISTINCT

C1.COLUMN_NAME

, C1.TABLE_SCHEMA

, C1.TABLE_NAME

, case

when t.name is null then 'View'

else 'Table'

end as is_table

, c.is_identity

, C1.DATA_TYPE

, C1.CHARACTER_MAXIMUM_LENGTH

, C1.NUMERIC_PRECISION

, C1.NUMERIC_SCALE

, [%]

FROM INFORMATION_SCHEMA.COLUMNS C1

INNER JOIN INFORMATION_SCHEMA.COLUMNS C2 ON C1.COLUMN_NAME = C2.COLUMN_NAME

INNER JOIN #Prevalence p ON p.COLUMN_NAME = C1.COLUMN_NAME

left outer join sys.tables t on c1.TABLE_NAME = t.name

left outer join sys.columns c on t.object_id = c.object_id

and c1.COLUMN_NAME = c.name

WHERE ((C1.DATA_TYPE != C2.DATA_TYPE)

OR (C1.CHARACTER_MAXIMUM_LENGTH != C2.CHARACTER_MAXIMUM_LENGTH)

OR (C1.NUMERIC_PRECISION != C2.NUMERIC_PRECISION)

OR (C1.NUMERIC_SCALE != C2.NUMERIC_SCALE))

ORDER BY [%] DESC, C1.COLUMN_NAME, C1.TABLE_SCHEMA, C1.TABLE_NAME

-- Tidy up.

DROP TABLE #Prevalence