January 17, 2009 at 12:33 pm
Comments posted to this topic are about the item Spring Clean Your Database Schema
January 19, 2009 at 6:08 am
Since this only works in SQL 2005 (due to the "OVER" clause on the "COUNT(*)" statement), you could replace the temporary table with a CTE.
Alternatively, for SQL 2000, you could use a nested query, and order by "COUNT(COLUMN_NAME) DESC" instead.
January 19, 2009 at 6:37 am
Half the problem here is that SQL isn't very good at relational 'domains' i.e. a standard definition of a 'type' of column, with predefinied rules (constraints), datatype, valid values etc.
Also missing is the ability to abstract types i.e. when defining parameters in t/sql you can't define them based on a domain, or even an existing column (in ORACLE this is done with the '%' operator e.g. title book.title%TYPE
)
Finally, the lack of a 'record' datatype also makes life difficult, as you can't easily define a full set of local variables that map the types of a row in a table, resulting ineven MORE duplicate definitions e.g:
--define a PL/SQL table containing entries of type book_rec:
Type book_rec_tab IS TABLE OF book_rec%TYPE
INDEX BY BINARY_INTEGER;
Personally it's about time this stuff arrived! It would do more for consistency in typing than any amount of tools.
However, this is a useful stopgap tool - many thanks to the author!
January 19, 2009 at 7:09 am
To make this work in SQL 2000
Change the first Query to:
[Code]
-- Calculate prevalence of column name
SELECT COLUMN_NAME, CONVERT(DECIMAL(12,2), Count(*)* 100.0/t.total) as [%]
INTO #Prevalence
FROM INFORMATION_SCHEMA.COLUMNS
CROSS JOIN
(SELECT Count(*) as total FROM INFORMATION_SCHEMA.COLUMNS) t
GROUP BY COLUMN_NAME, t.total
-- Do the columns differ on datatype across the schemas and tables?
[/code]
--
JimFive
January 19, 2009 at 8:06 am
Ha! Just last Friday, I discovered I could not define a foreign key because the datatype was different between two tables. The first thing on my To-Do list for today was to check all the tables to see what other tables might be using the incorrect datatype.
Thanks for writing the query I need for me!
January 19, 2009 at 8:46 am
I decided I wanted the Prevalence to be the percentage of all tables not of all columns, so I used James Goodwin's code and modified it to get the total table count.
SELECT COLUMN_NAME, CONVERT(DECIMAL(12,2), Count(*)* 100.0/t.totalTables) as [%]
INTO #Prevalence
FROM INFORMATION_SCHEMA.COLUMNS
CROSS JOIN
(SELECT Count(*) as totalTables FROM INFORMATION_SCHEMA.TABLES ) t
GROUP BY COLUMN_NAME, t.totalTables
Select * from #Prevalence gives me useful information, all by itself.
January 19, 2009 at 3:00 pm
In SQL Server 2008 in the first statement (SET TRANSACTION)
ISOLATIONLEVEL should be
ISOLATION LEVEL
Herman
January 28, 2009 at 12:47 am
Ian,
I only wanted to see the column names that appear more than once, so that I exclude all the columns that are ok.
So I re-wrote your query to:
-- From http://www.sqlservercentral.com/articles/Admin/65138/
/*----------------------------------------------------------------------
Purpose: Identify columns having different datatypes, for the same column name.
Sorted by the prevalence of the mismatched column.
------------------------------------------------------------------------
Revision History:
06/01/2008 Ian_Stirk@yahoo.com Initial version.
2009-01-28 hsp@stovi.com; only show the column names that appear more than once
-----------------------------------------------------------------------*/
-- Do not lock anything, and do not get held up by any locks.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRY
DROP TABLE #Prevalence
END TRY
BEGIN CATCH
END CATCH
BEGIN TRY
DROP TABLE #FieldList
END TRY
BEGIN CATCH
END CATCH
-- Calculate prevalence of column name
SELECT
COLUMN_NAME
,[%] = CONVERT(DECIMAL(12,2),COUNT(COLUMN_NAME)* 100.0 / COUNT(*)OVER())
INTO #Prevalence
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY COLUMN_NAME
-- Do the columns differ on datatype across the schemas and tables?
SELECT DISTINCT
C1.COLUMN_NAME
, C1.TABLE_SCHEMA + '.' + C1.TABLE_NAME AS TableName
, C1.DATA_TYPE
, C1.CHARACTER_MAXIMUM_LENGTH
, C1.NUMERIC_PRECISION
, C1.NUMERIC_SCALE
, [%]
INTO #FieldList
FROM INFORMATION_SCHEMA.COLUMNS C1
INNER JOIN INFORMATION_SCHEMA.COLUMNS C2 ON C1.COLUMN_NAME = C2.COLUMN_NAME
INNER JOIN #Prevalence p ON p.COLUMN_NAME = C1.COLUMN_NAME
WHERE ((C1.DATA_TYPE != C2.DATA_TYPE)
OR (C1.CHARACTER_MAXIMUM_LENGTH != C2.CHARACTER_MAXIMUM_LENGTH)
OR (C1.NUMERIC_PRECISION != C2.NUMERIC_PRECISION)
OR (C1.NUMERIC_SCALE != C2.NUMERIC_SCALE))
ORDER BY [%] DESC, C1.COLUMN_NAME, C1.TABLE_SCHEMA + '.' + C1.TABLE_NAME
-- Tidy up.
DROP TABLE #Prevalence
--SELECT * FROM #fieldList ORDER BY column_name
;
WITH cte (column_name, Data_type, Character_maximum_length, Numeric_precision, Numeric_Scale, UsedInTables, rownum) as
(
SELECT column_name
, MAX(data_type) AS Data_type, MAX(Character_maximum_length) AS Character_maximum_length, MAX(Numeric_precision) AS Numeric_precision
, MAX(Numeric_Scale) AS Numeric_Scale
, (SELECT TableName + ', ' FROM #fieldList F1
WHERE f1.Column_name= F.Column_name
AND (f1.Data_type = F.Data_Type OR F.Data_Type IS NULL)
AND (f1.Character_maximum_length = f.Character_maximum_length OR f.Character_maximum_length IS NULL)
AND (f1.Numeric_precision = f.Numeric_precision OR f.Numeric_precision IS NULL)
AND (f1.Numeric_Scale = f.Numeric_Scale OR f.Numeric_Scale IS NULL)
for xml path('')) AS UsedInTables
, ROW_NUMBER() OVER ( PARTITION BY column_name ORDER BY data_type, Character_maximum_length) AS rownum
FROM #fieldList F
WHERE
f.tablename NOT LIKE 'tmp%'
GROUP BY column_name, Data_type, Character_maximum_length, Numeric_precision, Numeric_Scale
)
SELECT c1.* FROM cte c1
INNER JOIN cte c2 ON c1.column_name = c2.Column_name
WHERE c2.rownum=2
Thank you very much for your initial script.
Best regards,
Henrik Staun Poulsen
Stovi Software
March 11, 2014 at 7:11 am
A bit late but I made the following updates which were useful to me -- flag table versus view and identify if a field is an identity field. Maybe someone else will as well...
/*----------------------------------------------------------------------
Purpose: Identify columns having different datatypes, for the same column name.
Sorted by the prevalence of the mismatched column.
------------------------------------------------------------------------
Revision History:
06/01/2008 Ian_Stirk@yahoo.com Initial version.
03/11/2014Martin McDonald -- Modified to add is_table and is_identity information
-----------------------------------------------------------------------*/
-- Do not lock anything, and do not get held up by any locks.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
-- Calculate prevalence of column name
SELECT
COLUMN_NAME
,[%] = CONVERT(DECIMAL(12,2),COUNT(COLUMN_NAME)* 100.0 / COUNT(*)OVER())
INTO #Prevalence
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY COLUMN_NAME
-- Do the columns differ on datatype across the schemas and tables?
SELECT DISTINCT
C1.COLUMN_NAME
, C1.TABLE_SCHEMA
, C1.TABLE_NAME
, case
when t.name is null then 'View'
else 'Table'
end as is_table
, c.is_identity
, C1.DATA_TYPE
, C1.CHARACTER_MAXIMUM_LENGTH
, C1.NUMERIC_PRECISION
, C1.NUMERIC_SCALE
, [%]
FROM INFORMATION_SCHEMA.COLUMNS C1
INNER JOIN INFORMATION_SCHEMA.COLUMNS C2 ON C1.COLUMN_NAME = C2.COLUMN_NAME
INNER JOIN #Prevalence p ON p.COLUMN_NAME = C1.COLUMN_NAME
left outer join sys.tables t on c1.TABLE_NAME = t.name
left outer join sys.columns c on t.object_id = c.object_id
and c1.COLUMN_NAME = c.name
WHERE ((C1.DATA_TYPE != C2.DATA_TYPE)
OR (C1.CHARACTER_MAXIMUM_LENGTH != C2.CHARACTER_MAXIMUM_LENGTH)
OR (C1.NUMERIC_PRECISION != C2.NUMERIC_PRECISION)
OR (C1.NUMERIC_SCALE != C2.NUMERIC_SCALE))
ORDER BY [%] DESC, C1.COLUMN_NAME, C1.TABLE_SCHEMA, C1.TABLE_NAME
-- Tidy up.
DROP TABLE #Prevalence
Viewing 9 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy