January 22, 2010 at 3:20 pm
How would i find size of each table in a large database and find if there are any exact duplicate records in that table.
January 22, 2010 at 3:30 pm
Below script can be used to find the number of records and the size of each table in a database
CREATE TABLE #temp (
table_name sysname ,
row_count INT,
reserved_size VARCHAR(50),
data_size VARCHAR(50),
index_size VARCHAR(50),
unused_size VARCHAR(50))
SET NOCOUNT ON
INSERT #temp
EXEC sp_msforeachtable 'sp_spaceused ''?'''
SELECT a.table_name,
a.row_count,
COUNT(*) AS col_count,
a.data_size
FROM #temp a
INNER JOIN information_schema.columns b
ON a.table_name collate database_default
= b.table_name collate database_default
GROUP BY a.table_name, a.row_count, a.data_size
ORDER BY CAST(REPLACE(a.data_size, ' KB', '') AS integer) DESC
DROP TABLE #temp
And to find Duplicates, you need to specify the column on which you want to find if there are duplicates but the basic query is something like belowSELECT column,
COUNT(column) AS NumOccurrences
FROM table
GROUP BY column
HAVING ( COUNT(column) > 1 )
You can also loop through multiple databases if they have same table and column name
Hope this helps...
The_SQL_DBA
MCTS
"Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."
January 22, 2010 at 3:40 pm
Here is an alternative script for Table Size in a database
-- Create the temp table for further querying
CREATE TABLE #temp(
rec_idint IDENTITY (1, 1),
table_namevarchar(128),
nbr_of_rowsint,
data_spacedecimal(15,2),
index_spacedecimal(15,2),
total_sizedecimal(15,2),
percent_of_dbdecimal(15,12),
db_sizedecimal(15,2))
-- Get all tables, names, and sizes
EXEC sp_msforeachtable @command1="insert into #temp(nbr_of_rows, data_space, index_space) exec sp_mstablespace '?'",
@command2="update #temp set table_name = '?' where rec_id = (select max(rec_id) from #temp)"
-- Set the total_size and total database size fields
UPDATE #temp
SET total_size = (data_space + index_space)/1024, db_size = (SELECT SUM(data_space + index_space)/1024 FROM #temp)
-- Set the percent of the total database size
UPDATE #temp
SET percent_of_db = (total_size/db_size) * 100
-- Get the data
SELECT *
FROM #temp
--ORDER BY nbr_of_rows DESC
ORDER BY total_size DESC
-- Comment out the following line if you want to do further querying
DROP TABLE #temp
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 22, 2010 at 4:13 pm
find if there are any exact duplicate records in that table.
Here is a CTE to find duplicate records within a single table
;with numbered as(SELECT rowno=row_number() over
(partition by *1 order by *2), *3 FROM *4)
select * from numbered
Substitute for :
*1 the column name(s) which you are checking for duplicate values
*2 the order in which you want the data presented
*3 any other column name(s) (comma separated) of interest
*4 the table name
Any row returned with a rowno value greater than 1 is a duplicate record
The results for checking for duplicate ProductID with a duplicate CustomerId-- that is *1
yielded:
rownoPRODUCTID CustomerId
1 5 5
2 5 5
1 10 1
1 10 5
1 20 101
2 20 101
3 20 101
Without table definitions, sample data and desired results the above has NOT been checked so Test, test and test again prior to using on a production database
January 22, 2010 at 4:41 pm
Using sp_mstablespace won't give you any unused space that the table is holding and sp_msforeachtable won't return internal tables. We were having issues pinpointing our space discrepancies, which were largely due to internal table usage by service broker, so we switched to a version of the following (this is an older version where I've stripped out some things that are specific to our environment)
DECLARE @idINT
,@pagesBIGINT
,@logsize BIGINT
,@reservedpages BIGINT
,@usedpages BIGINT
,@rowCount BIGINT
DECLARE @table_name VARCHAR(500) ;
DECLARE @schema_name VARCHAR(500) ;
DECLARE @tab1 TABLE(
tablename VARCHAR (500) COLLATE database_default
, schemaname VARCHAR(500) COLLATE database_default
);
DECLARE @tempspacetable TABLE(
[tablename] [SYSNAME]
,[row_count] [BIGINT]
,[reserved] [VARCHAR](50)
,[data] [VARCHAR](50)
,[index_size] [VARCHAR](50)
,[unused] [VARCHAR](50)
,[schemaname] [VARCHAR](500)
);
DECLARE @temp_table TABLE (
tablename SYSNAME
, row_count BIGINT
, reserved VARCHAR(50)
, data VARCHAR(50)
, index_size VARCHAR(50)
, unused VARCHAR(50)
);
INSERT INTO @tab1
SELECT t1.name
, t2.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type in ('U', 'IT', 'S','V');
DECLARE c1 CURSOR FOR
SELECT t2.name + '.' + t1.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type IN ('U', 'IT', 'S','V')
AND t2.name NOT IN ('INFORMATION_SCHEMA');
OPEN c1;
FETCH NEXT FROM c1 INTO @table_name;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @table_name = replace(@table_name, '[','');
SET @table_name = replace(@table_name, ']','');
IF EXISTS(SELECT object_id FROM sys.objects WITH (NOLOCK) WHERE object_id = object_id(@table_name))
BEGIN
SELECT @id = object_id(@table_name)
SELECT
@reservedpages = SUM (reserved_page_count),
@usedpages = SUM (used_page_count),
@pages = SUM (
CASE
WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
ELSE lob_used_page_count + row_overflow_used_page_count
END
),
@rowCount = SUM (
CASE
WHEN (index_id < 2) THEN row_count
ELSE 0
END
)
FROM sys.dm_db_partition_stats WITH (NOLOCK)
WHERE object_id = @id;
IF (SELECT count(*) FROM sys.internal_tables WITH (NOLOCK) WHERE parent_id = @id AND internal_type IN (202,204)) > 0
BEGIN
SELECT
@reservedpages = @reservedpages + sum(reserved_page_count),
@usedpages = @usedpages + sum(used_page_count)
FROM sys.dm_db_partition_stats p WITH (NOLOCK), sys.internal_tables it WITH (NOLOCK)
WHERE it.parent_id = @id AND it.internal_type IN (202,204) AND p.object_id = it.object_id;
END
INSERT INTO @temp_table
VALUES (
schema_name(objectproperty(@id,'schemaid')) + '.' + OBJECT_NAME (@id),
convert (char(11), @rowCount),
LTRIM (STR (@reservedpages * 8, 15, 0) + ' KB'),
LTRIM (STR (@pages * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @usedpages > @pages THEN (@usedpages - @pages) ELSE 0 END) * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @reservedpages > @usedpages THEN (@reservedpages - @usedpages) ELSE 0 END) * 8, 15, 0) + ' KB')
);
END
FETCH NEXT FROM C1 INTO @TABLE_NAME;
END;
CLOSE c1;
DEALLOCATE c1;
INSERT INTO @tempspacetable (tablename, row_count, reserved, data, index_size, unused, SchemaName )
SELECT t1.*
, t2.schemaname
FROM @temp_table t1
INNER JOIN @tab1 t2
ON (t1.tablename = t2.schemaname+'.'+ t2.tablename )
WHERE row_count >= 0
ORDER BY row_count DESC , schemaname, t2.tablename;
SELECT *
FROM @tempspacetable
ORDER BY tablename
January 22, 2010 at 4:49 pm
January 22, 2010 at 4:59 pm
Dugi (1/22/2010)
I vote for CTE to find the duplicates in a table!:w00t::cool::hehe:
I second that vote.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 22, 2010 at 5:03 pm
SQLPirate (1/22/2010)
Using sp_mstablespace won't give you any unused space that the table is holding and sp_msforeachtable won't return internal tables. We were having issues pinpointing our space discrepancies, which were largely due to internal table usage by service broker, so we switched to a version of the following (this is an older version where I've stripped out some things that are specific to our environment)
DECLARE @idINT
,@pagesBIGINT
,@logsize BIGINT
,@reservedpages BIGINT
,@usedpages BIGINT
,@rowCount BIGINT
DECLARE @table_name VARCHAR(500) ;
DECLARE @schema_name VARCHAR(500) ;
DECLARE @tab1 TABLE(
tablename VARCHAR (500) COLLATE database_default
, schemaname VARCHAR(500) COLLATE database_default
);
DECLARE @tempspacetable TABLE(
[tablename] [SYSNAME]
,[row_count] [BIGINT]
,[reserved] [VARCHAR](50)
,[data] [VARCHAR](50)
,[index_size] [VARCHAR](50)
,[unused] [VARCHAR](50)
,[schemaname] [VARCHAR](500)
);
DECLARE @temp_table TABLE (
tablename SYSNAME
, row_count BIGINT
, reserved VARCHAR(50)
, data VARCHAR(50)
, index_size VARCHAR(50)
, unused VARCHAR(50)
);
INSERT INTO @tab1
SELECT t1.name
, t2.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type in ('U', 'IT', 'S','V');
DECLARE c1 CURSOR FOR
SELECT t2.name + '.' + t1.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type IN ('U', 'IT', 'S','V')
AND t2.name NOT IN ('INFORMATION_SCHEMA');
OPEN c1;
FETCH NEXT FROM c1 INTO @table_name;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @table_name = replace(@table_name, '[','');
SET @table_name = replace(@table_name, ']','');
IF EXISTS(SELECT object_id FROM sys.objects WITH (NOLOCK) WHERE object_id = object_id(@table_name))
BEGIN
SELECT @id = object_id(@table_name)
SELECT
@reservedpages = SUM (reserved_page_count),
@usedpages = SUM (used_page_count),
@pages = SUM (
CASE
WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
ELSE lob_used_page_count + row_overflow_used_page_count
END
),
@rowCount = SUM (
CASE
WHEN (index_id < 2) THEN row_count
ELSE 0
END
)
FROM sys.dm_db_partition_stats WITH (NOLOCK)
WHERE object_id = @id;
IF (SELECT count(*) FROM sys.internal_tables WITH (NOLOCK) WHERE parent_id = @id AND internal_type IN (202,204)) > 0
BEGIN
SELECT
@reservedpages = @reservedpages + sum(reserved_page_count),
@usedpages = @usedpages + sum(used_page_count)
FROM sys.dm_db_partition_stats p WITH (NOLOCK), sys.internal_tables it WITH (NOLOCK)
WHERE it.parent_id = @id AND it.internal_type IN (202,204) AND p.object_id = it.object_id;
END
INSERT INTO @temp_table
VALUES (
schema_name(objectproperty(@id,'schemaid')) + '.' + OBJECT_NAME (@id),
convert (char(11), @rowCount),
LTRIM (STR (@reservedpages * 8, 15, 0) + ' KB'),
LTRIM (STR (@pages * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @usedpages > @pages THEN (@usedpages - @pages) ELSE 0 END) * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @reservedpages > @usedpages THEN (@reservedpages - @usedpages) ELSE 0 END) * 8, 15, 0) + ' KB')
);
END
FETCH NEXT FROM C1 INTO @TABLE_NAME;
END;
CLOSE c1;
DEALLOCATE c1;
INSERT INTO @tempspacetable (tablename, row_count, reserved, data, index_size, unused, SchemaName )
SELECT t1.*
, t2.schemaname
FROM @temp_table t1
INNER JOIN @tab1 t2
ON (t1.tablename = t2.schemaname+'.'+ t2.tablename )
WHERE row_count >= 0
ORDER BY row_count DESC , schemaname, t2.tablename;
SELECT *
FROM @tempspacetable
ORDER BY tablename
Nice Script. Runs very quickly.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 22, 2010 at 5:11 pm
CirquedeSQLeil (1/22/2010)
SQLPirate (1/22/2010)
Using sp_mstablespace won't give you any unused space that the table is holding and sp_msforeachtable won't return internal tables. We were having issues pinpointing our space discrepancies, which were largely due to internal table usage by service broker, so we switched to a version of the following (this is an older version where I've stripped out some things that are specific to our environment)
DECLARE @idINT
,@pagesBIGINT
,@logsize BIGINT
,@reservedpages BIGINT
,@usedpages BIGINT
,@rowCount BIGINT
DECLARE @table_name VARCHAR(500) ;
DECLARE @schema_name VARCHAR(500) ;
DECLARE @tab1 TABLE(
tablename VARCHAR (500) COLLATE database_default
, schemaname VARCHAR(500) COLLATE database_default
);
DECLARE @tempspacetable TABLE(
[tablename] [SYSNAME]
,[row_count] [BIGINT]
,[reserved] [VARCHAR](50)
,[data] [VARCHAR](50)
,[index_size] [VARCHAR](50)
,[unused] [VARCHAR](50)
,[schemaname] [VARCHAR](500)
);
DECLARE @temp_table TABLE (
tablename SYSNAME
, row_count BIGINT
, reserved VARCHAR(50)
, data VARCHAR(50)
, index_size VARCHAR(50)
, unused VARCHAR(50)
);
INSERT INTO @tab1
SELECT t1.name
, t2.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type in ('U', 'IT', 'S','V');
DECLARE c1 CURSOR FOR
SELECT t2.name + '.' + t1.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type IN ('U', 'IT', 'S','V')
AND t2.name NOT IN ('INFORMATION_SCHEMA');
OPEN c1;
FETCH NEXT FROM c1 INTO @table_name;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @table_name = replace(@table_name, '[','');
SET @table_name = replace(@table_name, ']','');
IF EXISTS(SELECT object_id FROM sys.objects WITH (NOLOCK) WHERE object_id = object_id(@table_name))
BEGIN
SELECT @id = object_id(@table_name)
SELECT
@reservedpages = SUM (reserved_page_count),
@usedpages = SUM (used_page_count),
@pages = SUM (
CASE
WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
ELSE lob_used_page_count + row_overflow_used_page_count
END
),
@rowCount = SUM (
CASE
WHEN (index_id < 2) THEN row_count
ELSE 0
END
)
FROM sys.dm_db_partition_stats WITH (NOLOCK)
WHERE object_id = @id;
IF (SELECT count(*) FROM sys.internal_tables WITH (NOLOCK) WHERE parent_id = @id AND internal_type IN (202,204)) > 0
BEGIN
SELECT
@reservedpages = @reservedpages + sum(reserved_page_count),
@usedpages = @usedpages + sum(used_page_count)
FROM sys.dm_db_partition_stats p WITH (NOLOCK), sys.internal_tables it WITH (NOLOCK)
WHERE it.parent_id = @id AND it.internal_type IN (202,204) AND p.object_id = it.object_id;
END
INSERT INTO @temp_table
VALUES (
schema_name(objectproperty(@id,'schemaid')) + '.' + OBJECT_NAME (@id),
convert (char(11), @rowCount),
LTRIM (STR (@reservedpages * 8, 15, 0) + ' KB'),
LTRIM (STR (@pages * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @usedpages > @pages THEN (@usedpages - @pages) ELSE 0 END) * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @reservedpages > @usedpages THEN (@reservedpages - @usedpages) ELSE 0 END) * 8, 15, 0) + ' KB')
);
END
FETCH NEXT FROM C1 INTO @TABLE_NAME;
END;
CLOSE c1;
DEALLOCATE c1;
INSERT INTO @tempspacetable (tablename, row_count, reserved, data, index_size, unused, SchemaName )
SELECT t1.*
, t2.schemaname
FROM @temp_table t1
INNER JOIN @tab1 t2
ON (t1.tablename = t2.schemaname+'.'+ t2.tablename )
WHERE row_count >= 0
ORDER BY row_count DESC , schemaname, t2.tablename;
SELECT *
FROM @tempspacetable
ORDER BY tablename
Nice Script. Runs very quickly.
Yes I agree but I recommend to eliminate the sys schema!
January 22, 2010 at 5:14 pm
i took the liberty to update the script a little bit to allow better sorting based on different columns, such as Data.
By changing the Data column to a BigInt and then removing the KB concatenation, one can now sort on the Data field (as an example).
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 22, 2010 at 5:15 pm
Yes I agree but I recommend to eliminate the sys schema!
Oops, yeah that's also a remnant from our environmental needs. The information in our current version also contains servernames, dbnames, and the date that it's taken, among other things. It's all used for proactive monitoring and trending purposes, schema included 🙂
January 22, 2010 at 5:19 pm
i took the liberty to update the script a little bit to allow better sorting based on different columns, such as Data.
By changing the Data column to a BigInt and then removing the KB concatenation, one can now sort on the Data field (as an example).
Yeah, converting afterward is one of my biggest gripes about this, but it's so deeply embedded in our systems I haven't gotten around to implementing all the other changes that would need to occur for this piece of code to change.
January 22, 2010 at 5:45 pm
SQLPirate (1/22/2010)
Yes I agree but I recommend to eliminate the sys schema!
Oops, yeah that's also a remnant from our environmental needs. The information in our current version also contains servernames, dbnames, and the date that it's taken, among other things. It's all used for proactive monitoring and trending purposes, schema included 🙂
Yes. Easy fix and much cleaner.
Thanks again for the script. Better than mine.;-)
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
Viewing 13 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply