Database Research

Question

Post reply

Database Research

Tara-1044200

SSCoach

Points: 15951
More actions
January 22, 2010 at 3:20 pm

#140903

How would i find size of each table in a large database and find if there are any exact duplicate records in that table.

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

The_SQL_DBA SSChampion Points: 10494 More actions · Answer 1

Below script can be used to find the number of records and the size of each table in a database

CREATE TABLE #temp (

table_name sysname ,

row_count INT,

reserved_size VARCHAR(50),

data_size VARCHAR(50),

index_size VARCHAR(50),

unused_size VARCHAR(50))

SET NOCOUNT ON

INSERT #temp

EXEC sp_msforeachtable 'sp_spaceused ''?'''

SELECT a.table_name,

a.row_count,

COUNT(*) AS col_count,

a.data_size

FROM #temp a

INNER JOIN information_schema.columns b

ON a.table_name collate database_default

= b.table_name collate database_default

GROUP BY a.table_name, a.row_count, a.data_size

ORDER BY CAST(REPLACE(a.data_size, ' KB', '') AS integer) DESC

DROP TABLE #temp

And to find Duplicates, you need to specify the column on which you want to find if there are duplicates but the basic query is something like belowSELECT column,

COUNT(column) AS NumOccurrences

FROM table

GROUP BY column

HAVING ( COUNT(column) > 1 )

You can also loop through multiple databases if they have same table and column name

Hope this helps...

The_SQL_DBA
MCTS

"Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."

SQLRNNR SSC Guru Points: 281344 More actions · Answer 2

Here is an alternative script for Table Size in a database

-- Create the temp table for further querying

CREATE TABLE #temp(

rec_idint IDENTITY (1, 1),

table_namevarchar(128),

nbr_of_rowsint,

data_spacedecimal(15,2),

index_spacedecimal(15,2),

total_sizedecimal(15,2),

percent_of_dbdecimal(15,12),

db_sizedecimal(15,2))

-- Get all tables, names, and sizes

EXEC sp_msforeachtable @command1="insert into #temp(nbr_of_rows, data_space, index_space) exec sp_mstablespace '?'",

@command2="update #temp set table_name = '?' where rec_id = (select max(rec_id) from #temp)"

-- Set the total_size and total database size fields

UPDATE #temp

SET total_size = (data_space + index_space)/1024, db_size = (SELECT SUM(data_space + index_space)/1024 FROM #temp)

-- Set the percent of the total database size

UPDATE #temp

SET percent_of_db = (total_size/db_size) * 100

-- Get the data

SELECT *

FROM #temp

--ORDER BY nbr_of_rows DESC

ORDER BY total_size DESC

-- Comment out the following line if you want to do further querying

DROP TABLE #temp

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Ron McCullough SSC Guru Points: 63877 More actions · Answer 3

find if there are any exact duplicate records in that table.

Here is a CTE to find duplicate records within a single table

;with numbered as(SELECT rowno=row_number() over

(partition by *1 order by *2), *3 FROM *4)

select * from numbered

Substitute for :

*1 the column name(s) which you are checking for duplicate values

*2 the order in which you want the data presented

*3 any other column name(s) (comma separated) of interest

*4 the table name

Any row returned with a rowno value greater than 1 is a duplicate record

The results for checking for duplicate ProductID with a duplicate CustomerId-- that is *1

yielded:

rownoPRODUCTID CustomerId

1 5 5

2 5 5

1 10 1

1 10 5

1 20 101

2 20 101

3 20 101

Without table definitions, sample data and desired results the above has NOT been checked so Test, test and test again prior to using on a production database

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

SQLPirate SSCrazy Points: 2880 More actions · Answer 4

Using sp_mstablespace won't give you any unused space that the table is holding and sp_msforeachtable won't return internal tables. We were having issues pinpointing our space discrepancies, which were largely due to internal table usage by service broker, so we switched to a version of the following (this is an older version where I've stripped out some things that are specific to our environment)

DECLARE @idINT

,@pagesBIGINT

,@logsize BIGINT

,@reservedpages BIGINT

,@usedpages BIGINT

,@rowCount BIGINT

DECLARE @table_name VARCHAR(500) ;

DECLARE @schema_name VARCHAR(500) ;

DECLARE @tab1 TABLE(

tablename VARCHAR (500) COLLATE database_default

, schemaname VARCHAR(500) COLLATE database_default

);

DECLARE @tempspacetable TABLE(

[tablename] [SYSNAME]

,[row_count] [BIGINT]

,[reserved] [VARCHAR](50)

,[data] [VARCHAR](50)

,[index_size] [VARCHAR](50)

,[unused] [VARCHAR](50)

,[schemaname] [VARCHAR](500)

);

DECLARE @temp_table TABLE (

tablename SYSNAME

, row_count BIGINT

, reserved VARCHAR(50)

, data VARCHAR(50)

, index_size VARCHAR(50)

, unused VARCHAR(50)

);

INSERT INTO @tab1

SELECT t1.name

, t2.name

FROM sys.objects t1 WITH (NOLOCK)

INNER JOIN sys.schemas t2 WITH (NOLOCK)

ON ( t1.schema_id = t2.schema_id )

WHERE t1.type in ('U', 'IT', 'S','V');

DECLARE c1 CURSOR FOR

SELECT t2.name + '.' + t1.name

FROM sys.objects t1 WITH (NOLOCK)

INNER JOIN sys.schemas t2 WITH (NOLOCK)

ON ( t1.schema_id = t2.schema_id )

WHERE t1.type IN ('U', 'IT', 'S','V')

AND t2.name NOT IN ('INFORMATION_SCHEMA');

OPEN c1;

FETCH NEXT FROM c1 INTO @table_name;

WHILE @@FETCH_STATUS = 0

BEGIN

SET @table_name = replace(@table_name, '[','');

SET @table_name = replace(@table_name, ']','');

IF EXISTS(SELECT object_id FROM sys.objects WITH (NOLOCK) WHERE object_id = object_id(@table_name))

BEGIN

SELECT @id = object_id(@table_name)

SELECT

@reservedpages = SUM (reserved_page_count),

@usedpages = SUM (used_page_count),

@pages = SUM (

CASE

WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)

ELSE lob_used_page_count + row_overflow_used_page_count

END

),

@rowCount = SUM (

CASE

WHEN (index_id < 2) THEN row_count

ELSE 0

END

)

FROM sys.dm_db_partition_stats WITH (NOLOCK)

WHERE object_id = @id;

IF (SELECT count(*) FROM sys.internal_tables WITH (NOLOCK) WHERE parent_id = @id AND internal_type IN (202,204)) > 0

BEGIN

SELECT

@reservedpages = @reservedpages + sum(reserved_page_count),

@usedpages = @usedpages + sum(used_page_count)

FROM sys.dm_db_partition_stats p WITH (NOLOCK), sys.internal_tables it WITH (NOLOCK)

WHERE it.parent_id = @id AND it.internal_type IN (202,204) AND p.object_id = it.object_id;

END

INSERT INTO @temp_table

VALUES (

schema_name(objectproperty(@id,'schemaid')) + '.' + OBJECT_NAME (@id),

convert (char(11), @rowCount),

LTRIM (STR (@reservedpages * 8, 15, 0) + ' KB'),

LTRIM (STR (@pages * 8, 15, 0) + ' KB'),

LTRIM (STR ((CASE WHEN @usedpages > @pages THEN (@usedpages - @pages) ELSE 0 END) * 8, 15, 0) + ' KB'),

LTRIM (STR ((CASE WHEN @reservedpages > @usedpages THEN (@reservedpages - @usedpages) ELSE 0 END) * 8, 15, 0) + ' KB')

);

END

FETCH NEXT FROM C1 INTO @TABLE_NAME;

END;

CLOSE c1;

DEALLOCATE c1;

INSERT INTO @tempspacetable (tablename, row_count, reserved, data, index_size, unused, SchemaName )

SELECT t1.*

, t2.schemaname

FROM @temp_table t1

INNER JOIN @tab1 t2

ON (t1.tablename = t2.schemaname+'.'+ t2.tablename )

WHERE row_count >= 0

ORDER BY row_count DESC , schemaname, t2.tablename;

SELECT *

FROM @tempspacetable

ORDER BY tablename

Dugi SSCoach Points: 17998 More actions · Answer 5

I vote for CTE to find the duplicates in a table!

:w00t::cool::hehe:

============================================================
SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

SQLRNNR SSC Guru Points: 281344 More actions · Answer 6

Dugi (1/22/2010)
I vote for CTE to find the duplicates in a table!
:w00t::cool::hehe:

I second that vote.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

SQLRNNR SSC Guru Points: 281344 More actions · Answer 7

SQLPirate (1/22/2010)
Using sp_mstablespace won't give you any unused space that the table is holding and sp_msforeachtable won't return internal tables. We were having issues pinpointing our space discrepancies, which were largely due to internal table usage by service broker, so we switched to a version of the following (this is an older version where I've stripped out some things that are specific to our environment)
DECLARE @idINT
,@pagesBIGINT
,@logsize BIGINT
,@reservedpages BIGINT
,@usedpages BIGINT
,@rowCount BIGINT
DECLARE @table_name VARCHAR(500) ;
DECLARE @schema_name VARCHAR(500) ;
DECLARE @tab1 TABLE(
tablename VARCHAR (500) COLLATE database_default
, schemaname VARCHAR(500) COLLATE database_default
);
DECLARE @tempspacetable TABLE(
[tablename] [SYSNAME]
,[row_count] [BIGINT]
,[reserved] [VARCHAR](50)
,[data] [VARCHAR](50)
,[index_size] [VARCHAR](50)
,[unused] [VARCHAR](50)
,[schemaname] [VARCHAR](500)
);
DECLARE @temp_table TABLE (
tablename SYSNAME
, row_count BIGINT
, reserved VARCHAR(50)
, data VARCHAR(50)
, index_size VARCHAR(50)
, unused VARCHAR(50)
);
INSERT INTO @tab1
SELECT t1.name
, t2.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type in ('U', 'IT', 'S','V');
DECLARE c1 CURSOR FOR
SELECT t2.name + '.' + t1.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type IN ('U', 'IT', 'S','V')
AND t2.name NOT IN ('INFORMATION_SCHEMA');
OPEN c1;
FETCH NEXT FROM c1 INTO @table_name;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @table_name = replace(@table_name, '[','');
SET @table_name = replace(@table_name, ']','');
IF EXISTS(SELECT object_id FROM sys.objects WITH (NOLOCK) WHERE object_id = object_id(@table_name))
BEGIN
SELECT @id = object_id(@table_name)
SELECT
@reservedpages = SUM (reserved_page_count),
@usedpages = SUM (used_page_count),
@pages = SUM (
CASE
WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
ELSE lob_used_page_count + row_overflow_used_page_count
END
),
@rowCount = SUM (
CASE
WHEN (index_id < 2) THEN row_count
ELSE 0
END
)
FROM sys.dm_db_partition_stats WITH (NOLOCK)
WHERE object_id = @id;
IF (SELECT count(*) FROM sys.internal_tables WITH (NOLOCK) WHERE parent_id = @id AND internal_type IN (202,204)) > 0
BEGIN
SELECT
@reservedpages = @reservedpages + sum(reserved_page_count),
@usedpages = @usedpages + sum(used_page_count)
FROM sys.dm_db_partition_stats p WITH (NOLOCK), sys.internal_tables it WITH (NOLOCK)
WHERE it.parent_id = @id AND it.internal_type IN (202,204) AND p.object_id = it.object_id;
END
INSERT INTO @temp_table
VALUES (
schema_name(objectproperty(@id,'schemaid')) + '.' + OBJECT_NAME (@id),
convert (char(11), @rowCount),
LTRIM (STR (@reservedpages * 8, 15, 0) + ' KB'),
LTRIM (STR (@pages * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @usedpages > @pages THEN (@usedpages - @pages) ELSE 0 END) * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @reservedpages > @usedpages THEN (@reservedpages - @usedpages) ELSE 0 END) * 8, 15, 0) + ' KB')
);
END
FETCH NEXT FROM C1 INTO @TABLE_NAME;
END;
CLOSE c1;
DEALLOCATE c1;
INSERT INTO @tempspacetable (tablename, row_count, reserved, data, index_size, unused, SchemaName )
SELECT t1.*
, t2.schemaname
FROM @temp_table t1
INNER JOIN @tab1 t2
ON (t1.tablename = t2.schemaname+'.'+ t2.tablename )
WHERE row_count >= 0
ORDER BY row_count DESC , schemaname, t2.tablename;
SELECT *
FROM @tempspacetable
ORDER BY tablename

Nice Script. Runs very quickly.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Dugi SSCoach Points: 17998 More actions · Answer 8

CirquedeSQLeil (1/22/2010)
SQLPirate (1/22/2010)
Using sp_mstablespace won't give you any unused space that the table is holding and sp_msforeachtable won't return internal tables. We were having issues pinpointing our space discrepancies, which were largely due to internal table usage by service broker, so we switched to a version of the following (this is an older version where I've stripped out some things that are specific to our environment)
DECLARE @idINT
,@pagesBIGINT
,@logsize BIGINT
,@reservedpages BIGINT
,@usedpages BIGINT
,@rowCount BIGINT
DECLARE @table_name VARCHAR(500) ;
DECLARE @schema_name VARCHAR(500) ;
DECLARE @tab1 TABLE(
tablename VARCHAR (500) COLLATE database_default
, schemaname VARCHAR(500) COLLATE database_default
);
DECLARE @tempspacetable TABLE(
[tablename] [SYSNAME]
,[row_count] [BIGINT]
,[reserved] [VARCHAR](50)
,[data] [VARCHAR](50)
,[index_size] [VARCHAR](50)
,[unused] [VARCHAR](50)
,[schemaname] [VARCHAR](500)
);
DECLARE @temp_table TABLE (
tablename SYSNAME
, row_count BIGINT
, reserved VARCHAR(50)
, data VARCHAR(50)
, index_size VARCHAR(50)
, unused VARCHAR(50)
);
INSERT INTO @tab1
SELECT t1.name
, t2.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type in ('U', 'IT', 'S','V');
DECLARE c1 CURSOR FOR
SELECT t2.name + '.' + t1.name
FROM sys.objects t1 WITH (NOLOCK)
INNER JOIN sys.schemas t2 WITH (NOLOCK)
ON ( t1.schema_id = t2.schema_id )
WHERE t1.type IN ('U', 'IT', 'S','V')
AND t2.name NOT IN ('INFORMATION_SCHEMA');
OPEN c1;
FETCH NEXT FROM c1 INTO @table_name;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @table_name = replace(@table_name, '[','');
SET @table_name = replace(@table_name, ']','');
IF EXISTS(SELECT object_id FROM sys.objects WITH (NOLOCK) WHERE object_id = object_id(@table_name))
BEGIN
SELECT @id = object_id(@table_name)
SELECT
@reservedpages = SUM (reserved_page_count),
@usedpages = SUM (used_page_count),
@pages = SUM (
CASE
WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
ELSE lob_used_page_count + row_overflow_used_page_count
END
),
@rowCount = SUM (
CASE
WHEN (index_id < 2) THEN row_count
ELSE 0
END
)
FROM sys.dm_db_partition_stats WITH (NOLOCK)
WHERE object_id = @id;
IF (SELECT count(*) FROM sys.internal_tables WITH (NOLOCK) WHERE parent_id = @id AND internal_type IN (202,204)) > 0
BEGIN
SELECT
@reservedpages = @reservedpages + sum(reserved_page_count),
@usedpages = @usedpages + sum(used_page_count)
FROM sys.dm_db_partition_stats p WITH (NOLOCK), sys.internal_tables it WITH (NOLOCK)
WHERE it.parent_id = @id AND it.internal_type IN (202,204) AND p.object_id = it.object_id;
END
INSERT INTO @temp_table
VALUES (
schema_name(objectproperty(@id,'schemaid')) + '.' + OBJECT_NAME (@id),
convert (char(11), @rowCount),
LTRIM (STR (@reservedpages * 8, 15, 0) + ' KB'),
LTRIM (STR (@pages * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @usedpages > @pages THEN (@usedpages - @pages) ELSE 0 END) * 8, 15, 0) + ' KB'),
LTRIM (STR ((CASE WHEN @reservedpages > @usedpages THEN (@reservedpages - @usedpages) ELSE 0 END) * 8, 15, 0) + ' KB')
);
END
FETCH NEXT FROM C1 INTO @TABLE_NAME;
END;
CLOSE c1;
DEALLOCATE c1;
INSERT INTO @tempspacetable (tablename, row_count, reserved, data, index_size, unused, SchemaName )
SELECT t1.*
, t2.schemaname
FROM @temp_table t1
INNER JOIN @tab1 t2
ON (t1.tablename = t2.schemaname+'.'+ t2.tablename )
WHERE row_count >= 0
ORDER BY row_count DESC , schemaname, t2.tablename;
SELECT *
FROM @tempspacetable
ORDER BY tablename
Nice Script. Runs very quickly.

Yes I agree but I recommend to eliminate the sys schema!

============================================================
SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

SQLRNNR SSC Guru Points: 281344 More actions · Answer 9

i took the liberty to update the script a little bit to allow better sorting based on different columns, such as Data.

By changing the Data column to a BigInt and then removing the KB concatenation, one can now sort on the Data field (as an example).

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

SQLPirate SSCrazy Points: 2880 More actions · Answer 10

Yes I agree but I recommend to eliminate the sys schema!

Oops, yeah that's also a remnant from our environmental needs. The information in our current version also contains servernames, dbnames, and the date that it's taken, among other things. It's all used for proactive monitoring and trending purposes, schema included 🙂

SQLPirate SSCrazy Points: 2880 More actions · Answer 11

i took the liberty to update the script a little bit to allow better sorting based on different columns, such as Data.
By changing the Data column to a BigInt and then removing the KB concatenation, one can now sort on the Data field (as an example).

Yeah, converting afterward is one of my biggest gripes about this, but it's so deeply embedded in our systems I haven't gotten around to implementing all the other changes that would need to occur for this piece of code to change.

SQLRNNR SSC Guru Points: 281344 More actions · Answer 12

SQLPirate (1/22/2010)
Yes I agree but I recommend to eliminate the sys schema!
Oops, yeah that's also a remnant from our environmental needs. The information in our current version also contains servernames, dbnames, and the date that it's taken, among other things. It's all used for proactive monitoring and trending purposes, schema included 🙂

Yes. Easy fix and much cleaner.

Thanks again for the script. Better than mine.;-)

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events