Database Research

  • How would i find size of each table in a large database and find if there are any exact duplicate records in that table.

  • Below script can be used to find the number of records and the size of each table in a database

    CREATE TABLE #temp (

    table_name sysname ,

    row_count INT,

    reserved_size VARCHAR(50),

    data_size VARCHAR(50),

    index_size VARCHAR(50),

    unused_size VARCHAR(50))

    SET NOCOUNT ON

    INSERT #temp

    EXEC sp_msforeachtable 'sp_spaceused ''?'''

    SELECT a.table_name,

    a.row_count,

    COUNT(*) AS col_count,

    a.data_size

    FROM #temp a

    INNER JOIN information_schema.columns b

    ON a.table_name collate database_default

    = b.table_name collate database_default

    GROUP BY a.table_name, a.row_count, a.data_size

    ORDER BY CAST(REPLACE(a.data_size, ' KB', '') AS integer) DESC

    DROP TABLE #temp

    And to find Duplicates, you need to specify the column on which you want to find if there are duplicates but the basic query is something like belowSELECT column,

    COUNT(column) AS NumOccurrences

    FROM table

    GROUP BY column

    HAVING ( COUNT(column) > 1 )

    You can also loop through multiple databases if they have same table and column name

    Hope this helps...

    The_SQL_DBA
    MCTS

    "Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."

  • Here is an alternative script for Table Size in a database

    -- Create the temp table for further querying

    CREATE TABLE #temp(

    rec_idint IDENTITY (1, 1),

    table_namevarchar(128),

    nbr_of_rowsint,

    data_spacedecimal(15,2),

    index_spacedecimal(15,2),

    total_sizedecimal(15,2),

    percent_of_dbdecimal(15,12),

    db_sizedecimal(15,2))

    -- Get all tables, names, and sizes

    EXEC sp_msforeachtable @command1="insert into #temp(nbr_of_rows, data_space, index_space) exec sp_mstablespace '?'",

    @command2="update #temp set table_name = '?' where rec_id = (select max(rec_id) from #temp)"

    -- Set the total_size and total database size fields

    UPDATE #temp

    SET total_size = (data_space + index_space)/1024, db_size = (SELECT SUM(data_space + index_space)/1024 FROM #temp)

    -- Set the percent of the total database size

    UPDATE #temp

    SET percent_of_db = (total_size/db_size) * 100

    -- Get the data

    SELECT *

    FROM #temp

    --ORDER BY nbr_of_rows DESC

    ORDER BY total_size DESC

    -- Comment out the following line if you want to do further querying

    DROP TABLE #temp

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • find if there are any exact duplicate records in that table.

    Here is a CTE to find duplicate records within a single table

    ;with numbered as(SELECT rowno=row_number() over

    (partition by *1 order by *2), *3 FROM *4)

    select * from numbered

    Substitute for :

    *1 the column name(s) which you are checking for duplicate values

    *2 the order in which you want the data presented

    *3 any other column name(s) (comma separated) of interest

    *4 the table name

    Any row returned with a rowno value greater than 1 is a duplicate record

    The results for checking for duplicate ProductID with a duplicate CustomerId-- that is *1

    yielded:

    rownoPRODUCTID CustomerId

    1 5 5

    2 5 5

    1 10 1

    1 10 5

    1 20 101

    2 20 101

    3 20 101

    Without table definitions, sample data and desired results the above has NOT been checked so Test, test and test again prior to using on a production database

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Using sp_mstablespace won't give you any unused space that the table is holding and sp_msforeachtable won't return internal tables. We were having issues pinpointing our space discrepancies, which were largely due to internal table usage by service broker, so we switched to a version of the following (this is an older version where I've stripped out some things that are specific to our environment)

    DECLARE @idINT

    ,@pagesBIGINT

    ,@logsize BIGINT

    ,@reservedpages BIGINT

    ,@usedpages BIGINT

    ,@rowCount BIGINT

    DECLARE @table_name VARCHAR(500) ;

    DECLARE @schema_name VARCHAR(500) ;

    DECLARE @tab1 TABLE(

    tablename VARCHAR (500) COLLATE database_default

    , schemaname VARCHAR(500) COLLATE database_default

    );

    DECLARE @tempspacetable TABLE(

    [tablename] [SYSNAME]

    ,[row_count] [BIGINT]

    ,[reserved] [VARCHAR](50)

    ,[data] [VARCHAR](50)

    ,[index_size] [VARCHAR](50)

    ,[unused] [VARCHAR](50)

    ,[schemaname] [VARCHAR](500)

    );

    DECLARE @temp_table TABLE (

    tablename SYSNAME

    , row_count BIGINT

    , reserved VARCHAR(50)

    , data VARCHAR(50)

    , index_size VARCHAR(50)

    , unused VARCHAR(50)

    );

    INSERT INTO @tab1

    SELECT t1.name

    , t2.name

    FROM sys.objects t1 WITH (NOLOCK)

    INNER JOIN sys.schemas t2 WITH (NOLOCK)

    ON ( t1.schema_id = t2.schema_id )

    WHERE t1.type in ('U', 'IT', 'S','V');

    DECLARE c1 CURSOR FOR

    SELECT t2.name + '.' + t1.name

    FROM sys.objects t1 WITH (NOLOCK)

    INNER JOIN sys.schemas t2 WITH (NOLOCK)

    ON ( t1.schema_id = t2.schema_id )

    WHERE t1.type IN ('U', 'IT', 'S','V')

    AND t2.name NOT IN ('INFORMATION_SCHEMA');

    OPEN c1;

    FETCH NEXT FROM c1 INTO @table_name;

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @table_name = replace(@table_name, '[','');

    SET @table_name = replace(@table_name, ']','');

    IF EXISTS(SELECT object_id FROM sys.objects WITH (NOLOCK) WHERE object_id = object_id(@table_name))

    BEGIN

    SELECT @id = object_id(@table_name)

    SELECT

    @reservedpages = SUM (reserved_page_count),

    @usedpages = SUM (used_page_count),

    @pages = SUM (

    CASE

    WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)

    ELSE lob_used_page_count + row_overflow_used_page_count

    END

    ),

    @rowCount = SUM (

    CASE

    WHEN (index_id < 2) THEN row_count

    ELSE 0

    END

    )

    FROM sys.dm_db_partition_stats WITH (NOLOCK)

    WHERE object_id = @id;

    IF (SELECT count(*) FROM sys.internal_tables WITH (NOLOCK) WHERE parent_id = @id AND internal_type IN (202,204)) > 0

    BEGIN

    SELECT

    @reservedpages = @reservedpages + sum(reserved_page_count),

    @usedpages = @usedpages + sum(used_page_count)

    FROM sys.dm_db_partition_stats p WITH (NOLOCK), sys.internal_tables it WITH (NOLOCK)

    WHERE it.parent_id = @id AND it.internal_type IN (202,204) AND p.object_id = it.object_id;

    END

    INSERT INTO @temp_table

    VALUES (

    schema_name(objectproperty(@id,'schemaid')) + '.' + OBJECT_NAME (@id),

    convert (char(11), @rowCount),

    LTRIM (STR (@reservedpages * 8, 15, 0) + ' KB'),

    LTRIM (STR (@pages * 8, 15, 0) + ' KB'),

    LTRIM (STR ((CASE WHEN @usedpages > @pages THEN (@usedpages - @pages) ELSE 0 END) * 8, 15, 0) + ' KB'),

    LTRIM (STR ((CASE WHEN @reservedpages > @usedpages THEN (@reservedpages - @usedpages) ELSE 0 END) * 8, 15, 0) + ' KB')

    );

    END

    FETCH NEXT FROM C1 INTO @TABLE_NAME;

    END;

    CLOSE c1;

    DEALLOCATE c1;

    INSERT INTO @tempspacetable (tablename, row_count, reserved, data, index_size, unused, SchemaName )

    SELECT t1.*

    , t2.schemaname

    FROM @temp_table t1

    INNER JOIN @tab1 t2

    ON (t1.tablename = t2.schemaname+'.'+ t2.tablename )

    WHERE row_count >= 0

    ORDER BY row_count DESC , schemaname, t2.tablename;

    SELECT *

    FROM @tempspacetable

    ORDER BY tablename

  • I vote for CTE to find the duplicates in a table!

    :w00t::cool::hehe:

    ============================================================
    SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
    http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

  • Dugi (1/22/2010)


    I vote for CTE to find the duplicates in a table!

    :w00t::cool::hehe:

    I second that vote.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • SQLPirate (1/22/2010)


    Using sp_mstablespace won't give you any unused space that the table is holding and sp_msforeachtable won't return internal tables. We were having issues pinpointing our space discrepancies, which were largely due to internal table usage by service broker, so we switched to a version of the following (this is an older version where I've stripped out some things that are specific to our environment)

    DECLARE @idINT

    ,@pagesBIGINT

    ,@logsize BIGINT

    ,@reservedpages BIGINT

    ,@usedpages BIGINT

    ,@rowCount BIGINT

    DECLARE @table_name VARCHAR(500) ;

    DECLARE @schema_name VARCHAR(500) ;

    DECLARE @tab1 TABLE(

    tablename VARCHAR (500) COLLATE database_default

    , schemaname VARCHAR(500) COLLATE database_default

    );

    DECLARE @tempspacetable TABLE(

    [tablename] [SYSNAME]

    ,[row_count] [BIGINT]

    ,[reserved] [VARCHAR](50)

    ,[data] [VARCHAR](50)

    ,[index_size] [VARCHAR](50)

    ,[unused] [VARCHAR](50)

    ,[schemaname] [VARCHAR](500)

    );

    DECLARE @temp_table TABLE (

    tablename SYSNAME

    , row_count BIGINT

    , reserved VARCHAR(50)

    , data VARCHAR(50)

    , index_size VARCHAR(50)

    , unused VARCHAR(50)

    );

    INSERT INTO @tab1

    SELECT t1.name

    , t2.name

    FROM sys.objects t1 WITH (NOLOCK)

    INNER JOIN sys.schemas t2 WITH (NOLOCK)

    ON ( t1.schema_id = t2.schema_id )

    WHERE t1.type in ('U', 'IT', 'S','V');

    DECLARE c1 CURSOR FOR

    SELECT t2.name + '.' + t1.name

    FROM sys.objects t1 WITH (NOLOCK)

    INNER JOIN sys.schemas t2 WITH (NOLOCK)

    ON ( t1.schema_id = t2.schema_id )

    WHERE t1.type IN ('U', 'IT', 'S','V')

    AND t2.name NOT IN ('INFORMATION_SCHEMA');

    OPEN c1;

    FETCH NEXT FROM c1 INTO @table_name;

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @table_name = replace(@table_name, '[','');

    SET @table_name = replace(@table_name, ']','');

    IF EXISTS(SELECT object_id FROM sys.objects WITH (NOLOCK) WHERE object_id = object_id(@table_name))

    BEGIN

    SELECT @id = object_id(@table_name)

    SELECT

    @reservedpages = SUM (reserved_page_count),

    @usedpages = SUM (used_page_count),

    @pages = SUM (

    CASE

    WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)

    ELSE lob_used_page_count + row_overflow_used_page_count

    END

    ),

    @rowCount = SUM (

    CASE

    WHEN (index_id < 2) THEN row_count

    ELSE 0

    END

    )

    FROM sys.dm_db_partition_stats WITH (NOLOCK)

    WHERE object_id = @id;

    IF (SELECT count(*) FROM sys.internal_tables WITH (NOLOCK) WHERE parent_id = @id AND internal_type IN (202,204)) > 0

    BEGIN

    SELECT

    @reservedpages = @reservedpages + sum(reserved_page_count),

    @usedpages = @usedpages + sum(used_page_count)

    FROM sys.dm_db_partition_stats p WITH (NOLOCK), sys.internal_tables it WITH (NOLOCK)

    WHERE it.parent_id = @id AND it.internal_type IN (202,204) AND p.object_id = it.object_id;

    END

    INSERT INTO @temp_table

    VALUES (

    schema_name(objectproperty(@id,'schemaid')) + '.' + OBJECT_NAME (@id),

    convert (char(11), @rowCount),

    LTRIM (STR (@reservedpages * 8, 15, 0) + ' KB'),

    LTRIM (STR (@pages * 8, 15, 0) + ' KB'),

    LTRIM (STR ((CASE WHEN @usedpages > @pages THEN (@usedpages - @pages) ELSE 0 END) * 8, 15, 0) + ' KB'),

    LTRIM (STR ((CASE WHEN @reservedpages > @usedpages THEN (@reservedpages - @usedpages) ELSE 0 END) * 8, 15, 0) + ' KB')

    );

    END

    FETCH NEXT FROM C1 INTO @TABLE_NAME;

    END;

    CLOSE c1;

    DEALLOCATE c1;

    INSERT INTO @tempspacetable (tablename, row_count, reserved, data, index_size, unused, SchemaName )

    SELECT t1.*

    , t2.schemaname

    FROM @temp_table t1

    INNER JOIN @tab1 t2

    ON (t1.tablename = t2.schemaname+'.'+ t2.tablename )

    WHERE row_count >= 0

    ORDER BY row_count DESC , schemaname, t2.tablename;

    SELECT *

    FROM @tempspacetable

    ORDER BY tablename

    Nice Script. Runs very quickly.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • CirquedeSQLeil (1/22/2010)


    SQLPirate (1/22/2010)


    Using sp_mstablespace won't give you any unused space that the table is holding and sp_msforeachtable won't return internal tables. We were having issues pinpointing our space discrepancies, which were largely due to internal table usage by service broker, so we switched to a version of the following (this is an older version where I've stripped out some things that are specific to our environment)

    DECLARE @idINT

    ,@pagesBIGINT

    ,@logsize BIGINT

    ,@reservedpages BIGINT

    ,@usedpages BIGINT

    ,@rowCount BIGINT

    DECLARE @table_name VARCHAR(500) ;

    DECLARE @schema_name VARCHAR(500) ;

    DECLARE @tab1 TABLE(

    tablename VARCHAR (500) COLLATE database_default

    , schemaname VARCHAR(500) COLLATE database_default

    );

    DECLARE @tempspacetable TABLE(

    [tablename] [SYSNAME]

    ,[row_count] [BIGINT]

    ,[reserved] [VARCHAR](50)

    ,[data] [VARCHAR](50)

    ,[index_size] [VARCHAR](50)

    ,[unused] [VARCHAR](50)

    ,[schemaname] [VARCHAR](500)

    );

    DECLARE @temp_table TABLE (

    tablename SYSNAME

    , row_count BIGINT

    , reserved VARCHAR(50)

    , data VARCHAR(50)

    , index_size VARCHAR(50)

    , unused VARCHAR(50)

    );

    INSERT INTO @tab1

    SELECT t1.name

    , t2.name

    FROM sys.objects t1 WITH (NOLOCK)

    INNER JOIN sys.schemas t2 WITH (NOLOCK)

    ON ( t1.schema_id = t2.schema_id )

    WHERE t1.type in ('U', 'IT', 'S','V');

    DECLARE c1 CURSOR FOR

    SELECT t2.name + '.' + t1.name

    FROM sys.objects t1 WITH (NOLOCK)

    INNER JOIN sys.schemas t2 WITH (NOLOCK)

    ON ( t1.schema_id = t2.schema_id )

    WHERE t1.type IN ('U', 'IT', 'S','V')

    AND t2.name NOT IN ('INFORMATION_SCHEMA');

    OPEN c1;

    FETCH NEXT FROM c1 INTO @table_name;

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @table_name = replace(@table_name, '[','');

    SET @table_name = replace(@table_name, ']','');

    IF EXISTS(SELECT object_id FROM sys.objects WITH (NOLOCK) WHERE object_id = object_id(@table_name))

    BEGIN

    SELECT @id = object_id(@table_name)

    SELECT

    @reservedpages = SUM (reserved_page_count),

    @usedpages = SUM (used_page_count),

    @pages = SUM (

    CASE

    WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)

    ELSE lob_used_page_count + row_overflow_used_page_count

    END

    ),

    @rowCount = SUM (

    CASE

    WHEN (index_id < 2) THEN row_count

    ELSE 0

    END

    )

    FROM sys.dm_db_partition_stats WITH (NOLOCK)

    WHERE object_id = @id;

    IF (SELECT count(*) FROM sys.internal_tables WITH (NOLOCK) WHERE parent_id = @id AND internal_type IN (202,204)) > 0

    BEGIN

    SELECT

    @reservedpages = @reservedpages + sum(reserved_page_count),

    @usedpages = @usedpages + sum(used_page_count)

    FROM sys.dm_db_partition_stats p WITH (NOLOCK), sys.internal_tables it WITH (NOLOCK)

    WHERE it.parent_id = @id AND it.internal_type IN (202,204) AND p.object_id = it.object_id;

    END

    INSERT INTO @temp_table

    VALUES (

    schema_name(objectproperty(@id,'schemaid')) + '.' + OBJECT_NAME (@id),

    convert (char(11), @rowCount),

    LTRIM (STR (@reservedpages * 8, 15, 0) + ' KB'),

    LTRIM (STR (@pages * 8, 15, 0) + ' KB'),

    LTRIM (STR ((CASE WHEN @usedpages > @pages THEN (@usedpages - @pages) ELSE 0 END) * 8, 15, 0) + ' KB'),

    LTRIM (STR ((CASE WHEN @reservedpages > @usedpages THEN (@reservedpages - @usedpages) ELSE 0 END) * 8, 15, 0) + ' KB')

    );

    END

    FETCH NEXT FROM C1 INTO @TABLE_NAME;

    END;

    CLOSE c1;

    DEALLOCATE c1;

    INSERT INTO @tempspacetable (tablename, row_count, reserved, data, index_size, unused, SchemaName )

    SELECT t1.*

    , t2.schemaname

    FROM @temp_table t1

    INNER JOIN @tab1 t2

    ON (t1.tablename = t2.schemaname+'.'+ t2.tablename )

    WHERE row_count >= 0

    ORDER BY row_count DESC , schemaname, t2.tablename;

    SELECT *

    FROM @tempspacetable

    ORDER BY tablename

    Nice Script. Runs very quickly.

    Yes I agree but I recommend to eliminate the sys schema!

    ============================================================
    SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
    http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

  • i took the liberty to update the script a little bit to allow better sorting based on different columns, such as Data.

    By changing the Data column to a BigInt and then removing the KB concatenation, one can now sort on the Data field (as an example).

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Yes I agree but I recommend to eliminate the sys schema!

    Oops, yeah that's also a remnant from our environmental needs. The information in our current version also contains servernames, dbnames, and the date that it's taken, among other things. It's all used for proactive monitoring and trending purposes, schema included 🙂

  • i took the liberty to update the script a little bit to allow better sorting based on different columns, such as Data.

    By changing the Data column to a BigInt and then removing the KB concatenation, one can now sort on the Data field (as an example).

    Yeah, converting afterward is one of my biggest gripes about this, but it's so deeply embedded in our systems I haven't gotten around to implementing all the other changes that would need to occur for this piece of code to change.

  • SQLPirate (1/22/2010)


    Yes I agree but I recommend to eliminate the sys schema!

    Oops, yeah that's also a remnant from our environmental needs. The information in our current version also contains servernames, dbnames, and the date that it's taken, among other things. It's all used for proactive monitoring and trending purposes, schema included 🙂

    Yes. Easy fix and much cleaner.

    Thanks again for the script. Better than mine.;-)

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply