any way to reindex gigantic table efficiently?
Posted Sunday, July 20, 2014 10:31 PM
Grasshopper

So I have a time series table consisting of 3 columns: a key, a timestamp and a value. The primary key is, of course, on the key and timestamp. The table is about 80 billion rows. Certain data is deleted from the table every month, which has caused massive fragmentation. Partitioning the table on, say, the timestamp doesn't do much good because many of the keys need to remain for all time periods, and the key is nothing more than a generated identity.

I do have Enterprise Edition (SQL Server 2008 R2), so I can do an online index rebuild, but if I'm not mistaken, an online rebuild makes a full copy of the primary key. The table itself is about 1.6 TB and I only have about 200 GB of free disk space. I can't span to other drives for now. Anyone have suggestions on how I can reindex this table?

The table is used 24x7 - pretty much no maintenance time is available. However, I may be switching to a new server on new storage, so perhaps there's a way to rebuild the table on the new server before switching. That won't be easy, though, as 250 million rows are written to this table daily.
Post #1594435
Posted Monday, July 21, 2014 12:59 AM


Hall of Fame

cmoy (7/20/2014)
[original post quoted in full above]


You cannot rebuild that index with the free space you have. You could add some disk space and some extra tempdb files, then rebuild the index online with the sort work directed to tempdb.
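For reference, an online rebuild that does its sort work in tempdb looks roughly like this (the table and index names are placeholders, not the poster's actual names):

```sql
-- Online rebuild of the clustered PK, sorting in tempdb instead of the
-- user database. Requires Enterprise Edition. Note the caveat: tempdb
-- absorbs only the sort runs; the rebuilt copy of the index itself is
-- still written into the user database, so this does not eliminate the
-- need for free space roughly equal to the index size there.
ALTER INDEX PK_YourTimeSeriesTable
    ON dbo.YourTimeSeriesTable
    REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
```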

Tables with over 100 million rows are usually candidates for partitioning. Because you delete data every month, this table should have been partitioned: a single partition SWITCH (followed by truncating the staging table) deletes a whole sub-table without introducing any fragmentation into the main table. Maintenance of a partitioned table also takes less time, and you can rebuild only selected partitions rather than the whole index, which is an advantage.
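The monthly delete on a table partitioned by month would then be a metadata-only operation, along these lines (table names and the partition number are illustrative):

```sql
-- Move the oldest partition out to an empty staging table that has the
-- same columns, indexes and filegroup as the main table, then truncate
-- the staging table. Both statements are metadata operations, so they
-- complete almost instantly and introduce no fragmentation.
ALTER TABLE dbo.YourTimeSeriesTable
    SWITCH PARTITION 1 TO dbo.YourTimeSeriesTable_Stage;

TRUNCATE TABLE dbo.YourTimeSeriesTable_Stage;
```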
You can also decrease the fill factor to 99 or 98 to slow down fragmentation.
Why is your key a composite of the identity and the timestamp? You know the advantages of a key on the identity column alone.




Igor Micev,
SQL Server developer at Seavus
www.seavus.com
Post #1594453
Posted Monday, July 21, 2014 1:11 AM
Grasshopper

This was all done way before my time.

Partitioning is not feasible at this point - what would you partition on (what would the partition function be)? I would probably wind up with thousands of partitions, and I'm not about to try to figure out how to dynamically build a restore-database command to pick up those partitions (we do backup/restore onto other machines). It would have been nice to partition on just the timestamp and delete partitions after a certain age, but alas that's not the case.

One of these years we'll go to some big data solution such as Cassandra but until that time...


Post #1594460
Posted Monday, July 21, 2014 11:35 AM
SSCrazy

Would have to see your typical queries to be sure, but it's extremely likely the clustering keys are backwards, i.e., they should be ( timestamp, ident ) rather than vice versa. But that's almost impossible to fix/change now. Maybe you can get downtime on Christmas Day? (Not a joke, that's when I do a lot of my "table can't be down" maintenance.)

You can REORGANIZE the index, as that's always done online and in relatively small chunks. Still, make sure you have several GB of pre-allocated but unused log space available.
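For the 1.6 TB / 200 GB situation described above, a reorganize is the option that needs no large free-space reservation; a sketch (index and table names are placeholders):

```sql
-- REORGANIZE compacts and reorders the leaf-level pages in place, in
-- many small transactions, so it is always an online operation and
-- never needs a full working copy of the index. If it is cancelled,
-- the work completed so far is kept, so it can be run in maintenance
-- slices night after night.
ALTER INDEX PK_YourTimeSeriesTable
    ON dbo.YourTimeSeriesTable
    REORGANIZE;
```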

Also, you can put your database and table names into the placeholders in the script below and then run it to check existing index usage.

So the DELETEs are not of contiguous blocks of old rows, but instead some rows are deleted and some are not?

USE [<your_db_name>]

SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case

DECLARE @table_name_pattern sysname
SET @table_name_pattern = 'your_table_name'

PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)

-- list index usage stats (seeks, scans, etc.)
SELECT
ius2.row_num, DB_NAME() AS db_name,
CASE WHEN i.name LIKE ca2.table_name + '%'
THEN '~' + SUBSTRING(i.name, LEN(ca2.table_name) + 1 +
CASE WHEN SUBSTRING(i.name, LEN(ca2.table_name) + 1, 1) = '_' THEN
CASE WHEN SUBSTRING(i.name, LEN(ca2.table_name) + 2, 1) = '_' THEN 2 ELSE 1 END
ELSE 0 END, 200)
ELSE i.name END AS index_name,
CASE WHEN i.is_unique = 0 THEN 'N' ELSE 'Y' END + '.' +
CASE WHEN i.is_primary_key = 0 AND i.is_unique_constraint = 0 THEN 'N' ELSE 'Y' END AS [uniq?],
ca2.table_name,
i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
dps.row_count,
SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
fk.Reference_Count AS fk_ref_count,
DATEDIFF(DAY, CASE WHEN o.create_date > ca1.sql_startup_date THEN o.create_date
ELSE ca1.sql_startup_date END, GETDATE()) AS max_days_active,
FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON
o.object_id = i.object_id
CROSS JOIN (
SELECT create_date AS sql_startup_date FROM sys.databases WHERE name = 'tempdb'
) AS ca1
CROSS APPLY (
SELECT OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name
) AS ca2
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal > 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
ic.key_ordinal
FOR XML PATH('')
) AS key_cols (key_cols)
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal = 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
COL_NAME(object_id, ic.column_id)
FOR XML PATH('')
) AS nonkey_cols (nonkey_cols)
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
dps.object_id = i.object_id AND
dps.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
ius.database_id = DB_ID() AND
ius.object_id = i.object_id AND
ius.index_id = i.index_id
LEFT OUTER JOIN (
SELECT
database_id, object_id, MAX(user_scans) AS user_scans,
ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
WHERE
database_id = DB_ID()
--AND index_id > 0
GROUP BY
database_id, object_id
) AS ius2 ON
ius2.database_id = DB_ID() AND
ius2.object_id = i.object_id
LEFT OUTER JOIN (
SELECT
referenced_object_id, COUNT(*) AS Reference_Count
FROM sys.foreign_keys
WHERE
is_disabled = 0
GROUP BY
referenced_object_id
) AS fk ON
fk.referenced_object_id = i.object_id
WHERE
i.object_id > 100 AND
i.is_hypothetical = 0 AND
i.type IN (0, 1, 2) AND
o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
(
o.name LIKE @table_name_pattern AND
o.name NOT LIKE 'dtprop%' AND
o.name NOT LIKE 'filestream[_]%' AND
o.name NOT LIKE 'MSpeer%' AND
o.name NOT LIKE 'MSpub%' AND
--o.name NOT LIKE 'queue[_]%' AND
o.name NOT LIKE 'sys%'
)
--AND OBJECT_NAME(i.object_id /*, DB_ID()*/) IN ('tbl1', 'tbl2', 'tbl3')
ORDER BY
--row_count DESC,
--ius.user_scans DESC,
--ius2.row_num, --user_scans&|user_seeks
db_name, table_name,
-- list clustered index first, if any, then other index(es)
CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END,
key_cols

PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)




SQL DBA, SQL Server MVP ('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1594727