February 4, 2010 at 2:57 am
I have a table that contains millions of records.
I need to delete records on the basis of acct_id and evt_stub,
and quite often there are millions of records for a given evt_stub, so the delete takes a lot of resources.
Is there any way to reduce this overhead? The problem behind it is the indexes associated with the table.
I am attaching the execution plan.

CREATE TABLE [dbo].[EMAIL_HISTORY](
[message_stub] [dbo].[ut_stub] NOT NULL,
[acct_id] [dbo].[ut_synthetic_key] NOT NULL,
[evt_stub] [dbo].[ut_stub] NOT NULL,
[cont_stub] [dbo].[ut_stub] NOT NULL,
[invitee_stub] [dbo].[ut_stub] NOT NULL,
[act_email_stub] [dbo].[ut_stub] NOT NULL,
[tl_stub] [dbo].[ut_stub] NOT NULL,
[from_email_addr] [dbo].[ut_email] NOT NULL,
[to_email_addr] [dbo].[ut_email] NOT NULL,
[dm_smtp_code] [dbo].[ut_short_description] NOT NULL CONSTRAINT [DF_EMAIL_HISTORY_dm_smtp_code] DEFAULT (''),
[delay_sensitive_flag] [dbo].[ut_flag] NOT NULL,
[email_status_id] [smallint] NOT NULL,
[email_type_id] [smallint] NOT NULL,
[email_viewed_flag] [dbo].[ut_flag] NOT NULL,
[dm_log_date] [datetime] NOT NULL CONSTRAINT [DF_EMAIL_HISTORY_dm_log_date] DEFAULT ('1900-01-01'),
[dm_log_reason] [varchar](60) NULL,
[dm_log_response] [varchar](200) NULL,
[permanent_bounced_flag] [dbo].[ut_flag] NOT NULL,
[created_date] [datetime] NOT NULL,
[last_modified_date] [datetime] NOT NULL,
[email_queue_stub] [dbo].[ut_stub] NOT NULL,
[prod_stub] [dbo].[ut_stub] NULL,
[email_clicked_flag] [dbo].[ut_flag] NULL,
[email_first_clicked_date] [datetime] NULL,
[evt_email_stub] [dbo].[ut_stub] NULL,
CONSTRAINT [PK_EMAIL_HISTORY_message_stub_created_date] PRIMARY KEY NONCLUSTERED
(
[message_stub] ASC,
[created_date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PS_HalfYear_Month]([created_date])
)
USE [ESS]
GO
/****** Object: Index [ix_EMAIL_HISTORY_acct_id_evt_stub_evt_email_stub] Script Date: 02/04/2010 15:28:57 ******/
CREATE NONCLUSTERED INDEX [ix_EMAIL_HISTORY_acct_id_evt_stub_evt_email_stub] ON [dbo].[EMAIL_HISTORY]
(
[acct_id] ASC,
[evt_stub] ASC,
[evt_email_stub] ASC
)
INCLUDE ( [created_date]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PS_HalfYear_Month]([created_date])
USE [ESS]
GO
/****** Object: Index [ix_EMAIL_HISTORY_acct_id_evt_stub_invitee_stub_evt_email_stub] Script Date: 02/04/2010 15:29:03 ******/
CREATE NONCLUSTERED INDEX [ix_EMAIL_HISTORY_acct_id_evt_stub_invitee_stub_evt_email_stub] ON [dbo].[EMAIL_HISTORY]
(
[acct_id] ASC,
[evt_stub] ASC,
[invitee_stub] ASC,
[evt_email_stub] ASC
)
INCLUDE ( [created_date]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PS_HalfYear_Month]([created_date])
USE [ESS]
GO
/****** Object: Index [ix_EMAIL_HISTORY_created_date_acct_id_plus_evt_email_stub_email_viewed_bounced_flag_email_status] Script Date: 02/04/2010 15:29:14 ******/
CREATE NONCLUSTERED INDEX [ix_EMAIL_HISTORY_created_date_acct_id_plus_evt_email_stub_email_viewed_bounced_flag_email_status] ON [dbo].[EMAIL_HISTORY]
(
[created_date] ASC,
[acct_id] ASC
)
INCLUDE ( [evt_email_stub],
[email_viewed_flag],
[permanent_bounced_flag],
[email_status_id]) WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PS_HalfYear_Month]([created_date])
go
/****** Object: Index [ix_EMAIL_HISTORY_last_modified_date_acct_id_evt_email_stub_plus_email_viewed_bounced_flag] Script Date: 02/04/2010 15:29:25 ******/
CREATE NONCLUSTERED INDEX [ix_EMAIL_HISTORY_last_modified_date_acct_id_evt_email_stub_plus_email_viewed_bounced_flag] ON [dbo].[EMAIL_HISTORY]
(
[last_modified_date] ASC,
[acct_id] ASC,
[evt_email_stub] ASC
)
INCLUDE ( [created_date],
[email_viewed_flag],
[permanent_bounced_flag]) WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PS_HalfYear_Month]([created_date])
Query :
DELETE EMAIL_HISTORY
WHERE acct_id = @acct_id AND
evt_stub = @evt_stub
Here, ut_stub is a user-defined type based on the uniqueidentifier data type.
Is there any alternative way to perform the deletion with less overhead?
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
February 4, 2010 at 5:01 am
I did not see a clustered index. That is not related to your problem, but why is that?
I see that you have an index consisting of four columns, the leading two of which are the columns you are using in the delete.
When do you update statistics?
How much data is in the table?
Do you update statistics after bulk inserts and deletes, or are they updated periodically?
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
February 4, 2010 at 5:13 am
This table contains around 20 million records.
The statistics are updated on a daily basis.
February 4, 2010 at 5:15 am
No, that can't be done just for a single delete operation.
Re-creating the indexes on every run is not at all possible, nor is it recommended.
February 9, 2010 at 4:13 am
Try to delete chunks of records, see
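A minimal sketch of such a chunked delete for the query in question (the 10,000 batch size is an assumption; tune it so each transaction finishes quickly):

```sql
-- Assumes @acct_id and @evt_stub are set as in the original query.
-- The batch size of 10000 is an assumption; adjust for your system.
DECLARE @batch int;
SET @batch = 10000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch)
    FROM dbo.EMAIL_HISTORY
    WHERE acct_id = @acct_id
      AND evt_stub = @evt_stub;

    -- Stop once a batch deletes fewer rows than the batch size.
    IF @@ROWCOUNT < @batch
        BREAK;
END
```

Keeping each batch small keeps each transaction short, so locks are not held for the whole delete and the log has a chance to clear between batches.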
February 9, 2010 at 5:51 am
I wrote a program years ago that would delete records as defined by batch size.
The batch size was defined in a control table. I also had a column that defined the total number of records to delete so that it would continue to loop until it met that threshold.
I also had another bit column (Active/Inactive) that I used to terminate the program after it completed the current loop.
I also had a stop time column so that the program did not attempt to purge records outside of the maintenance window.
The program would first archive the records before the delete, and each event was stored in an ArchiveEvent table, including the start and end datetime, the number of records per batch, and the total number of records archived/purged per run. I also included the audit date in the control table, which was used to validate the parameter passed to the Archive-Purge stored procedure. I created a trigger so that you could not enter a date value less than two years before the current date.
The table had 55 million records, of which approximately 22 million were purged on the first run.
I created an Archive Database to store the records because this data was subject to audit.
Another reason for archiving the records in a separate database was that it reduced the amount of time needed to back up, restore, and re-index the database, as well as to refresh the Development & QA database environments.
I would provide you with the source code, but I no longer have it; the link provided is great, though!
Good luck.
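The control-table-driven purge described above might be sketched roughly as follows (all table and column names here are hypothetical reconstructions from the description, not the original code):

```sql
-- Hypothetical control table: batch size, total rows to purge per run,
-- an active flag to stop the loop, and the maintenance-window end time.
CREATE TABLE dbo.PurgeControl (
    batch_size int      NOT NULL,
    max_rows   int      NOT NULL,
    is_active  bit      NOT NULL,
    stop_time  datetime NOT NULL
);

DECLARE @batch int, @max int, @deleted int, @rc int;
SELECT @batch = batch_size, @max = max_rows FROM dbo.PurgeControl;
SET @deleted = 0;
SET @rc = 1;

WHILE @rc > 0 AND @deleted < @max
  AND EXISTS (SELECT 1 FROM dbo.PurgeControl
              WHERE is_active = 1 AND GETDATE() < stop_time)
BEGIN
    BEGIN TRANSACTION;

    -- Archive the batch first, then delete it (simplified; the original
    -- program also logged each run to an ArchiveEvent table).
    INSERT INTO ArchiveDB.dbo.EMAIL_HISTORY
    SELECT TOP (@batch) *
    FROM dbo.EMAIL_HISTORY
    WHERE created_date < DATEADD(year, -2, GETDATE());

    DELETE TOP (@batch)
    FROM dbo.EMAIL_HISTORY
    WHERE created_date < DATEADD(year, -2, GETDATE());

    SET @rc = @@ROWCOUNT;
    SET @deleted = @deleted + @rc;

    COMMIT TRANSACTION;
END
```

In a real implementation you would capture the deleted rows deterministically (for example with the OUTPUT clause, or an ordered key range) rather than relying on two separate TOP statements selecting the same rows.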
February 10, 2010 at 9:08 am
When I needed to delete a million plus rows of out-dated data I used SET ROWCOUNT {some number} to limit the number of rows affected with each batch so that the query would complete in a minute or so. I set up a SQL Agent job to execute every 5 minutes. I just let that chip away at the old data until it was cleaned up. Every once in a while I would check in on the progress.
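On SQL Server 2000/2005, where SET ROWCOUNT still applies to DML, that approach looks roughly like this (the batch size and date cutoff are assumptions):

```sql
SET ROWCOUNT 10000;  -- limit the DELETE to 10000 rows per execution

DELETE FROM dbo.EMAIL_HISTORY
WHERE created_date < '20090101';  -- hypothetical cutoff for out-dated data

SET ROWCOUNT 0;  -- always reset, or it silently limits later statements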
February 10, 2010 at 12:30 pm
Ed-86789 (2/10/2010)
When I needed to delete a million plus rows of out-dated data I used SET ROWCOUNT {some number} to limit the number of rows affected with each batch so that the query would complete in a minute or so. I set up a SQL Agent job to execute every 5 minutes. I just let that chip away at the old data until it was cleaned up. Every once in a while I would check in on the progress.
I believe that SET ROWCOUNT is deprecated in SQL 2005 and up. You can of course achieve the same effect with the TOP clause (with better performance, usually), however, it's not external to the query syntax, like SET ROWCOUNT is.
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
February 10, 2010 at 12:37 pm
Posted by mistake...
February 13, 2010 at 12:26 am
RBarryYoung (2/10/2010)
I believe that SET ROWCOUNT is deprecated in SQL 2005 and up.
Yes, for data modification only. See http://msdn.microsoft.com/en-us/library/ms187043.aspx; the important part is reproduced below:
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. We recommend that DELETE, INSERT, and UPDATE statements that currently are using SET ROWCOUNT be rewritten to use TOP.
RBarryYoung (2/10/2010)
You can of course achieve the same effect with the TOP clause (with better performance, usually), however, it's not external to the query syntax, like SET ROWCOUNT is.
I'm not sure there is any general performance benefit in using an explicit TOP. SET ROWCOUNT results in a special kind of TOP operator in the plan (a rowcount TOP) and so is optimized in a very similar manner to an explicit TOP, in broad terms.
A local variable (or an expression) can be used with the TOP syntax, if there is some need to specify the count 'externally'. If you're keen to use the 'correct' type, remember that the parameter is a BIGINT.
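For instance (a sketch; @acct_id and @evt_stub as in the original query, and the variable declared as BIGINT to match TOP's parameter type):

```sql
DECLARE @rows bigint;
SET @rows = 10000;  -- assumed batch size

-- TOP with a variable: the limit is supplied "externally"
-- without hard-coding it into the statement.
DELETE TOP (@rows)
FROM dbo.EMAIL_HISTORY
WHERE acct_id = @acct_id
  AND evt_stub = @evt_stub;
```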
February 13, 2010 at 12:27 am
Welsh Corgi (2/10/2010)
I like how people have to resort to the copy & paste gotcha mentality...
I'm afraid your point is lost on me. Would you elucidate?
February 13, 2010 at 7:21 am
This was posted in error on my part.
Thank you for your contributions to this Forum.
Regards...
February 13, 2010 at 10:01 am
Paul White (2/13/2010)
RBarryYoung (2/10/2010)
I believe that SET ROWCOUNT is deprecated in SQL 2005 and up.
Yes, for data modification only. See http://msdn.microsoft.com/en-us/library/ms187043.aspx, the important part is reproduced below:
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. We recommend that DELETE, INSERT, and UPDATE statements that currently are using SET ROWCOUNT be rewritten to use TOP.
Right you are, Paul. Thanks for the correction.
RBarryYoung (2/10/2010)
You can of course achieve the same effect with the TOP clause (with better performance, usually), however, it's not external to the query syntax, like SET ROWCOUNT is.
I'm not sure there is any general performance benefit in using an explicit TOP. SET ROWCOUNT results in a special kind of TOP operator in the plan (a rowcount TOP) and so is optimized in a very similar manner to an explicit TOP, in broad terms.
The performance comment was based on observations and tests carried out about two years ago (I believe some of that was with others online here at SSC). Though "usually" might be an overstatement on my part, my recollection was that sometimes they were the same and sometimes TOP was faster.
A local variable (or an expression) can be used with the TOP syntax, if there is some need to specify the count 'externally'. If you're keen to use the 'correct' type, remember that the parameter is a BIGINT.
By "external", I was referring to the advantage of external retrofitting, which SET ROWCOUNT has, that is, it can be applied without having to change the query ("external") and without forethought ("retrofitting"). TOP lacks this advantage as it cannot be applied without either changing the query or else some prior anticipation of the need.
Don't get me wrong, I much prefer TOP, in large part because it is an expression-local effect, and is not a session-global effect like SET ROWCOUNT, and because I think that intelligent anticipation and intention (i.e., "planning" and "design") are highly desirable things. However, from the standpoint of maintenance and administration, it is also nice to have the option.
February 13, 2010 at 12:45 pm
My account has been compromised...