Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Purging Data Increases IO performance? Expand / Collapse
Author
Message
Posted Wednesday, December 12, 2012 12:52 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 9:30 AM
Points: 132, Visits: 588
TLDR: I purged about 50% of the data in a data warehouse during an upgrade and 3 days later, the IO_Completion wait stats drastically dropped. Is this normal?

I have a data warehouse loading process in which ~ 12 GB of numeric data is loaded from a binary data source to a newly created empty SQL table every night. Once this data is loaded, a new data file is created. A clustered key on the new file is created to move the loaded data from the Primary file to the new file. Once that's complete, some post load processing is run to pre aggregate the data for reporting.

After a few months of daily loading, we were told by the business that we should start purging some of the older data since it is completely useless after ~60 days. We were at about 1.5 TB of data at this point and the estimated reduction from the purge was about 1 TB. I performed the purge on Monday and the IO improvement wasn't seen till today (Wed).

Has anyone else seen IO improvements like this simply by freeing up the storage?

Server capacity is 2.45 TB. 15k RAID5
4 NUMA nodes with 20 proc's each
128 GB RAM
Post #1395855
Posted Wednesday, December 12, 2012 1:08 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Friday, November 22, 2013 10:13 AM
Points: 79, Visits: 200
If the purge was accompanied by a clustered index rebuild OR if the choice of keys in the clustered index is right, the table should resize after the purge. And yes, given the B tree structure of indexes, the amount of I/O incurred while resolving queries can drop drastically. If you have the index stats before and after, that should show it too. Last but not the least, how do you know that the query load on your DW is a constant ? What if the amount of queries againsst this table has dropped recently ? You may want to ascertain that as well, before drawing any conclusions.
Post #1395865
Posted Wednesday, December 12, 2012 1:23 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 9:30 AM
Points: 132, Visits: 588
I forgot to mention that each load is done to a new table. At the start of the each load process, a new table.

The process happens ~1:00 am every night and the only other load against the server is from my Data Collection for the MDW. The server is currently only used to load the DW and pump out some reports in the morning. No users perform any queries against the server. I have been measuring the load process duration and IO stats and it has been fairly stable for months. Last night's run was the first time I've seen this happen.
Post #1395870
Posted Wednesday, December 12, 2012 2:10 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, August 16, 2013 8:28 AM
Points: 249, Visits: 460
Could autogrowth be accounting for this? If the files were growing each time, and the growth was wrong, you could incur a ton of IO, and after the purge, provided no shrink occured, would certainly be large enough to take in the loads without growing.

Just a guess.
Post #1395886
Posted Thursday, December 13, 2012 9:32 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, July 11, 2014 9:04 AM
Points: 10, Visits: 51
How many disks are in your array? Is it one big LUN for all disks?

It could be the simple fact that now that you've cleared up 1TB of space at the start of your LUN (outer edges), Windows is allocating the new data files at the start of the LUN, which will be faster, especially for sequential reads. Compound that performance increase over the number of disks in your array.... using only fast outer edges of a drive is called "short stroking".

Try to see where your files are sitting physically on the disks.
Post #1396258
Posted Thursday, December 13, 2012 2:25 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 4:18 PM
Points: 2,192, Visits: 3,299
It's normal IF the indexing on the data was not optimal, so that you were scanning old data before that has now been removed.

Note that when you DELETE a large number of rows SQL defers the actual release of the data pages until later. A batch process later removes these "ghost" rows in increments. For 1TB data, I guess it's possible that it took more than 2 days to completely DELETE the ghost rows.

You should rebuild the affected tables after that large a DELETE.

You should also review index statistics, including missing indexes and current index usage and table scan counts, to verify whether you need to re-index one or more of the tables.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1396399
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse