A Technique for Determining the I/O Hog in your Database

  • Comments posted to this topic are about the article A Technique for Determining the I/O Hog in your Database.

  • I liked this a lot, and found it very useful.

  • Excellent article. Nice & thoughtful. I'm definitely adding it to my tool box.

    A question about your process, though. You went about isolating the database object that was giving you conniptions. I'm just surprised you were unable to spot the problem within the T-SQL code, you know, the proc that runs longer & uses more resources than any other?

    In any event, this was really well written, well explained and damned useful.

    Thanks.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • At first this seems to be an interesting read with some useful conclusions; however, on more careful analysis I would suggest that the heading and conclusions are somewhat misleading.

    In a nutshell, the trace records the scan stat, as shown below for this query:

    select * from pubs.dbo.titleview

    (25 row(s) affected)

    Table 'titles'. Scan count 25, logical reads 50, physical reads 0, read-ahead reads 0.

    Table 'titleauthor'. Scan count 23, logical reads 46, physical reads 0, read-ahead reads 0.

    Table 'authors'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.

    The point about this query is that the 'scan' on authors is actually an index scan taking 70% of the time, while the 'scans' on the other two tables are clustered index seeks. OK, it's a poor example, I admit, but the point is that if the titles table were doing table scans, the scan count would still stay at 25 (and 25 table scans could be bad news, whereas 25 index seeks are fine). A quick way to check which kind of access a 'scan' really is, is sketched below.

    I dispute some of the maths and conclusions. In a reasonably configured SQL Server any table hotspots should be in cache; I've found more often that less frequently used, heavy-I/O queries can flush the data cache. Technically 200k I/O could flush 1.5GB through the data cache, yet for a reasonably sized table this may appear as only 1 scan (in fact a select * from a 300 million row table would show as 1 scan), so in the overall scheme of things the 1 scan that moved several gigabytes of data would not be shown as significant.

    On a small table, a single integer value returned from an indexed table shows a scan count of 1, whereas the entire column (just over 2100 rows) also shows 1 scan. In actual fact we have 4 bytes vs 8600 bytes, or 2 page I/O vs 5 page I/O, or 1 scan vs 1 scan. For small tables SQL Server is unlikely to do any row ops, and read-ahead is usually in 64KB blocks; exactly how read-ahead deals with a table of only 32KB I'm not 100% sure. Most typical integer secondary indexes would be contained within 3 pages, of which two pages will always be read, so it's wrong to attempt to multiply the scan count by 4 bytes to see how much data is being read; I'd suggest (in this case) it is always 2 pages.

    The article certainly gives a slant on object usage; however, I'd suggest that a view of SQL statement vs reads, writes, CPU and duration is of much greater value.
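
    For anyone who wants to check which kind of access each reported 'scan' really is, here is a minimal repro sketch, assuming the pubs sample database is installed:

    set statistics io on
    set statistics profile on   -- the profile output shows whether each "scan" is an index seek or a genuine scan
    select * from pubs.dbo.titleview
    set statistics profile off
    set statistics io off

    The Executes column in the profile output should line up with the scan counts above, so you can see at a glance whether those 25 'scans' on titles are seeks or scans.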

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • If you look at how the calculation works, with everything based on averages, it is left open to interpretation.

    Personally, I don't think you can determine whether you have a problem using a single-sided approach, but this technique would add some depth and help cross off some of those wildcards that tend to pop up once you start tracing performance problems.

    Lukas Botha

  • Nice article; definitely adding it to my collection for when I have a little more spare time.

  • Thank you, Michael. Excellent article; this is definitely an article to add to a DBA's collection.

    Kindest Regards,

    Sameer Raval [Sql Server DBA]
    Geico Insurance
    RavalSameer@hotmail.com

  • Excellent article! Just one question: as clearly stated at the beginning of the article, this technique should be used after all "commonly used" techniques have been applied. Can we turn this around? I mean, start with this approach and then move on to other, more "common" diagnostics? I know this is focused mainly on object/traffic issues, but just how common are these issues? Maybe by fixing the problems identified by applying this method, other problems are solved as well? Man... it was more than one question, wasn't it?

  • I have one problem with the calculations.  Overall it's a good article, and more metrics are always useful, if they're accurate.

    He lists the calculation for data flow as: Data Flow = Size of Data Element * Number of times it is accessed  -- and uses AverageRecordSize to represent Size of Data Element

    But there's a big piece missing, namely the number of rows.  A table scan or index scan will retrieve each row of each table/index.  So we should be multiplying by total table size, i.e. AverageRecordSize * NumberOfRows, or better yet, NumberOfPages.

    So, a wide table with a few rows will come across differently than a narrow table with millions of rows, even if they take up the same number of pages.  I think it's best to think of table or index data flow in terms of pages, since that's the memory allocation chunk size and basis of disk retrieval.
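
    To make that concrete, here is a hedged sketch of a page-based variant of the article's report query; it assumes the CompileIndexesShowContig table also captured a Pages figure from DBCC SHOWCONTIG (that column name is my assumption, not something taken from the article):

    select cisc.objectname, cisc.indexname, ic.scans,
           ic.scans * cisc.pages as flow_pages   -- pages touched per access, rather than average record size
    from IndexCaptureSummary ic
    inner join CompileIndexesShowContig cisc
            on cisc.objectid = ic.objectid and cisc.indexid = ic.indexid
    order by flow_pages desc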

     

    Dylan Peters
    SQL Server DBA

  • Hi,

    select objectid, indexid, sum(*) as scans

    from IndexCapture

    group by objectid, indexid

    What is sum(*) here? It should be the sum of some column name, right?

    SQL DBA.

  • select objectid, indexid, sum(*) as scans

    from IndexCapture

    group by objectid, indexid

    In the above query, change sum(*) to count(*).
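
    That is, the aggregation simply becomes a count of captured scan events per index:

    select objectid, indexid, count(*) as scans
    from IndexCapture
    group by objectid, indexid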

    This is a very nice article, but I agree with the other post that table size should be included in the calculation instead of just AverageRecordSize.

    AL

  • I thought it was an excellent article and another tool for troubleshooting issues. Certainly table size would come into play, but this is just a way to look at the space it takes to put a row into cache. I would hope that on larger tables you would only need to be caching parts, not whole tables.

  • I think the sum(*) statement in the SQL to create IndexCaptureSummary should probably be count(*). Also, in the report SQL statement, the table name should be CompileIndexesShowContig, according to the SQL used to create the table earlier on...

    I thought this was an interesting concept, so I did the study on my database, and I have to say that it does appear that the results as stated in the article can be misleading. One of our most heavily used indexes is on our products table, which has large records (varchar() text descriptions, etc.) but only 250 rows, and the index used is clustered.

    I'm not sure that the products table is our I/O hog, but it showed up as 57% of flow on 17% of scans using this technique. Most of our queries don't return * from products but only selected columns, and thus (scancount * AverageRecordSize) may not be a good indication of the I/O demand created by those scans. It's also frequently used and small enough overall that I'm 99% sure it's held in memory and not hurting disk I/O.

    Since the profiler event Scan:Start just triggers when an index is used (correct?), our products primary key is probably also scanned when using the products table in joins where records aren't even (necessarily) being retrieved. Not too sure on this point...

    Thus, I think the data collection techniques are good, but I wonder if there is a better formulation of the summary report that would factor in the number of rows in the table and/or other data from profiler that would give a more accurate picture of IO demands. Otherwise, great work!

  • This article and all the replies were very interesting and serve a purpose. I too have a few questions about some of the concepts used. I actually think this solution is overthinking a common problem.

    If I think I/O is a problem, I will run a SQL Profiler trace, parse it with my favorite parsing tool, sort by Reads or Writes, and find the worst queries. (Typically in my systems, if I/O is a problem it is reads; most of the time there is a table scan or index scan someplace that could be better implemented through better code, better indexing or a better table/normalization structure.)

    As Colin has said earlier (and in numerous posts), if you have a lot of reads you are bringing in a lot of data, possibly flushing out your cache, and definitely using (overusing?) your disks and CPU.

    Take your reads, multiply by 8 and divide by 1024 twice; that is the number of GB you are reading. I have seen a horribly written query perform 23,000,000 reads on a database whose total size is only 5GB! (The arithmetic is sketched at the end of this post.)

    So I like looking at the "simple" things first. That normally solves the biggest headaches.
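
    A minimal sketch of that reads-to-GB arithmetic, using the 23,000,000-read query mentioned above (each read is an 8KB page):

    declare @reads bigint
    set @reads = 23000000                                   -- logical reads reported by Profiler for the worst query
    select (@reads * 8.0) / 1024 / 1024 as approx_gb_read   -- roughly 175GB read to satisfy one query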

  • I just created this, and the report query

    select cisc.objectname, cisc.indexname, ic.scans, floor(scans*cisc.averagerecordsize) as flow

    from IndexCaptureSummary ic

    inner join CompileIndexesShowContig cisc on cisc.objectid = ic.objectid and cisc.indexid = ic.indexid

    order by flow desc

    works, but if you have a table with no PK or no index (just, say, a rowid) it does not pick that up. That is why, on the two tables, I added a default of '' for both; otherwise one would come in as NULL and the other as blank.

    Set the IndexID default to '' on CompileIndexesShowContig and IndexCapture, along the lines of the sketch below.
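
    Here is a hedged sketch of those defaults; it assumes IndexID was declared as a character column in the article's table definitions (if it is an int, a default of 0 would be the equivalent). Note that a default only applies to newly inserted rows; existing NULLs would still need an update.

    alter table CompileIndexesShowContig add default ('') for IndexID
    alter table IndexCapture add default ('') for IndexID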

    Also, I changed the sum(*) to count(*):

    insert into IndexCaptureSummary (ObjectId, IndexID, Scans)

    select objectid, indexid, count(*) as scans

    from IndexCapture

    group by objectid, indexid

    Let's say it's taking the average scans... what if you only select 5 records from a table one day and all the records another day? Then this would be a bit misleading, as the I/O is reduced with smaller selects. (Does anyone know how to change it so that row counts could be incorporated into this?)

    I like it a lot.
