Limit Resources to UGLY query

  • I have a process that has no option but to suck the living daylights out of the system to perform its function.  At this point, I am NOT trying to find a way to improve its performance (I am, but that is another topic).  What I would like to do is find a way to limit its impact on the system.

    I can't schedule it to run off-hours.  I COULD do things like snapshots or replicated read-only environments, but I'm not really interested in going through that at the moment.  If I must, I will.

    Are there any settings or other ways to govern this job somehow?  Like, only run if CPU is below 25%, and never use more than 25%, etc.  What options do I have?  I have full access to the code and can change it.

    Things I would like to limit: I/O, CPU% or number of CPUs, and priority in the system.

    I know someone is going to ask, so here is what I am doing.

    This is basically a data mining job and would be best suited for Analysis Services, but since the application is currently written to work in a certain way (without Analysis Services), this rare situation is not worth converting over to it.

    I have basically 1 billion rows of data points.  These data points have dates, series, types, and of course a value.  I am trying to find which series, in which date range, of certain types show certain data trends.  These will ultimately be pivoted out to 1 column per type and, in some cases, the deltas of those types.

    Series, Date, Type1, Type2, Type3, ... Type_1_delta, Type_2_delta

    for those types that were part of the query.

    Also, a series can be a daily, weekly, monthly, yearly, etc. series.

  • sounds like a job for a CUBE


    * Noel

  • In SQL Server 2005 you have only ONE option for resource control over a query's execution:  the degree of parallelism.  If you have a multi-CPU box this may be sufficient for you.  Simply use the MAXDOP option on the query (or queries) and use only 1 or 2 CPUs.  That will leave the other CPUs on the box free to perform all other activity.
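
    For example, something along these lines caps the statement at two schedulers (the table and column names here are just placeholders for your real mining query):

    SELECT MainSeriesID, theDate, TypeID, Value
    FROM dbo.DataPoints                -- hypothetical name for the billion-row table
    WHERE theDate >= '20070101'
    OPTION (MAXDOP 2);                 -- never use more than 2 CPUs for this statement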

    In SQL 2008 they are baking in a Resource Governor, which will allow you to control the total CPU percentage used by a spid.  Unfortunately I don't think they will have an I/O governor, which would be hugely beneficial but is significantly more difficult to implement/manage.
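
    Roughly, the configuration is expected to look something like this (every name here - pool, group, classifier function, login - is made up, and the pre-release syntax could still change):

    -- Pool that caps CPU for the mining work
    CREATE RESOURCE POOL MiningPool WITH (MAX_CPU_PERCENT = 25);

    -- Workload group that runs in that pool
    CREATE WORKLOAD GROUP MiningGroup USING MiningPool;
    GO

    -- Classifier function (in master) that routes sessions into the group
    CREATE FUNCTION dbo.fn_rg_classifier() RETURNS SYSNAME
    WITH SCHEMABINDING
    AS
    BEGIN
        IF SUSER_SNAME() = N'MiningUser'   -- hypothetical login used for the ugly query
            RETURN N'MiningGroup';
        RETURN N'default';
    END;
    GO

    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_rg_classifier);
    ALTER RESOURCE GOVERNOR RECONFIGURE;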

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks.  That's pretty much what I figured. 

    This is probably the first thing I haven't been able to implement here that I could do with Oracle. 🙂  They had "Resource Groups".  Basically you had your high-priority group, medium, etc.

    You could define which group to run in with very complex configurations, but basically everyone in the groups above you got the system before you.  When the system had some extra juice that wasn't being used by a higher group, you got to run.  That would be ideal for me here.

  • In 2005 Microsoft took a LOT of the Oracle playbook to heart (the parts it hadn't already done, or done better), especially from a manageability standpoint (think DMVs).  They are continuing that with 2008.  I have several clients that will greatly benefit from resource control in 2008 - like doing OLAP on OLTP systems.  Set those long-running reports to run on a lower-priority thread.  Note however that doing so can lead to unintended consequences, such as holding locks even longer and thus bringing real OLTP work to a grinding halt while the overall system utilization stays pretty low!  🙁

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • You can set a SQL Agent job to run "whenever the CPUs become idle".  Somewhere I am sure there is information on exactly what that means.  I think you have some control over the idle condition.
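
    Roughly, attaching that kind of schedule to an existing job looks like this (the job and schedule names are placeholders; the idle threshold itself is configured in SQL Server Agent's properties, not here):

    -- Run the job whenever the CPU(s) become idle
    EXEC msdb.dbo.sp_add_jobschedule
        @job_name  = N'Data Mining Extract',    -- hypothetical job name
        @name      = N'Run when CPUs are idle',
        @enabled   = 1,
        @freq_type = 128;                       -- 128 = start whenever the CPUs become idle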

    You could, I suppose, run a second instance of SQL on the server, use a linked server to the first instance, and run your process from the second SQL instance.  This would allow you to specify the CPUs to use and manage memory consumption.
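
    If you went that route, capping the second instance might look roughly like this (the memory figure, CPU mask, and instance name are made up; run it on the second instance):

    -- Cap memory and CPUs on the second instance
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 2048;   -- example cap only
    EXEC sp_configure 'affinity mask', 3;                -- example: CPUs 0 and 1 only
    RECONFIGURE;

    -- Point the second instance back at the main one
    EXEC sp_addlinkedserver
        @server     = N'MAINSERVER\PROD',   -- hypothetical name of the first instance
        @srvproduct = N'SQL Server';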

    If you have a test environment available, trying it would not be terribly difficult.  I do, however, think your effort would be better spent making the process more efficient.

  • "idle cpu" job wouldn't work here because it is a long running query. No matter what you defined 'idle' as the query would run long enough to encroach on real work. I also believe that your suggestion to use a separate instance on the same server could well result in worse performance overall. Providing the new instance with limited memory would ensure huge amounts of physical I/O which would steal that bandwidth from the other production instance, leading to really bad performance since I/O is usually the most restrictive resource for most sql server apps. Yes you could restrict the cpus for the instance, but you can do that for the specific query also.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I feel your pain. Sometimes we just have to do things the wrong way.:hehe:

    If you can rewrite the procedure so that it processes in chunks, you can put a WAITFOR DELAY inside the loop that processes the chunks, and thereby throttle the process.
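
    Just to sketch the shape of it (the chunk size, delay, and the work inside the loop are placeholders):

    DECLARE @BatchSize INT, @Rows INT;
    SET @BatchSize = 50000;   -- hypothetical chunk size
    SET @Rows = 1;

    WHILE @Rows > 0
    BEGIN
        -- Placeholder for one chunk of the real work
        UPDATE TOP (@BatchSize) #TempTable
        SET    Weekly_Type1_Change = 0
        WHERE  Weekly_Type1_Change IS NULL;

        SET @Rows = @@ROWCOUNT;

        WAITFOR DELAY '00:00:05';   -- let the rest of the system breathe between chunks
    END;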

    Sounds like you may want to use a cursor.

  • In a case such as this, I'd look at the size of the .mdf files. You have over a billion records. Are your indexes in their own filegroup? Is the table in its own filegroup? I'd make sure to have the table in its own filegroup and its indexes in their own filegroup. And where are the data files, program files, and system files located? Also, make sure you don't have the filegroup that contains the table/indexes spanning multiple data files.

    this uses lots of system resources:

    1 filegroup
        allmytables  - Primary
        allmyindexes - Primary
    3 files
        Primary = D:\file1.mdf
        Primary = E:\file2.mdf
        Primary = F:\file3.mdf

    as opposed to this:

    3 filegroups
        MyOtherTables = Primary
        table         = MytableFG
        index         = MyIndexFG
    3 files
        Primary   = D:\Primary.mdf
        MytableFG = D:\MytableFG.mdf
        MyIndexFG = E:\MyIndexFG.ndx
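
    In T-SQL terms, a rough sketch of the second layout (the database name and table definition are made up to match the earlier posts, and the secondary files use the conventional .ndf extension):

    ALTER DATABASE MyDataMiningDB ADD FILEGROUP MytableFG;
    ALTER DATABASE MyDataMiningDB ADD FILEGROUP MyIndexFG;

    ALTER DATABASE MyDataMiningDB
        ADD FILE (NAME = MytableFG_1, FILENAME = 'D:\MytableFG.ndf')
        TO FILEGROUP MytableFG;

    ALTER DATABASE MyDataMiningDB
        ADD FILE (NAME = MyIndexFG_1, FILENAME = 'E:\MyIndexFG.ndf')
        TO FILEGROUP MyIndexFG;

    -- Big table on its own filegroup, its indexes on another
    CREATE TABLE dbo.DataPoints (MainSeriesID INT, theDate DATETIME, TypeID INT, Value FLOAT)
        ON MytableFG;

    CREATE NONCLUSTERED INDEX IX_DataPoints_Series
        ON dbo.DataPoints (MainSeriesID, theDate)
        ON MyIndexFG;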

  • Thanks for that, but as I said, this post isn't about tuning the query, or the database for that matter. It looks like my only option is the MAXDOP hint, which I hope I don't have to use.

    I won't be so bold as to say that I can't improve the database, but it performs fairly well for what I need. The system is TUNED for the normal case. This is the abnormal case. However, it is something that will occur. Luckily for me, this option is only available to a LIMITED set of users, and they will all be trained on how NOT to kill the system.

    As I mentioned, this is PERFECTLY suited for Analysis Services. In essence, I am constructing a cube on the fly. This is not an ideal situation, and if I find out that this is going to be more common, I will port this piece of the application over to Analysis Services. At the moment, I expect this special request to happen about once every 300 investigative requests, or about once every couple of weeks, and about once every million or so selects on this table. I am not going to change my design to suit a one-in-a-million situation.

  • Jeff Gray (9/26/2007)


    If you can rewrite the procedure so that it processes in chunks, you can put a WAITFOR DELAY inside the loop that processes the chunks, and thereby throttle the process.

    That's a very likely option. Thanks. I considered breaking it up, but not the delay. That will actually fit fairly easily into my code.

  • Another thing you might try is to use the READ UNCOMMITTED transaction isolation level. That will reduce overhead to some degree.
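
    In practice that is just one statement before the big SELECTs, or a NOLOCK hint on the base table (sketch only - the table name is a placeholder, and dirty reads are the trade-off):

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    -- or, per table:
    SELECT MainSeriesID, theDate, Value
    FROM dbo.DataPoints WITH (NOLOCK)   -- hypothetical name for the billion-row table
    WHERE theDate >= '20070101';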

  • If blocking becomes an issue, I'll switch READ_COMMITTED_SNAPSHOT on. I have it enabled in most of my databases.
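
    (For anyone following along, that is a one-time database-level switch; the database name below is a placeholder, and the statement won't complete while other connections are active in the database:)

    ALTER DATABASE MyDataMiningDB SET READ_COMMITTED_SNAPSHOT ON;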

    Not sure about read uncommitted helping, but I could possibly use a table lock for part of the process. Most of the work is actually against a series of temp tables.

    I didn't mention that aspect of it. The first, say, 10% of the process is to extract ONLY the necessary data.

    Maybe I could explain the process a little.

    1) First I find the series that I need. There are, say, 10k possible main series. Normally I might need to examine 2,000 of those.

    2) I then find all of the time series that apply for each of those series. Each of those 10k series has 1-3 series (daily, weekly, monthly).

    Then for each of those series, you have a type ID. There are 32 distinct types, but in most cases I am only interested in one or two of those fields.

    3) I pivot all of this out to get something like the following:

    create table #TempTable
    (
        MainSeriesID INT,
        theDate DATETIME,
        Daily BIT,
        Weekly BIT,
        Monthly BIT,
        Daily_Type1 INT,
        Weekly_Type2 FLOAT,
        Monthly_Type3 FLOAT,
        Daily_Type1_Change INT,
        Weekly_Type2_Change FLOAT
    )

    That part is actually not that bad; the worst case I have found only takes about 5 minutes. This is what actually hits the 1-billion-row table. The #TempTable will have between 2M and 25M rows.

    Now, here is where this gets painful.

    4) The _Change fields would be the difference between the day before and the current day for the specific type. This can be quite painful.

    ;WITH X (RN, MainSeriesID, theDate, Weekly_Type1) AS
    (
        SELECT ROW_NUMBER() OVER (ORDER BY MainSeriesID, theDate) AS RN,
               MainSeriesID, theDate, [Weekly_Type1]
        FROM #ComplexFilter
        WHERE Weekly = 1
    )
    UPDATE #TempTable
    SET Weekly_Type1_Change =
        CASE
            WHEN Newer.Weekly_Type1 > Older.Weekly_Type1 THEN 1
            WHEN Newer.Weekly_Type1 < Older.Weekly_Type1 THEN -1
            ELSE 0
        END
    FROM #TempTable
    INNER JOIN X AS Newer
        ON Newer.MainSeriesID = #TempTable.MainSeriesID
        AND Newer.theDate = #TempTable.theDate
    INNER JOIN X AS Older
        ON Older.RN + 1 = Newer.RN
        AND Older.MainSeriesID = Newer.MainSeriesID

  • Hi Bob,

    The read uncommitted thingy might help by reducing the overhead of taking locks on the tables. It won't help the execution plan, but it could cut down on that locking overhead.

    You might consider placing indexes on the temp table. Another possibility would be computing the RN + 1 value and placing it in the temp table so that you can join directly, rather than performing the scalar operation (does THIS RN = THAT RN + 1) on every possible combination of RNs. That column could also be indexed.
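
    Roughly this shape, just to illustrate the idea (the column and index names are made up, and the real code would need one set per frequency):

    -- Precompute RN and RN + 1 once, store them, and index them
    ALTER TABLE #TempTable ADD WeeklyRN INT, WeeklyRNPlus1 INT;

    ;WITH X AS
    (
        SELECT WeeklyRN, WeeklyRNPlus1,
               ROW_NUMBER() OVER (ORDER BY MainSeriesID, theDate) AS RN
        FROM #TempTable
        WHERE Weekly = 1
    )
    UPDATE X
    SET WeeklyRN      = RN,
        WeeklyRNPlus1 = RN + 1;

    CREATE INDEX IX_Temp_WeeklyRN      ON #TempTable (MainSeriesID, WeeklyRN);
    CREATE INDEX IX_Temp_WeeklyRNPlus1 ON #TempTable (MainSeriesID, WeeklyRNPlus1);

    -- The "previous row" join then becomes a straight equality:
    --   INNER JOIN #TempTable AS Older
    --       ON  Older.WeeklyRNPlus1 = Newer.WeeklyRN
    --       AND Older.MainSeriesID  = Newer.MainSeriesID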

    It may be worthwhile to tune this rather than try to break it up into chunks.

  • RN on the base table doesn't work because of day, week, and month gaps. I didn't show it, but I have to run through this 3x to update days, then weeks, then months. Oh yeah, then I go back and fill in the NULLs. I store just one value for the month (no, you can't compute it; I have to use what was stored). But then, in order to do the comparison, I have to fill in the month value for ALL days where MONTH=0 but that day (or week) is part of the same month as the row where MONTH=1 (the date is the last day of the month). The same goes for week and day/month. UGHH!!

    I actually tried indexes: clustered, non-clustered, with and without, etc.

    Yes, indexes helped somewhat with the joins, but everything that was gained in this part was lost on the insert (AND MORE). Also, I am looking at everything in this table, so a hash join is the most efficient and indexes aren't of much help.

    I spent 2 days trying every possible combination of indexes and table structures. Partitioned, non-partitioned. My head hurts.

    And if I could just pre-compute all of this, that would work too, but the data can be modified after the fact.

    I tried cursors so I could step through the table once vs. re-reading it in batches (that worked really well ---- NOT!!). It took 30x longer for a very small dataset (80k rows).
