Parallel Processing Using Jobs

  • Comments posted to this topic are about the item Parallel Processing Using Jobs

  • Hey all,

    Great idea - and perfect timing for me.

    I am currently looking to rewrite some of our TL (the E is done beforehand) to a better system than the one it currently runs on. Ideally, the new system:

    - Can be used by different systems (we have multiple systems that will use it)
    - Handles precedence, but can fail individual elements and let others continue
    - Runs elements in parallel where possible
    - Is easy to maintain
    - Provides detailed error logging and simple monitoring

    Currently I have considered SSIS (although not easy to maintain, in my view) and using Service Broker to queue the work (a little complex).

    What do people think of this article as an idea? Effectively I could have a set of tables to drive this, and procs to do the TL.
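    Something along these lines is what I'm picturing for the control table (all names here are just illustrative, not a finished design):

        CREATE TABLE dbo.EtlJobQueue
        (
            JobId          INT IDENTITY PRIMARY KEY,
            SystemName     SYSNAME       NOT NULL,   -- which of our systems the step belongs to
            ProcName       SYSNAME       NOT NULL,   -- proc that does the T/L work for this step
            DependsOnJobId INT           NULL
                REFERENCES dbo.EtlJobQueue (JobId),  -- precedence: run only after this step succeeds
            Status         VARCHAR(20)   NOT NULL DEFAULT 'Pending',  -- Pending / Running / Succeeded / Failed
            ErrorMessage   NVARCHAR(MAX) NULL,       -- detailed error logging
            StartedAt      DATETIME2     NULL,
            FinishedAt     DATETIME2     NULL        -- simple monitoring of what ran and for how long
        );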

    Thoughts?

    Dan

  • I'm already doing something very similar on one of our systems. The process requires me to load data from a standby database into one which the users can access. The load is a complete refresh due to the nature of the system.

    As part of this load, I have 4 threads of data copy running. These are driven by metadata in a table, but run from an SSIS package that loops through four dynamically generated queues. This could be converted very easily to the T-SQL create / execute / delete job approach.

    I then use the create / execute / delete job approach for building the indexes on these tables. The idea is that as soon as the procedure that copies the data across for a table has completed, it spawns a job to rebuild all of that table's indexes. This means the data copy gets done quickly, and the indexes follow, off the critical path.
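    The spawned job itself is roughly this shape, using the msdb job procedures (the job name, database and index command below are placeholders):

        DECLARE @job_id   UNIQUEIDENTIFIER,
                @job_name SYSNAME = N'RebuildIndexes_MyTable_' + CONVERT(NVARCHAR(36), NEWID());

        -- Create a one-off job that deletes itself when it finishes (@delete_level = 3 means always)
        EXEC msdb.dbo.sp_add_job
            @job_name     = @job_name,
            @enabled      = 1,
            @delete_level = 3,
            @job_id       = @job_id OUTPUT;

        -- Single T-SQL step that rebuilds the indexes for the table that has just been copied
        EXEC msdb.dbo.sp_add_jobstep
            @job_id        = @job_id,
            @step_name     = N'Rebuild indexes',
            @subsystem     = N'TSQL',
            @database_name = N'MyDatabase',
            @command       = N'ALTER INDEX ALL ON dbo.MyTable REBUILD;';

        -- Target the local server and start the job asynchronously, then carry on with the next table
        EXEC msdb.dbo.sp_add_jobserver @job_id = @job_id, @server_name = N'(local)';
        EXEC msdb.dbo.sp_start_job @job_id = @job_id;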

    I was worried about the overhead on the server for doing this, but the copy over doesn't appear to be badly affected by the index rebuild.

    Cheers

    Ben

  • Interesting idea...but the article has a somewhat incomplete feel to it.

    I notice that you never get beyond the 'Introduction' heading.

    A little more explanation - perhaps a working example...and more importantly a conclusion (does it work? is it a good idea? is there a risk?) would have been welcome.

    Perhaps Part II is on the way? If so, I take it all back!

    Thanks.

  • This method does have limited usefulness, but should be viewed as a last resort after making every attempt to streamline and simplify processes. It can easily be over-used (and abused) as an easy cure-all for long running processes that need to be fixed.

    I am not just speaking hypothetically. We added an almost identical process to our system years ago and we are now spending a lot of time getting rid of it because it is a constant headache. The job queue table backs up regularly on a busy system, and even when it is not backed up it introduces load that competes with user interactive processes. The idea of "do it later" doesn't work when the system is so busy processing interactive requests from users that there is no "later" during working hours.

  • David McKinney (4/29/2013)


    Interesting idea...but the article has a somewhat incomplete feel to it.

    I notice that you never get beyond the 'Introduction' heading.

    A little more explanation - perhaps a working example...and more importantly a conclusion (does it work? is it a good idea? is there a risk?) would have been welcome.

    I was wondering how you make it known to others how these jobs are getting into the job list. If everyone "just knows" about the metadata table that creates jobs, then perhaps I'm being pedantic. However, if/when a new DBA sees self-deleting jobs appearing and disappearing (before doing the exhaustive task of identifying the metadata table and the stored procedure that consumes it) there may be some confusion. Also, is this cause for a security concern? Can a junior dev add schedule entries to the metadata table via an INSERT permission and bypass any security on scheduled job creation? Whose credentials are these jobs running under?
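    At a minimum, a query like this against the standard msdb catalog views would let a new DBA see which self-deleting jobs exist and who owns them (delete_level is 1/2/3 for delete on success/failure/always):

        SELECT j.name,
               SUSER_SNAME(j.owner_sid) AS owner_login,
               j.delete_level,
               j.date_created
        FROM msdb.dbo.sysjobs AS j
        WHERE j.delete_level <> 0;   -- only jobs set to remove themselves when they finish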

    FWIW, I was expecting to read an article about problems/solutions involving parallel execution and scheduled jobs. Something along the lines of "T-SQL with a singleton usage expectation got scheduled twice... it took {too long} to find/fix the deadlocks that were killing performance."
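    For that singleton case, a guard like this at the top of the procedure is one way to enforce the expectation (the resource name is just an example):

        -- Fail fast if another session is already running this process
        DECLARE @rc INT;
        EXEC @rc = sp_getapplock
            @Resource    = N'dbo.MyEtlProc',   -- hypothetical resource name for the singleton proc
            @LockMode    = 'Exclusive',
            @LockOwner   = 'Session',          -- held until sp_releaseapplock or the session ends
            @LockTimeout = 0;
        IF @rc < 0
        BEGIN
            RAISERROR('Another instance of this process is already running.', 16, 1);
            RETURN;
        END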

  • gfish (4/29/2013)


    I am not just speaking hypothetically. We added an almost identical process to our system years ago and we are now spending a lot of time getting rid of it because it is a constant headache. The job queue table backs up regularly on a busy system, and even when it is not backed up it introduces load that competes with user interactive processes. The idea of "do it later" doesn't work when the system is so busy processing interactive requests from users that there is no "later" during working hours.

    What have you replaced it with?

    I am looking to use it to run a load of stored procs first thing in the morning, running in parallel (the way I would code it also means that some rely on others). Currently they run consecutively, but this isn't needed for much of the process.

    Dan

  • danielfountain (4/29/2013)


    gfish (4/29/2013)


    I am not just speaking hypothetically. We added an almost identical process to our system years ago and we are now spending a lot of time getting rid of it because it is a constant headache. The job queue table backs up regularly on a busy system, and even when it is not backed up it introduces load that competes with user interactive processes. The idea of "do it later" doesn't work when the system is so busy processing interactive requests from users that there is no "later" during working hours.

    What have you replaced it with?

    I am looking to use it to run a load of stored procs first thing in the morning, running in parallel (the way I would code it also means that some rely on others). Currently they run consecutively, but this isn't needed for much of the process.

    Dan

    We replaced it by simply looking at the real need for what was being done in the background jobs and deciding that it probably didn't need to be done at all. Without boring you with a lot of details, let's just say that we were spending a lot of time generating extremely accurate information when a simple approximation would have been sufficient. I know that this is not always possible and you may have looked at this already. I was just cautioning against over-use of this method as a way to avoid fixing bad processes.

    For what you are doing it's probably fine, as long as "first thing in the morning" means before working hours. What we did (and what I was cautioning against) caused queued jobs to be added throughout the day, primarily by triggers. Users performing common tasks were unknowingly adding to the pile of queued jobs even though the system seemed to respond quickly to their request.

    I would say that as long as the process is initiated by some sort of scheduled or manual administrative action and there is a reasonable certainty that all of the jobs will be completed before they start competing with users for server time then you are probably OK.

  • There are a couple of other things to be aware of when using this method.

    The size of your SQLAGENT.OUT file will continue to grow at a much faster rate than it currently does. SQL Agent logs the deletion of every job, even if it was created with the auto-destruct parameter. You may need to purge the SQLAGENT.OUT file regularly. It also makes it much harder to look for actual useful information in the Agent log because you have to filter out all of the delete messages.
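    For what it's worth, the Agent log can be recycled on a schedule with the standard msdb procedure, which starts a new SQLAGENT.OUT and archives the current one:

        EXEC msdb.dbo.sp_cycle_agent_errorlog;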

    We write commercial software, not an in-house proprietary system. In one instance this caused a problem with third-party monitoring software that a customer had installed on their server. One thing that was monitored was long-running Agent jobs. We created the auto-generated jobs with a GUID in the name, so the monitoring software added a new entry to its own tables every time one of them exceeded the monitoring software's run-time threshold. There were eventually hundreds of thousands of rows in this table, and because you would only expect to find a few rows in it, it was not carefully indexed. This caused the monitoring software to gradually consume more and more server resources, which led to even more long-running jobs, which just made the problem grow faster.

  • I wrote a CLR procedure to do similar work:

    stpParallel_Processing parameters:

    - SqlString Ds_Database (database name)
    - SqlString Ds_Procedure (procedure name)
    - SqlString Ds_Table (table name)
    - SqlInt32 Qt_Threads (number of threads)
    - SqlInt32 Fl_Return (return status)

    The idea:

    1 - Create a custom procedure with parameters, which will process one unit of calculation.

    2 - Create a table whose columns have the same names as the procedure's parameters.

    3 - Execute stpParallel_Processing.

    The procedure picks up each row of the parameter table and calls the custom procedure in parallel (using the number of threads passed as a parameter and the parameter values supplied in the table).
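    A hypothetical setup, just to show the calling pattern (table, procedure and column names are made up; the output convention for Fl_Return is an assumption):

        -- Parameter table: one row per unit of work, columns named after the custom proc's parameters
        CREATE TABLE dbo.CalcParameters
        (
            Id_Customer INT  NOT NULL,
            Dt_Period   DATE NOT NULL
        );

        INSERT INTO dbo.CalcParameters (Id_Customer, Dt_Period)
        SELECT CustomerId, PeriodDate
        FROM dbo.PendingCalcs;                    -- placeholder source of work

        -- Run dbo.stpCalcUnit once per row, four rows at a time
        DECLARE @Fl_Return INT;
        EXEC dbo.stpParallel_Processing
            @Ds_Database  = N'MyDatabase',
            @Ds_Procedure = N'dbo.stpCalcUnit',   -- custom proc taking @Id_Customer and @Dt_Period
            @Ds_Table     = N'dbo.CalcParameters',
            @Qt_Threads   = 4,
            @Fl_Return    = @Fl_Return OUTPUT;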

    It's working fine :). I'm using it to process a variety of data instead of a simple WHILE loop.

    In some cases, the processing time is twenty times faster.
