Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

SQLRunner

This blog is syndicated from SSIS - SQL Server Tidbits(http://www.josefrichberg.com/)

Implement a stack in SQL Server using stored procedures

In this article I will show you how to implement a stack.  In programming, stacks are a great way to manage data or work.  Everyone pulls from the same pool and there is no duplication (once a unit is pulled off the stack, it is no longer available to any other process).  This can be accomplished in SQL Server using stored procedures.  We don't need semaphores, we have ACID transactions, which will come in handy.  If you will be attending my Virtual Presentation 'SSIS Workload Thread Balancing', this article will be a good foundation.

First thing we need to do is split the the workload into smaller units.  This will be specific to each process.  Second thing we need to do is provide a way to randomize the work we need to do and then distribute this work across as many threads as possible.  Since each unit of work can take a different amount of time, you want to minimize the chance of a single thread getting a majority or possibly all of the work that take a long time.  What's the point of having multiple threads if you have to wait for a single process to finish it's workload?  You want to spread the workload across as many threads as possible to reduce the total time to finish the overall task.

In my real world example I need to calculate sales information for reps per company.  I first create a table that I will later use to randomize the work.  I will use the HashBytes function of SQL Server to do this for me as a calculated field in a column.  This is a cryptographically secure function which will be excellent for randomizing the order of the rows (see Cryptographic Hash Functions).

 

 

 The Rep/Company combination is needed by other processes so we have to create an additional temporary table to extract and finalize the data.  I simply tacked on the Hash column as I developed the stack concept.  The temporary table looks like this:

 

We then load this table with the data from the first table (T002_RepAssignment), add an order by based upon the hash and voila, we have our stack.

The first part of implementing a stack is complete, we built our stack.  Now we need to build the process to emulate a 'pop'.  As a refresher, you 'pop' things off a stack, the opposite of a 'push', which is how you normally create a stack.  We can do this with a single stored procedure.  Let's dissect this procedure and see how it works.

The procedure returns two pieces of information, @rcount (which will actually be a 0 or 1) and the unit of work to be done, @query (which is a fully qualified sql statement).  What I do is send sql statements to the SSIS process to run.  This procedure builds those SQL statements, while using the SSIS framework to manage the threads.  The heart of the procedure is the blue highlighted section:A combination of Delete top(1) and output into a temporary table.  In one fell swoop I grab the first row in the table,  push it into a temporary table, then delete it.  No need for a begin/commit tran with a select followed by a delete.  Having multiple procedures running at the same time, they would only block for the delete and then continue on simultaneously.

There you have it.  The combination of using a cryptographic hash to randomize the data in the table (creating the stack), then using the Delete Top(1)/Output to pop the data off the stack in a quick and controlled manner.  

 

Comments

Posted by Steve Jones on 11 January 2010

I haven't seen a stack in years, and I wonder how many modern developers know what one is? This used to be a fundamental part of computer science, stacks, queues, managing your workload at a low level.

Posted by marc on 18 January 2010

This is a nice SQL pop() implementation, but this doesn't seem to be a stack.  It also may just be the hour, but I'm not sure I understand the need for randomizing the work order either.

Posted by mohd.nizamuddin on 18 January 2010

Very informative article. I have the following questions:

1. Do we need to create table (for stack) on the disk (could be ##temp table)?

2. When concurrent process call an SP which reads this stack, how does process come to know which has being processed currently?

Code snippet will help to understand the whole scenario.

Posted by Antonio Dias on 18 January 2010

I'm also not getting the  point of needin to randomize the work order. Is it to spread the workload across the threads?

If you know the type of work (like in your real-world example) you can get a rather good estimation of the time it will take to perform that work, no?

I'm a C# programmer first, so i'm probably wrong anyway... :)

Posted by sqlrunner on 18 January 2010

Marc,

 You could consider it a stack implementation since I 'push' the work onto the table and then pop it off.  It was more to get people into the mindset of what I was looking to accomplish.  Randomization of the work is just another step in keeping the workload evenly distributed among threads.

Posted by sqlrunner on 18 January 2010

Mohd,

 To answer your questions:

1.  You need to place the table(stack) some place where independent connections can access it, so it has to be either a ##table or static table (database.owner.table).

2. Take a look at the Delete Top(1)..That piece of code locks the table, pulls off the first record and then releases the lock.  The next sp is waitining on the delete and then once the lock is released, it can execute the Delete.

Posted by sqlrunner on 18 January 2010

Antonio,

 The desire to randomize the work is due in part to the fact that, in our case, the time for a unit of work to be done can change as we accumulate more data in our table (it accumulates lifetime sales).  This is a way to make sure you don't get a few work units that take a long period of time and they wind up being either the first out of the gate or occupy one or two threads.  If by chance a long running process happens to be the last out of the gate your entire process will take longer.  You want to sprinkle those long running work units throughout your framework.  If you know your work takes roughly the same time per unit, you don't need to randomize, no.

Posted by Joon on 18 January 2010

Hi Joseph

An interesting concept, although I am curious why you would want to implement this functionality yourself, when SQL Server already has a very powerful model for queueing built in, SQL Server Service Broker.

To me, that would be the way to go to handle queued work items on a database.

It is available in all versions of SQL Server (including express), and gives you a very powerful and flexible mechanism for queueng work, dealing with failures etc. it also allows you to specify activation stored procs on your queues that fire when a message comes in, and you can have multiple processing running simultaneously, which you can configure.

They also have a very interesting locking model, based around the concept of a conversation - basically only other messages in the same conversation are blocked, the entire queue is only very rarely locked.

Anyways, just my 2 cents :-)

Posted by sqlrunner on 18 January 2010

Hi Jdurandt,

 Service Broker is a very complex mechanism to implement, let alone tie it into SSIS.  This is a simple mechanism to improve the workload assigned to threads within SSIS.

Leave a Comment

Please register or log in to leave a comment.