
A Quick Queue Process


As SQLServerCentral grew, we evolved through a few email-sending solutions to meet the demand. We started with a manual process, moved to a homegrown automated one, then went through two purchased software packages before coming back to a (new) homegrown solution.

When we were trying to send 100,000 emails a night, scale became an issue. We had to complete the sending overnight so that the mailings wouldn't compete with the increased load on the servers during the US workday. However, we also had to ensure that we never mailed the same person more than once, so we had to build a solid, stable, reliable system.

We chose to implement email as a queue using a database table. This was prior to SQL Server 2005, though I'm not sure we would have used Service Broker even if it had been available.

We stored all emails in a table like this:

create table Emails(
mailingid int identity(1,1),
datecreated datetime,
datesent datetime,
emailid int,
priority tinyint,
recipient varchar(200),
sender varchar(50)
)

We would load this table with 100,000+ rows, one for each person receiving the email. We stored the actual email text in another table and joined on emailid to get that data.
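As a concrete illustration, that lookup would have looked something like this; the EmailText table name and its subject/body columns are my assumptions, since the original doesn't show them:

-- join the queue to the (hypothetical) EmailText table to get the message content
select e.recipient, t.subject, t.body
from Emails e
inner join EmailText t
on t.emailid = e.emailid
where e.datesent is null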

Our sending process had a series of client machines that would query this table for a batch of rows, send a mail to each one, and then update this table. We built this for scalability as we could easily (and cheaply) add new client machines for sending mail. Now we just needed to handle concurrency issues.

Our first idea was to read the table, update those records, and repeat. So we'd do something like this:

select top 100
emailid, recipient
from emails
where sender is null
and datesent is null

set rowcount 100
update emails
set sender = 'Client1'
where sender is null
and datesent is null
set rowcount 0

-- processing on the client here

update emails
set datesent = getdate()
where emailid = x -- x is the emailid of the message just sent

This assumes Client1 is the machine connecting; Client2 would use its own name in the UPDATE.

If you read this and have any experience with T-SQL, you'd quickly realize there's an issue here. Between the SELECT and the UPDATE, another client could read those same rows. So we enclosed the pair in a transaction, whether you UPDATE then SELECT, or the reverse.
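A minimal sketch of that transactional version, reconstructed from the description above (the exact code isn't shown here):

begin transaction

-- claim a batch of 100 rows for this client
set rowcount 100
update emails
set sender = 'Client1'
where sender is null
and datesent is null
set rowcount 0

-- read back the claimed rows
select emailid, recipient
from emails
where sender = 'Client1'
and datesent is null

commit transaction

Note that the exclusive locks taken by the UPDATE are held until the COMMIT, which serializes the clients.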

However, that causes a concurrency problem of its own. With two clients the delays were small, but our testing showed repeated blocking, since each client's transaction held locks the others had to wait on. That wasn't a problem at the time, but it could easily have become one as we added more machines and increased the volume of sending.

So we went back to the drawing board and came up with a new approach.

We changed our code to this:

set rowcount 100
update emails
set sender = 'Client1'
where sender is null
and datesent is null
set rowcount 0

select
emailid, recipient
from emails
where sender = 'Client1'
and datesent is null

Not much different, but there are two significant changes. First, by doing the UPDATE first, we essentially removed those 100 rows from the queue. The update is quick, and as soon as it finishes the next machine can begin claiming its own rows. SQL Server provides the locking that prevents any machine from overwriting another machine's rows, so "marking" them by name keeps them from being updated again.
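As an aside, on SQL Server 2005 and later the claim step can be collapsed into a single atomic statement with the OUTPUT clause. This isn't what we ran at the time, just a sketch of the equivalent:

-- claim 100 rows and return them in one atomic statement
update top (100) emails
set sender = 'Client1'
output inserted.emailid, inserted.recipient
where sender is null
and datesent is null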

The SELECT statement is also quicker, since the number of rows scanned is small. We indexed those fields, and only a small number of rows ever have a sender set, so it is a far less resource-intensive query.
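The original doesn't show the index definition, but a hypothetical one like this would cover both the UPDATE's search for unclaimed rows and the SELECT for a client's claimed rows:

-- hypothetical; the index name and exact column order are assumptions
create index IX_Emails_sender_datesent on Emails (sender, datesent)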

Note that I'm not sure things are done the same way now. When Red Gate bought the site, they took over development, and while I know they kept some pieces of this, they may have upgraded or changed parts of the process. However, for the seven years I ran the site and worked on its development, this proved to be the best solution for us.

Recently there was also an article on SQLServerCentral on dealing with queues that takes a slightly different approach.
