Blog Post

Breaking Up is Easy to Do

,

Breaking Up is Easy to Do
Hard

It's nice to be able to package a process into a single, tidy, elegant query, but it isn't always possible to do so. And even if it is possible, it may not be the best way to do it. Often, we can get better performance out of large or complex queries by breaking them up into smaller pieces. I encountered a great example of this today.

A developer asked me about a query that was taking a really long time to process in the test environment. The particular step that was having issues is building a long string by concatenating short strings to pass through to a remote server for processing of data on the remote server. It was a simple, recursive string building query.

Declare @RowIDs nvarchar(max)

Select @RowIDs = IsNull(@RowIDs + ',', '') + '''' + RowID + ''''

From StagingTable with(nolock)

When there were only 12,000 ID's to concatenate, it ran in about 9 seconds. When the number increased to 18K ID's, it took twice as long, 18 seconds. At 20K ID's, it was taking 2 minutes, and now at 90K records, it is taking almost 1.5 hours. Obviously, the developer was not happy with performance and asked if I knew why the process time increased exponentially.

Upon observing the query run, I determined that the query was using a really large amount of memory, CPU time, and causing a lot of disk I/O. It was simply trying to do too much at once. So, I decided to see what would happen if I broke it down into smaller, more manageable chunks.

I set up a looping process to build several smaller strings. Then it would loop through the smaller strings and concatenate them together into the large, final string. The process could now build the string of 90K+ ID's, starting with short strings of 10K ID's per string, in 26 seconds. If I lowered the number of ID's in the initial set of shorter strings, it ran even faster. 5K of ID's per short string ran in 21 seconds and 1K of ID's ran in 19 seconds.

This is the final query I ended up with:

Declare @Rows nvarchar(max),

        @Loops int,

        @CurrLoop int,

        @MaxIDsPerLoop int

Declare @RowIDs Table (RowIDKey int identity(1, 1) not null primary key,

                        RowID nvarchar(15) not null)

Declare @RowIDRows Table (RowIDRowKey int identity(1, 1) not null primary key,

                        RowIDRow nvarchar(max) not null)

Set NoCount On

Set @MaxIDsPerLoop = 1000

Set @CurrLoop = 0

Insert Into @RowIDs (RowID)

Select RowID

From StagingDatabase with(nolock)

Select @Loops = (count(RowIDKey) / @MaxIDsPerLoop>) + 1

From @RowIDs

While @CurrLoop < @Loops

    Begin

        Set @Rows = Null

        Select @Rows = IsNull(@Rows + ',', '') + '''' + RowID + ''''

        From @RowIDs

        Where RowIDKey % @Loops = @CurrLoop

        Insert Into @RowIDRows (RowIDRow)

        Select @Rows

        Set @CurrLoop = @CurrLoop + 1

    End

Set @Rows = Null

Select @Rows = IsNull(@Rows + ',', '') + RowIDRow

From @RowIDRows

Select @Rows

Set NoCount Off

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating