TOP in SQL Server 2005

,

Recently someone asked me about what had become my feature in SQL 2005 and if

it was the feature I thought it would be. That's really a pretty good question

since it's easy to get excited about the 'wow' and overlook the more mundane.

Going into the release my favorite feature that I was looking forward to was the

ability to modify replicated columns (if you've done replication I'm sure you

share my enthusiasm for that feature!). My path has taken a twist and turn since

then so that I don't use replication daily. Instead I find that out of all the

new things the one I seem to use most is Top N in updates and deletes (followed

closely by the output clause). No, not exciting, but definitely practical.

Assuming you've been using SQL 2000 for a while you know that you can use Top

N in selects, either as a number or a percentage. For example, you might write

this:

select top 100 * from sometable

What you couldn't do was parameterize it. It only worked with selects too. If

you wanted to restrict the number of rows affected by a select at runtime, or

restrict the rows affected by an update or delete you

had to explicitly describe the rows using a join or where clause, or you could

cheat a bit by using ROWCOUNT, like this:

set rowcount 1000
delete from table where somefield=1
set rowcount 0

It worked well enough (and still does in SQL 2005). The downside was a bit of

a clunky syntax and it created the risk that somehow rowcount did not get reset back to 0,

not a good thing since it is a session based setting. Forgetting to change it

back would mean that all the statements would be restricted to 100 rows. Not the

best thing if you're trying to select all employees that need a paycheck. SQL 2005 let's you use Top N in selects, plus updates and deletes

- but the syntax has some quirks as you'll see in a second.

For select statements you must specify parentheses if you are passing

a parameter, otherwise they are optional

Select Top (@TopN) * from employees
Select Top 10 * from employees

When doing an update or delete, you have to use the parentheses in both

cases:

Delete Top (@TopN) from employees
Delete Top (10) from employees
update Top (@TopN) employees set payrate = payrate * 1.1
update Top (10) from employees set payrate = payrate * 1.1

So a simple rule to follow would be to always wrap the 'N' in parentheses!

The most common reason I've had to do restrict the number of rows has been

when doing operations against millions of records. Updating millions of records

can not only cause blocking, but it can bloat your transaction log even if

you're in simple recovery mode (because it has to log the entire transaction).

Doing the update in batches doesn't guarantee no blocking of course, but it

should reduce the time the query takes to complete. It often takes some

experimenting to figure out the optimum batch size. I suggest making sure that

the query will complete in less than 30 seconds, as that is typically the

default timeout for a database action. Even if you're in full recovery mode

there is an advantage to doing chunks. If the size of your transaction won't fit

into the current virtual log segment the log will grow (or you'll come to a

halt!). By doing batches and running log backups you can easily keep the log

from growing to a huge size.

I've shared my favorite feature and an application for it as well - how about

writing about your favorite feature and submitting it for the SQL world to see?

Rate

4.83 (6)

Share

Share

Rate

4.83 (6)