RE: When would be the order by faster ?

SSC Guru

Points: 150467

August 10, 2010 at 3:26 pm

#1205064

Hey Bhuvnesh,

My advice, generally, would be to omit the ORDER BY clause completely. The main reasons to include it are (a) to enforce the order of IDENTITY assignment in the destination table; or (b) if your SELECT query includes a TOP clause - in which case the ORDER BY would define the qualification of rows for the TOP operation.

My advice is to leave the sorting decisions up to the optimiser. It will consider all interesting indexes on both source and destination tables before forming a query plan based on the estimated costs of the alternatives.

The optimiser balances several factors including:

1. A useful index on the source table will quickly locate the records that need to be inserted

2. Inserts will be more efficient if the rows are presented in destination table clustered index order

Factor 1 saves time and effort compared to a full table scan. How much time and effort is saved depends on how selective the source table conditions are.

Factor 2 will minimise page splitting and random I/O on the destination table.

Ideally, of course, the optimiser would be able to find a source table index that finds the rows to insert quickly *and* naturally presents the found rows in the clustered index order of the destination table. In that case, no sorts are needed at all.

However, in your case, there is no such index - the clustered index on the source table can only produce rows sorted by (acct_id, evt_code). The destination table would prefer rows in evt_stub order.

The plan chosen by the optimiser in these circumstances depends on estimated costs. If it estimates that the source table query will produce *very very* few rows, it will choose an index seek on the source, and just insert those into the destination table in the order they happen to arrive. This may cause a page split or two, and some random I/O, but the very small number of rows make this cheaper than doing a sort.

For more than a *very very few* rows, the optimiser will typically introduce a sort operation. Rows that arrive from the source table query in (acct_id, evt_code) order will be sorted by (evt_stub) and then inserted into the destination table. The cost of the sort is more than compensated for by the savings in random I/O and page splitting.

When you add an explicit ORDER BY clause to an INSERT statement, you are second-guessing the optimiser. In general, it is smarter than we are - or at least it has better information, and it can be bothered to do all the complex calculations to decide which of many alternative plans will work best.

When you specify an ORDER BY, the optimiser will enforce that order at the end of the SELECT phase of the plan (the read cursor). If it happens to decide that a destination-order sort is worthwhile, it will re-sort the data it just sorted to respect your ORDER BY. That is a dumb outcome and you want to avoid that.

Anyway, the two plans you uploaded illustrate the point nicely:

In the query with ORDER BY (evt_stub) the optimiser produces a plan that produces rows from the source in (acct_id, evt_code) order. It then sorts those rows in (evt_stub) order (in accordance with your ORDER BY instructions). No further sorting is required because the order of the rows now match the destination clustered index. The estimated cost of this plan is 0.0246455

In the query with ORDER BY (acct_id, evt_code) the optimiser spots that a clustered index seek on the source table will produce rows in that requested order, so no sort is needed at that stage. Now it looks to the insert. The rows are in the wrong order for the destination, so you might expect an explicit sort next. But, look at the estimated number of rows: just one. For such a small number of rows, the optimiser decides that it will be cheaper to just go ahead and insert the rows in a potentially unhelpful order. So, you see a plan with no sorts in it. The estimated cost of that plan is 0.0132842

If there were a larger number of rows that matched acct_id = 2000122, the optimiser would almost certainly add a sort on evt_code to optimise the insert process - even if you specified no ORDER BY clause.

Ok, so that's quite a long explanation, but it does explain what you see, and why I think you should omit the ORDER BY clause completely.

One final reason: if anyone ever changes the clustered index order on the destination table, all your carefully-crafted ORDER BY clauses will be exactly wrong.

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi