When would be the order by faster ?

  • use DOMAIN_event

    go

    CREATE table email_destination_bk

    (

    msg_stub ut_stub not null ,

    acct_id int not null ,

    evt_stub ut_stub not null,

    evt_code nvarchar(20)

    )

    create clustered index idx on email_destination_bk ( evt_stub)

    CREATE table email_source_bk

    (

    acct_id int not null ,

    evt_stub ut_stub not null,

    [ut_stub] ut_stub not null,

    evt_code nvarchar(20)

    )

    create clustered index idx on email_source_bk ( acct_id, evt_code)

    ---------------------------------------------------------------------------

    truncate table email_destination_bk

    insert into email_destination_bk

    select ut_stub,acct_id , evt_stub , evt_code

    from email_source_bk

    where acct_id = 2000122

    order by evt_stub, acct_id

    truncate table email_destination_bk

    insert into email_destination_bk

    select ut_stub,acct_id , evt_stub , evt_code

    from email_source_bk

    where acct_id = 2000122

    order by acct_id,evt_code

    when order by will behave better , when is it according to source table ? or destination table ?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Bhuvnesh

    I'm not sure what you're asking, but in any case, why would you use an ORDER BY in a SELECT statement for an INSERT? This doesn't guarantee that the rows will be inserted in that order. Even if it did, you still couldn't guarantee that the rows will come out in that order... unless you use an ORDER BY!

    Hope that makes sense

    John

  • John Mitchell-245523 (8/9/2010)


    This doesn't guarantee that the rows will be inserted in that order.

    thanks

    Then in which case , "order by" will be relevant for insert statement.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • When there's an identity column on the destination table. The order by will set the order of the identity. Nothing else.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Bhuvnesh

    What I meant is that you should use an ORDER BY when SELECTing, not when INSERTing. So instead of this:

    -- INSERT data into destination table

    insert into email_destination_bk

    select ut_stub,acct_id , evt_stub , evt_code

    from email_source_bk

    where acct_id = 2000122

    order by acct_id,evt_code

    -- SELECT data from destination table

    select ut_stub,acct_id , evt_stub , evt_code

    from email_destination_bk

    you'd do this:

    -- INSERT data into destination table

    insert into email_destination_bk

    select ut_stub,acct_id , evt_stub , evt_code

    from email_source_bk

    where acct_id = 2000122

    -- SELECT data from destination table

    select ut_stub,acct_id , evt_stub , evt_code

    from email_destination_bk

    order by acct_id,evt_code

    Bear in mind that if you want data stored in a table in a certain order then you need to create the appropriate clustered index on the table. Bear in mind also that even this doesn't absolutely guarantee that your SELECT statement will return the data from that table in that order, unless you include an ORDER BY clause.

    John

  • One of my Senior DBA told me that if we have "order by" clause according to source table then there would be page split during insertion as the data is not in the order as we have clustered key in destination table.but this would happen in case of heavy volume of data.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Tell your senior DBA that he should check his facts before making statements like that. It's trivial to prove he's wrong.

    CREATE TABLE TestingInsertOrder (

    ID UNIQUEIDENTIFIER PRIMARY KEY,

    Filler CHAR(50)

    )

    GO

    INSERT INTO TestingInsertOrder (ID)

    SELECT TOP (50000) NEWID()

    FROM master.sys.columns a CROSS JOIN master.sys.columns b

    I'm sure we'll all agree that there's no way that the newID will be generated in order? If what your senior DBA says is true, SQL will insert them in the random order they're generated. But check the execution plan.

    http://www.sqlservercentral.com/Forums/Attachment6774.aspx

    There's a sort operator in there, sorting the rows into the clustered index order prior to the insert.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail, reading between the lines, I think what the senior DBA is saying is that if you insert data that is already sorted in the order of the clustered index of the destination table, then SQL Server has less work to do when the data is inserted. However, the ORDER BY and the clustered index insert both use a sort operation, as your test shows, so I don't think it makes much difference. Bhuvnesh, you need to test and test this before deciding how to proceed - you'll be especially interested in what happens when there's already a lot of data in the destination table.

    John

  • John Mitchell-245523 (8/10/2010)


    Gail, reading between the lines, I think what the senior DBA is saying is that if you insert data that is already sorted in the order of the clustered index of the destination table, then SQL Server has less work to do when the data is inserted.

    The question is, how would SQL know that the data is in order if it doesn't put a sort in?

    In the initial post, neither of the order by statements matched the clustered index, so if SQL honoured those order bys, it would have to order twice during the insert.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail

    The first ORDER BY in the original post matches the clustered index, albeit with the unnecessary addition of a second column.

    I think you're right - a sort would be necessary. Maybe the sort would have to do less work if the data is already ordered. I don't know - I haven't tried it. This is why I said to Bhuvnhesh that he must test thoroughly.

    John

  • GilaMonster (8/10/2010)


    In the initial post, neither of the order by statements matched the clustered index

    Ok agree but see the below case with exec plan

    truncate table email_destination_bk

    insert into email_destination_bk

    select ut_stub,acct_id , evt_stub , evt_code

    from email_source_bk

    where acct_id = 2000122

    order by evt_stub

    truncate table email_destination_bk

    insert into email_destination_bk

    select ut_stub,acct_id , evt_stub , evt_code

    from email_source_bk

    where acct_id = 2000122

    order by acct_id,evt_code

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Hey Bhuvnesh,

    My advice, generally, would be to omit the ORDER BY clause completely. The main reasons to include it are (a) to enforce the order of IDENTITY assignment in the destination table; or (b) if your SELECT query includes a TOP clause - in which case the ORDER BY would define the qualification of rows for the TOP operation.

    My advice is to leave the sorting decisions up to the optimiser. It will consider all interesting indexes on both source and destination tables before forming a query plan based on the estimated costs of the alternatives.

    The optimiser balances several factors including:

    1. A useful index on the source table will quickly locate the records that need to be inserted

    2. Inserts will be more efficient if the rows are presented in destination table clustered index order

    Factor 1 saves time and effort compared to a full table scan. How much time and effort is saved depends on how selective the source table conditions are.

    Factor 2 will minimise page splitting and random I/O on the destination table.

    Ideally, of course, the optimiser would be able to find a source table index that finds the rows to insert quickly *and* naturally presents the found rows in the clustered index order of the destination table. In that case, no sorts are needed at all.

    However, in your case, there is no such index - the clustered index on the source table can only produce rows sorted by (acct_id, evt_code). The destination table would prefer rows in evt_stub order.

    The plan chosen by the optimiser in these circumstances depends on estimated costs. If it estimates that the source table query will produce *very very* few rows, it will choose an index seek on the source, and just insert those into the destination table in the order they happen to arrive. This may cause a page split or two, and some random I/O, but the very small number of rows make this cheaper than doing a sort.

    For more than a *very very few* rows, the optimiser will typically introduce a sort operation. Rows that arrive from the source table query in (acct_id, evt_code) order will be sorted by (evt_stub) and then inserted into the destination table. The cost of the sort is more than compensated for by the savings in random I/O and page splitting.

    When you add an explicit ORDER BY clause to an INSERT statement, you are second-guessing the optimiser. In general, it is smarter than we are - or at least it has better information, and it can be bothered to do all the complex calculations to decide which of many alternative plans will work best.

    When you specify an ORDER BY, the optimiser will enforce that order at the end of the SELECT phase of the plan (the read cursor). If it happens to decide that a destination-order sort is worthwhile, it will re-sort the data it just sorted to respect your ORDER BY. That is a dumb outcome and you want to avoid that.

    Anyway, the two plans you uploaded illustrate the point nicely:

    In the query with ORDER BY (evt_stub) the optimiser produces a plan that produces rows from the source in (acct_id, evt_code) order. It then sorts those rows in (evt_stub) order (in accordance with your ORDER BY instructions). No further sorting is required because the order of the rows now match the destination clustered index. The estimated cost of this plan is 0.0246455

    In the query with ORDER BY (acct_id, evt_code) the optimiser spots that a clustered index seek on the source table will produce rows in that requested order, so no sort is needed at that stage. Now it looks to the insert. The rows are in the wrong order for the destination, so you might expect an explicit sort next. But, look at the estimated number of rows: just one. For such a small number of rows, the optimiser decides that it will be cheaper to just go ahead and insert the rows in a potentially unhelpful order. So, you see a plan with no sorts in it. The estimated cost of that plan is 0.0132842

    If there were a larger number of rows that matched acct_id = 2000122, the optimiser would almost certainly add a sort on evt_code to optimise the insert process - even if you specified no ORDER BY clause.

    Ok, so that's quite a long explanation, but it does explain what you see, and why I think you should omit the ORDER BY clause completely.

    One final reason: if anyone ever changes the clustered index order on the destination table, all your carefully-crafted ORDER BY clauses will be exactly wrong.

    Paul

  • @paul-2 : thanks for this informative description. 🙂

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • CELKO (8/11/2010)


    You really don't understand SQL yet and you keep posting the wrong kind of questions. Please get a basic RDBMS book read it until you have an epiphany.

    Sorry. I'll buy a book immediately! 😀

    SQL is a SET ORIENTED LANGUAGE. An insertion places an entire set, all at once, into the target table. An ORDER BY on the SELECT statement in an INSERT statement is meaningless.

    Well that may be true in theory, but specific implementations have specific behaviours which may or may not endear themselves to you. SQL Server does respect the ORDER BY clause in this scenario, as I hope I explained in my previous post.

    I also hope I have done something to dissuade Bhuvnesh from using it.

    Paul

  • Bhuvnesh (8/10/2010)


    @PAUL : thanks for this informative description. 🙂

    No worries. Ignore Joe's post if you can - and keep learning! 🙂

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply