Optimizing ORDER BY query

  • Hi,

    In my SSIS Package, I have a query which is using a combination of views to retrieve records to insert into a stage table. The query retrieves millions of rows and is running very slow. On analysis I figured out that the order by clause is the one which is causing the query to run for hours. The order by is on columns from the different views.

    This ordering is necessary and unfortunately , I cannot put any indices as these views belong to a different team. The only solution I can think of is removing the order by from the current query and then inserting the rows to another stage table by ordering on the first stage table.

    Is there any other approach I can take...

  • ORDER BY doesn't seems logical on an ETL process. Why would you need it?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • The sproc which has the business logic relies on ordering . The stage table has an identity column called id and there is a comparison logic which has a self join for the stage table and compares id =id+1

  • Is that id an identity column? Could you remove the identity property to be able to update it with a ROW_NUMBER() by the natural key? Or simply use the ROW_NUMBER in the code? Identity columns are not safe to use because they can have gaps.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • There's a 3rd party SSIS component that assigns sequential numbers to data in a task. Can you not use that instead, possibly with an SSIS sort. May be faster, may not.

    Sorting is a very expensive operation. Short of more CPU, more memory, there's not really that much which can be done.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Arijit, from a high level view, do I understand what you're doing here correctly?

    SourceSystem Proc with order by on output

    -> OleDB Data Source in SSIS with advanced editing and changed IsSorted = True and set your Sort Key Positions

    -> Flows through task to an OleDB output to a staging table with an identity column that is truncated and reset for each time this is run

    ->You run a final query for dealing with your warehouse/target that works only against the destination staging table for what was ordered on insert this last pass, and need to have it do things in the target system.

    That sound about right?

    Edit: Forgot to add, try to avoid SSIS Sorting or Aggregations at all costs unless you absolutely have to. The SQL Engine is much better at it than SSIS is.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Actually it is not a sp but a inline query with order by in the OLEDB source. The query is using 3 views in the join condition which belong to a different team . Hence I cannot making any changes to those views.

    As soon as I remove the order by the query starts populating the data into the Staging table instantly . Sometmes the number of rows are 200-400 million.

    Regarding the Issorted property and the sortkeyposition I tried using that but that actually does not help a great deal as it tells SSIS that the data is pre-sorted .

    Regarding the final business logic sproc that generates the output extracts yes I need the order by because it is comparing id with id+1 and doing some manipulations in a cursor. Again I dont want to touch this sproc for now .if optimization can be done at the extraction level then it would be good for starters.

  • arijit rath (6/23/2014)


    Actually it is not a sp but a inline query with order by in the OLEDB source. The query is using 3 views in the join condition which belong to a different team . Hence I cannot making any changes to those views.

    As soon as I remove the order by the query starts populating the data into the Staging table instantly . Sometmes the number of rows are 200-400 million.

    Regarding the Issorted property and the sortkeyposition I tried using that but that actually does not help a great deal as it tells SSIS that the data is pre-sorted .

    Regarding the final business logic sproc that generates the output extracts yes I need the order by because it is comparing id with id+1 and doing some manipulations in a cursor. Again I dont want to touch this sproc for now .if optimization can be done at the extraction level then it would be good for starters.

    The ORDER BY may be making one or more of the views fully materialize. Your original suggestion of put the data into a working table and sorting that seems to be the best option at this point,

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply