Order By Slowness

Question

Order By Slowness

SSCommitted

Points: 1898

December 7, 2020 at 7:42 pm

My query only returns a single shipment record, so it should be really quick. But with the ORDER BY clause it does an entire scan of the wspkglin clustered index. Any ideas? I feel like there may be an index I could use to speed it up. The query runs in less than a second without the ORDER BY clause, but takes about 4.5 sec with it.

Here is link to plan with ORDER BY clause : https://www.brentozar.com/pastetheplan/?id=Hy-oEZ2jw

Here is link to plan w/o ORDER BY clause: https://www.brentozar.com/pastetheplan/?id=ryLdr-3iP

declare @Shipment_no INT = 168726

SELECT TOP 1 ALT.phone_no AS phone_noAlt
,LIN.loc
,LIN.ord_no AS PPOrderNo
,LIN.Line_no
,LIN.item_no
,wspkgShipment.ship_to_name
,wspkgShipment.ship_to_addr_1
,wspkgShipment.ship_to_addr_2
,wspkgShipment.ship_to_addr_3
,wspkgShipment.ship_to_country
,HDR.Frt_amt
,HDR.sls_tax_amt_1 + HDR.sls_tax_amt_2 + HDR.sls_tax_amt_3 AS TotalTax
,HDR.Edi_fg
,AR.phone_no
,HDR.ship_to_name AS OeHdrShip_to_name
,HDR.ship_to_addr_1 AS OeHdrShip_to_addr_1
,HDR.ship_to_addr_2 AS OeHdrShip_to_addr_2
,HDR.ship_to_addr_3 AS OeHdrShip_to_addr_3
,HDR.ship_to_country AS OeHdrShip_to_country
,HDR.ship_to_city AS OeHdrShip_to_city
,HDR.Ship_to_state AS OeHdrShip_to_state
,HDR.Ship_to_zip AS OeHdrShip_to_zip
,HDR.ship_to_addr_4 AS OeHdrShip_to_addr_4
,wspkgShipment.ship_to_city
,wspkgShipment.Ship_to_state
,wspkgShipment.Ship_to_zip
,wspkgShipment.ship_to_country AS ship_country
,HDR.email_address AS Order_email
,AR.email_address AS Customer_email
,wspkgShipment.ship_to_addr_4
,HDR.bill_to_addr_4
,HDR.bill_to_city
,HDR.bill_to_state
,HDR.bill_to_zip
,HDR.bill_to_name
,HDR.bill_to_addr_1
,HDR.bill_to_addr_2
,HDR.bill_to_addr_3
,HDR.bill_to_country
,HDR.cus_no
,HDR.cus_alt_adr_cd
,HDR.ar_terms_cd
,HDR.ship_via_cd
,HDR.STATUS
,LIN.id AS ID
,HDR.Misc_amt
,HDR.frt_amt
,HDR.Oe_po_no
FROM wsPKG PKG WITH (NOLOCK)
INNER JOIN wsPKGLin LIN WITH (NOLOCK) ON PKG.PKG_ID = LIN.PKG_ID
INNER JOIN oeordhdr_sql HDR WITH (NOLOCK) ON LIN.Ord_no = HDR.ord_no
INNER JOIN wspkgShipment WITH (NOLOCK) ON PKG.Shipment_No = wspkgShipment.Shipment_No
INNER JOIN arcusfil_sql AR WITH (NOLOCK) ON AR.cus_no = HDR.cus_no
LEFT OUTER JOIN ARALTADR_SQL ALT WITH (NOLOCK) ON ALT.cus_no = HDR.cus_no
AND ALT.cus_alt_adr_cd = HDR.cus_alt_adr_cd
WHERE PKG.Shipment_No = @shipment_no
ORDER BY lin.id

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110209 More actions · Answer 1

How many records returned if you remove TOP 1?

You ask to return the only one record with the smallest Lin.id from the entire record set.

To find that record the engine must scan all the returned records and pick the one with the smallest Lin.id. Indexing is no help here, as you need to scan not the data in the table, but the subset of it returned by the query.

it might be a lot of extra work for the engine, if the record set returned by the query (without TOP 1) is a big one.

_____________
Code for TallyGenerator

Jackie Lowery SSCommitted Points: 1898 More actions · Answer 2

Oddly enough, rebuilding the indexes on the wspkg and wspkglin tables fixed the issue.

Jackie Lowery SSCommitted Points: 1898 More actions · Answer 3

Now I'm really confused. The problem seems to have come back. It's taking 4 to 5 seconds again. If I remove the TOP 1 part it returns 32 rows. When i remove the TOP 1 part, it also uses the plan that is much faster, taking only .5 sec. Why would the order by cause it to slow down so much?

This reply was modified 5 years ago by Jackie Lowery.

Mr. Brian Gale SSC-Insane Points: 24988 More actions · Answer 4

That is also a lot of NOLOCK hints... are you sure you need those?

I wonder if you'd get a performance boost by dumping the data to a temp table or table variable to get the 32 records, then selecting the top 1 ordered by the ID? Or possibly even doing it as a nested select and put the ORDER BY on the outside of the nested select?

Looking at the 2 plans, one is doing a SEEK and one is doing a SCAN on the wsPKGLin table which is where the difference comes in. 1 row returned vs 2 million rows returned. Now, as to why the ORDER BY is doing that, I am not certain, but I think it is related to your FROM clauses and joins being different in the 2 plans as well. With the ORDER BY:

FROM wsPKGShipment WITH (NOLOCK)
INNER JOIN wsPKG PKG WITH (NOLOCK)
 INNER JOIN wsPKGLin LIN WITH (NOLOCK) ON PKG.PKG_ID = LIN.PKG_ID
 ON wspkgShipment.Shipment_No = PKG.Shipment_No
INNER JOIN oeordhdr_sql HDR WITH (NOLOCK) ON LIN.Ord_no = HDR.ord_no
INNER JOIN arcusfil_sql AR WITH (NOLOCK) ON AR.cus_no = HDR.cus_no
LEFT OUTER JOIN ARALTADR_SQL ALT WITH (NOLOCK) ON ALT.cus_no = HDR.cus_no
 AND ALT.cus_alt_adr_cd = HDR.cus_alt_adr_cd
WHERE wsPKGShipment.Shipment_No = @shipment_no

First Red flag I see - the first JOIN has no "ON". If you look at the plan without the ORDER BY, the FROM clause looks like this:

FROM wsPKG PKG WITH (NOLOCK)
INNER JOIN wsPKGLin LIN WITH (NOLOCK) ON PKG.PKG_ID = LIN.PKG_ID
INNER JOIN oeordhdr_sql HDR WITH (NOLOCK) ON LIN.Ord_no = HDR.ord_no
INNER JOIN wspkgShipment WITH (NOLOCK) ON PKG.Shipment_No = wspkgShipment.Shipment_No
INNER JOIN arcusfil_sql AR WITH (NOLOCK) ON AR.cus_no = HDR.cus_no
LEFT OUTER JOIN ARALTADR_SQL ALT WITH (NOLOCK) ON ALT.cus_no = HDR.cus_no
 AND ALT.cus_alt_adr_cd = HDR.cus_alt_adr_cd
WHERE PKG.Shipment_No = @shipment_no

The join order is also completely different between the two queries. I expect that is much more likely to be what is causing your performance differences than removing the ORDER BY.

Now if that was more of a copy-paste error, could you paste the 2 plans that are the same except for the ORDER BY clause?

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Jackie Lowery SSCommitted Points: 1898 More actions · Answer 5

I finally found what i was doing that had sped up the query. I rebuilt the index manually. Normally, my indexing is done by Ola Hallengren's maintenance scripts, but it appears a small portion of the rows are sampled, only 67K of 2.6M. When i rebuild the index manually, all of the rows are sampled. Looking to see if I can sample all of the rows with the Ola scripts.

Jackie Lowery SSCommitted Points: 1898 More actions · Answer 6

I found that i can add:

@StatisticsSample = 100 ,

to the script. I run this script on Fridays at 10PM. Is there any reason I shouldn't set the sample size manually to 100%?

Sergiy SSC Guru Points: 110209 More actions · Answer 7

Why do you think you need rebuilding the indexes?

_____________
Code for TallyGenerator

Jackie Lowery SSCommitted Points: 1898 More actions · Answer 8

I may not have had to rebuild them, but doing a full sample on the statistics definitely fixed the slowness.

Should I have to do a full sample on the statistics every week?

This reply was modified 5 years ago by Jackie Lowery.

Sergiy SSC Guru Points: 110209 More actions · Answer 9

There are some questions I'd like you to answer in order for me to help you.

Is there Customer Number or any other direct reference to HDR in wspkgShipment ?
Can you post CREATE INDEX statements for all indexes on the tables?
Are PKG.PKG_ID and LIN.PKG_ID of the same data type?

Can you run this query

SELECT COUNT(*)
FROM wspkgShipment SHP
INNER JOIN wsPKG PKG ON PKG.Shipment_No = SHP.Shipment_No
--INNER JOIN wsPKGLin LIN ON PKG.PKG_ID = LIN.PKG_ID
--INNER JOIN oeordhdr_sql HDR ON LIN.Ord_no = HDR.ord_no
--INNER JOIN arcusfil_sql AR ON AR.cus_no = HDR.cus_no
--LEFT OUTER JOIN ARALTADR_SQL ALT ON ALT.cus_no = HDR.cus_no
--AND ALT.cus_alt_adr_cd = HDR.cus_alt_adr_cd
WHERE SHP.Shipment_No = @shipment_no

uncommenting lines one by one and say what time does it take to execute on each occasion?

You are using TOP 1 and you say the query returns 32 rows without it. What do you do with the remaining 31 rows?

_____________
Code for TallyGenerator

Sergiy SSC Guru Points: 110209 More actions · Answer 10

Jackie Lowery wrote:

I may not have had to rebuild them, but doing a full sample on the statistics definitely fixed the slowness.
Should I have to do a full sample on the statistics every week?

Your indexing obviously does not really help the query.

You need to adjust either of them to fit another.

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 11

Jackie Lowery wrote:

I may not have had to rebuild them, but doing a full sample on the statistics definitely fixed the slowness.
Should I have to do a full sample on the statistics every week?

From what I've seen of other people's tests, it takes about the same amount of time to rebuild stats at 25% as it does at 100%. 25% can also miss a whole lot for skewed samples and for IDENTITY columns.

To wit, I selectively rebuild stats using a full scan on a couple of my production databases every night and the rest once per week. It's a long story but I also don't do regular index maintenance and I certainly do use the current but faulty industry standard supposed "Best Practices" for index maintenance and haven't done so (nor needed to) since Monday, 18 Jan 2016.

My "selectivity" is that if a statistic has been rebuilt in the last 24 hours, then don't rebuild it again until 24 hours has passed. It's not perfect but it's been working fantastic for me, especially on large tables.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 12

First, Do you always run the query for a specific PKG.Shipment_No?

Second, could you please post the execution plan?

Jackie Lowery SSCommitted Points: 1898 More actions · Answer 13

Jeff Moden wrote:

Jackie Lowery wrote:
I may not have had to rebuild them, but doing a full sample on the statistics definitely fixed the slowness.
Should I have to do a full sample on the statistics every week?
From what I've seen of other people's tests, it takes about the same amount of time to rebuild stats at 25% as it does at 100%. 25% can also miss a whole lot for skewed samples and for IDENTITY columns.
To wit, I selectively rebuild stats using a full scan on a couple of my production databases every night and the rest once per week. It's a long story but I also don't do regular index maintenance and I certainly do use the current but faulty industry standard supposed "Best Practices" for index maintenance and haven't done so (nor needed to) since Monday, 18 Jan 2016.
My "selectivity" is that if a statistic has been rebuilt in the last 24 hours, then don't rebuild it again until 24 hours has passed. It's not perfect but it's been working fantastic for me, especially on large tables.

Right now I'm only rebuilding the index and doing a full scan on the stats on Fridays, so should i should be good right?

Sergiy SSC Guru Points: 110209 More actions · Answer 14

Queries like this against the data structured this way will always be prone to bad plans.

Some times statistics suggest optimizer to choose key lookup, some times it will be index (table) scan.

If you could link Shipment directly to OrderHeader and then join to account info and drill down to OrderLine and Package - then the query would follow the structure of the data and full scans would never be considered by the optimizer.

So, once again - do you have any Order ID stored in Shipment table?

_____________
Code for TallyGenerator