How to change the Scan to Seek and optimize

  • sizal0234

    SSCrazy

    Points: 2194

    Hello,

Attached are a sample query and the actual execution plan. I want to further optimize the query and change the scan to a seek.

Do I have to create an index or modify the query in this case? Please let me know your thoughts.

     

Attachments: sample query and actual execution plan.
  • Mr. Brian Gale

    SSC-Insane

    Points: 23075

    For others who want to help without downloading, here is the query:

    USE Database1
    set statistics io on;
    set statistics time on;

    SELECT
    Object1.Column1,
    Object1.Column2,
    Object1.Column3,
    Object1.Column4,
    Object1.Column5,
    Object1.Column6,
    Object1.Column7,
    Object1.Column8,
    Object1.Column9

    FROM Schema1.Object2 Object1
    INNER JOIN
(SELECT Column1, Function1(Column3) AS Column3
FROM Schema1.Object2 GROUP BY Column1) Object3 ON Object1.Column1 = Object3.Column1 AND Object1.Column3 = Object3.Column3

    GO

    My question is why do you want to change it from a SCAN to a SEEK?  If you look at the execution plan, your SEEK has a higher estimated cost than the SCAN and had 8 executions vs the 1 for the scan.  This is because your SCAN (the subquery) needs to get all of the rows from Object2 in order to compare it to the main query on Object2.

Blindly trying to change SCANs to SEEKs may hurt performance or offer no benefit.  If you NEED to look at all rows, a SCAN is often the best way to do it.

    Adding indexes won't change a SCAN to a SEEK.  Adding a WHERE clause on an indexed column should change the SCAN to a SEEK, but you will be filtering out your result set.

    But your table has 8 rows; chances are this is not a query that is worth tuning any further than it already is.  I expect this completes nearly instantly.
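To illustrate the point about a WHERE clause on an indexed column, a minimal sketch (the index name and the literal value are made up; the table and column names are the anonymized ones from the thread):

```sql
-- Hypothetical index; a seek only becomes possible once a predicate
-- can restrict which rows are read.
CREATE NONCLUSTERED INDEX IX_Object2_Column3
    ON Schema1.Object2 (Column3);

-- With a sargable WHERE clause the optimizer can seek on the index above,
-- but note this also filters the result set, which the original query does not.
SELECT Column1, Column2, Column3
FROM Schema1.Object2
WHERE Column3 = 42;  -- placeholder value
```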

  • sizal0234

    SSCrazy

    Points: 2194

Thanks for your input. Yes, the query is running in a second now; however, the table is expected to grow over time, from 8 rows to many more. So I am thinking of tuning it as a proactive measure. The SCAN will be a problem at a later point, when we load a million records into this table. If not the SCAN, do you see anything else that could make this query run in fewer milliseconds? By a WHERE clause, do you mean adding something like WHERE Column = NULL at the end of the query? Please advise. Thanks!!

  • Mr. Brian Gale

    SSC-Insane

    Points: 23075

    There is always something; it is just what you are willing to do.

    Selecting fewer columns can improve performance, removing functions can improve performance, reducing the result set can improve performance.

    Fewer columns may require a different index to get the best performance, but indexes are a tricky beast when doing performance tuning.  Each index you add will slow down INSERT, UPDATE and DELETE performance.

Removing the function may help performance too, as it is presumably a calculation done on Column3, possibly a lookup related to Column3, or it could just be returning Column3 after doing no real work.  I cannot see that function, so it is hard to say whether removing it will help.  Depending on what Function1 does, you might be able to rewrite the query without the INNER JOIN.  You may get a performance boost by changing the INNER JOIN to a WHERE clause, since you aren't actually using any columns from that subquery apart from filtering the result set.

Since you are not filtering the result set but looking at every row, even if you do switch the SCAN to a SEEK you will still essentially be doing a SCAN.

    Reducing the result set (ie adding a WHERE clause and filtering out some of the data) will likely improve performance, but likely isn't what you are wanting to do.  Heck, adding "WHERE 1=0" will improve performance but will give you no results.

Since your subquery has no WHERE clause, you are going to be looking at all rows in the table.  The main query is doing a SEEK, but it touches every row, so it's essentially a SCAN.  And since the query must look at all rows with every execution, it will get slower as the table grows.

    If you are wanting to remove the INNER JOIN, I THINK this query should be identical and should have similar performance (test it though):

    USE Database1
    set statistics io on;
    set statistics time on;

    SELECT
    Object1.Column1,
    Object1.Column2,
    Object1.Column3,
    Object1.Column4,
    Object1.Column5,
    Object1.Column6,
    Object1.Column7,
    Object1.Column8,
    Object1.Column9

    FROM Schema1.Object2 Object1
    WHERE Object1.Column3 = Function1(Object1.Column3)
    GO

My understanding is that the above query should be identical to what you had, but with only 1 lookup on the table.  It'll still be a SCAN because we need to look at each and every row, but a single table lookup SHOULD be faster than 2 table lookups.

  • sizal0234

    SSCrazy

    Points: 2194

Thank you for sharing the updated query; I will test it out for sure.

To answer your question about the function: it is a MAX function, e.g. MAX(Column3).

    Sincerely!

  • Jeffrey Williams

    SSC Guru

    Points: 88603

    That function is going to be a performance killer - no matter what.  As it is used - it must be a scalar function and therefore needs to execute for every row.

    If the function is required - then converting it to an inline-table valued function could possibly improve the performance.  No way to tell for sure unless you post that function.

    The subquery is performing a group by...not sure if that is necessary either, but it depends on the data and the requirement for this query.  Since the query does not return any data from that subquery, it may be possible to convert that to an EXISTS instead of a join but again it will depend on what the function does.

    Any way you can post that function and maybe some sample data?
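A sketch of the EXISTS rewrite described above, against the anonymized names and untested; it assumes Function1 turns out to be an aggregate such as MAX (which the thread has not confirmed at this point):

```sql
-- Returns each row whose Column3 equals the MAX(Column3) for its Column1 group,
-- matching the original join to the grouped subquery.
SELECT Object1.Column1, Object1.Column2, Object1.Column3,
       Object1.Column4, Object1.Column5, Object1.Column6,
       Object1.Column7, Object1.Column8, Object1.Column9
FROM Schema1.Object2 AS Object1
WHERE EXISTS (
    SELECT 1
    FROM Schema1.Object2 AS o2
    WHERE o2.Column1 = Object1.Column1
    GROUP BY o2.Column1
    HAVING MAX(o2.Column3) = Object1.Column3
);
```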

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Mr. Brian Gale

    SSC-Insane

    Points: 23075

    If the function is just a MAX, I would change the query to be something more like this:

DECLARE @maxValue int;
    SELECT @maxValue = MAX(column3)
    FROM Schema1.Object2

    SELECT Object1.Column1,
    Object1.Column2,
    Object1.Column3,
    Object1.Column4,
    Object1.Column5,
    Object1.Column6,
    Object1.Column7,
    Object1.Column8,
    Object1.Column9
FROM Schema1.Object2 Object1
    WHERE Column3 = @maxValue

    That should get you a bit of a performance boost as your function only needs to be run once.  If you have an index on column3, that will help this query but likely hurt the performance of other queries.  I am also assuming that Column3 is an INT... since it was a MAX, I just assumed it was an INT.

  • sizal0234

    SSCrazy

    Points: 2194

Yes, you are correct! It is the PK, INT, NOT NULL.

  • Mr. Brian Gale

    SSC-Insane

    Points: 23075

Oh nice.  Then it is likely already sorted, so the query I gave SHOULD give you pretty good performance.

In my opinion, that is likely the best you will be able to get.  Offhand, I am not thinking of anything that could optimize that query any further, apart from doing something incredibly strange.  What I mean by incredibly strange is having a second table that ONLY holds the columns from Object2 where Column3 = MAX(Column3), populated by a trigger (AFTER INSERT, UPDATE, DELETE).  That would make your query as fast as possible, but it would hurt the performance of INSERT, UPDATE and DELETE a lot, could cause deadlocks and blocking, and would make end users grumpy.

That being said, if a close-to-real-time result is all you require, you could set up a job to call an SSIS package that pulls the data out of Object2 where Column3 = MAX(Column3) on a schedule.  But that too feels like overkill for what you are going for.  Using the query I mentioned previously, you should get decent performance even with large data sets.  And if you start seeing slower performance as the max value for Column3 changes, you could add this to the end:

    OPTION (OPTIMIZE FOR (@maxValue UNKNOWN))

    This will help with the parameter sniffing problem and (in the case of a stored procedure) does not need the recompile hint; that is presuming the above query is the slow bit.
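Putting the hint together with the earlier query, a sketch (variable name and INT type carried over from the previous post; untested against the real schema):

```sql
DECLARE @maxValue int;

SELECT @maxValue = MAX(Column3)
FROM Schema1.Object2;

SELECT Column1, Column2, Column3
FROM Schema1.Object2
WHERE Column3 = @maxValue
OPTION (OPTIMIZE FOR (@maxValue UNKNOWN));
```

The hint tells the optimizer to build the plan from average density statistics rather than the sniffed value of @maxValue, which matters most when this runs inside a stored procedure whose cached plan was compiled for a different value.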

  • Jeffrey Williams

    SSC Guru

    Points: 88603

@Brian - your solution is not the same as the original query.  The original query is joining to the max date for each Column1 - not the max date across the whole table.

We would need to see the original query - not the anonymized version.  I suspect that the subquery is using the same table and is set up to get the latest row available.  If that is the case, this can be done using ROW_NUMBER() OVER().

    SELECT TOP 1 WITH TIES
    Object1.Column1,
    Object1.Column2,
    Object1.Column3,
    Object1.Column4,
    Object1.Column5,
    Object1.Column6,
    Object1.Column7,
    Object1.Column8,
    Object1.Column9
FROM Schema1.Object2 Object1
    ORDER BY row_number() over(Partition By Object1.Column1 Order By Object1.Column3 desc)

    You could also use a CTE to get the row number...
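The CTE variant mentioned above might look something like this (a sketch against the anonymized names, untested; the CTE and column alias names are made up):

```sql
-- Rank rows within each Column1 group, newest Column3 first,
-- then keep only the top-ranked row per group.
WITH Ranked AS (
    SELECT Object1.*,
           ROW_NUMBER() OVER (PARTITION BY Object1.Column1
                              ORDER BY Object1.Column3 DESC) AS rn
    FROM Schema1.Object2 AS Object1
)
SELECT Column1, Column2, Column3, Column4, Column5,
       Column6, Column7, Column8, Column9
FROM Ranked
WHERE rn = 1;
```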

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Mr. Brian Gale

    SSC-Insane

    Points: 23075

Thanks Jeffrey!  You are right.  My query is not going to work because of the comparison on Column1: it would have returned only the rows where Column3 = MAX(Column3) across the whole table, which could exclude rows that should be included.

    Jeffrey's query hits the nail on the head though (assuming I understand the anonymized query correctly).

  • Sergiy

    SSC Guru

    Points: 109784

    The query does not have any WHERE condition, so there is nothing to SEEK there.

    Aggregation query still has to go through every row in the table to return the result set.

    Having an index on (Column1,Column3) might help a bit by reducing the amount of data to be scanned, but not much. Still will be Index Scan.
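For completeness, the index described above would look something like this (the index name is made up):

```sql
-- Narrow covering index for the grouped subquery; the aggregation still
-- scans every row, but it reads this smaller index instead of the table.
CREATE NONCLUSTERED INDEX IX_Object2_Column1_Column3
    ON Schema1.Object2 (Column1, Column3);
```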

     
