Where clause prevents index seek

Question

Where clause prevents index seek

PHXHoward

SSCrazy Eights

Points: 8388
More actions
November 23, 2015 at 11:38 am

#322430

A vendor script is doing a query like this:
use databaseABC
select column5 from table10
where '_' + cast(pk as varchar) = '_10000'
pk is and integer primary key so it has a clustered index. Due to the where clause, it is doing a clustered index scan instead of a seek.
I can modify the database but not the query. Is there any kind of index or other modification that would help here?
Thanks

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

PHXHoward (11/23/2015)
A vendor script is doing a query like this:
use databaseABC
select column5 from table10
where '_' + cast(pk as varchar) = '_10000'
pk is and integer primary key so it has a clustered index. Due to the where clause, it is doing a clustered index scan instead of a seek.
I can modify the database but not the query. Is there any kind of index or other modification that would help here?
Thanks

My question would be why is the vendor adding an underscore to the numeric value? If you want to improve the query strip the leading underscore and convert the value to an int.

Not knowing the details of the table (number of columns, data length of the columns), there really isn't much you can do without modifying the query itself. Leaving the query alone, creating a persistent computed column converting the pk to varchar and prepending an underscore, and then creating a nonclustered index on that column would simple take the query from a clustered index scan to a nonclustered index scan with a lookup IF SQL Server thought it would be faster than a clustered index scan.

PHXHoward SSCrazy Eights Points: 8388 More actions · Answer 2

They are using it to create an export of the column to a text file. The _ + primary key is how they want to label the file name that is created.

There are 5.6 million rows in the table and it is taking over 1 second per row. Query cost is 36.

The DTA suggests creating a new non clustered index with PK as the column. After that, running the query does a non clustered index scan and a key lookup. It drops the query cost from 36 to 16.

Why is it faster to make a non clustered index of the same column as the clustered index and have it scan that and then do a lookup instead of just scanning the clustered index?

PHXHoward SSCrazy Eights Points: 8388 More actions · Answer 3

I created a persisted computed column to matchthe where line.

alter table table10 add merged as '_' + cast(pk as varchar) persisted

create index test2 on table10 (merged asc)

The query plan does not use it.

ScottPletcher SSC Guru Points: 100946 More actions · Answer 4

That's extremely bizarre. They can just do:

WHERE pk = '10000'

and if pk is a numeric type, SQL will convert the 10000 to that type and be able to do a seek. Query labels/naming are 100% irrelevant to this WHERE condition(!).

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Luis Cazares SSC Guru Points: 183698 More actions · Answer 5

The problem in here is the query. Other than adding hardware, there's no other viable solution.

If they're labeling the file name with an underscore, it's fine, but they don't need to include the underscore in the where clause.

Why is it faster to make a non clustered index of the same column as the clustered index and have it scan that and then do a lookup instead of just scanning the clustered index?

Because the nonclustered index conatins only the pk column, while the clustered index is the whole table. In other words, it needs to read less pages. If you include the columns used in the query as covering columns, it might avoid the lookup, but it will also take more space in the leaf pages.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

Did you still have the other nonclustered index? Then, yes it would still use that index and it is fine. The reason it is faster is there is more data per page than on the clustered index (the table itself). Still, there are better ways to accomplish what they want without going through the hassles they are going through from looking at that query.

PHXHoward SSCrazy Eights Points: 8388 More actions · Answer 7

This is very interesting (to me).

I deleted the other NCI so that I only have the CI and the also NCI that is based on the persisted computed column. If I run the query, it picks to do a slow clustered index scan. If I run the query with (RECOMPILE), it does a very fast nci seek and lookup. If I take with (RECOMPILE) off, it will unfortunately revert to the slow plan.

Any idea why and if I can force it to recompile each time or else use the better plan? I can't change the query that the export script runs.

Thanks for the help!

Howard

Jason-299789 SSC-Insane Points: 21601 More actions · Answer 8

I suspect that the engine is reverting to a cached plan, and reading BoL in regards to the RECOMPILE query hint

Instructs the SQL Server Database Engine to discard the plan generated for the query after it executes, forcing the query optimizer to recompile a query plan the next time the same query is executed. Without specifying RECOMPILE, the Database Engine caches query plans and reuses them. When compiling query plans, the RECOMPILE query hint uses the current values of any local variables in the query and, if the query is inside a stored procedure, the current values passed to any parameters

it seems to back this hypothosis up.

Can anyone with more knowledge on the query engine internals confirm my understanding is correct.

_________________________________________________________________________
SSC Guide to Posting and Best Practices

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

PHXHoward (11/23/2015)
They are using it to create an export of the column to a text file. The _ + primary key is how they want to label the file name that is created.
There are 5.6 million rows in the table and it is taking over 1 second per row. Query cost is 36.
The DTA suggests creating a new non clustered index with PK as the column. After that, running the query does a non clustered index scan and a key lookup. It drops the query cost from 36 to 16.
Why is it faster to make a non clustered index of the same column as the clustered index and have it scan that and then do a lookup instead of just scanning the clustered index?

What are they using to facilitate the export? SSIS?

"The _ + primary key is how they want to label the file name that is created" Only one row per file?

What is the association between the WHERE clause and the file name?

Why can't you rewrite their awful script?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

ChrisM@Work (11/24/2015)
PHXHoward (11/23/2015)
They are using it to create an export of the column to a text file. The _ + primary key is how they want to label the file name that is created.
There are 5.6 million rows in the table and it is taking over 1 second per row. Query cost is 36.
The DTA suggests creating a new non clustered index with PK as the column. After that, running the query does a non clustered index scan and a key lookup. It drops the query cost from 36 to 16.
Why is it faster to make a non clustered index of the same column as the clustered index and have it scan that and then do a lookup instead of just scanning the clustered index?
What are they using to facilitate the export? SSIS?
"The _ + primary key is how they want to label the file name that is created" Only one row per file?
What is the association between the WHERE clause and the file name?
Why can't you rewrite their awful script?

I agree. If you can make changes to the database tables, why can't you fix their queries?

PHXHoward SSCrazy Eights Points: 8388 More actions · Answer 11

Ok, I'll explain.

We have a product that the company has been using since the late 1980s. They are switching to another product to replace it. The new product uses Oracle. The vendor for the new system tells us that they have no SQL Servers in their data center so I can not just send them a .bak file and let them do the export/import.

The developer wrote a process to extract the tables in the SQL Server database to csv that I can zip up and send to them.

The process that the developer created runs a script in sql server. I save the output to a text file that I run as a batch file. The batch file uses bcp to dump each table to a csv file. It is very quick.

The catch is that data in very large "text" fields is skipped in the initial run. The second part of the script does a bcp for every row to put the text field into a file for each row. They have created a directory for each table and they use bcp to extract those very large fields to later merge back into the Oracle database.

For example

file 1 D:\Export\table1\<text column name>_<pk name>

file 2 D:\Export\table1\<text column name>_<pk name>

file 3 D:\Export\table1\<text column name>_<pk name>

etc...

The table has 5.7 million rows in it. The batch file has pre-created 5.7 million lines using that query provided at the top of this thread. Due to its construction, is trying to do very slow scans.

The text file is almost 2GB so I can't open it and modify the query 5.7 million times.

Anyway, that's what I'm stuck with. I was hoping to speed it up. The suggestion that Lynn gave about a persisted computed column does speed it up a lot but it only uses that plan if I use option (RECOMPILE). I was hoping there was a plan guide or something that would get it to use the better plan.

I should not be reengineering their export process so that is why I can't change the query. Instead I'm just trying to get it done.

Thanks for all the help.

Luis Cazares SSC Guru Points: 183698 More actions · Answer 12

That doesn't explain why would the query use the underscore in the WHERE clause.

If this is an export for a migration and not a permanent process, I suggest that you change the query to make it SARGable, or demand them to do it.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

PHXHoward SSCrazy Eights Points: 8388 More actions · Answer 13

Absolutely the query should be rewritten but it is TG week and the person is out of the office all week.

It would be useful for me to learn something about working around an unchangeable query. 🙂

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 14

PHXHoward (11/24/2015)
Absolutely the query should be rewritten but it is TG week and the person is out of the office all week.
It would be useful for me to learn something about working around an unchangeable query. 🙂

Thanks for the explanation.

What proportion of rows have a non-empty value in the text column?

"The text file is almost 2GB so I can't open it and modify the query 5.7 million times."

What are you proposing to do with it? Have you tested to ensure it will work?

Wouldn't it be more efficient to write one small batch to run against 5.7 million rows, exporting each non-empty text column value to file?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden