Top 1 makes very fast query into very slow query

Question

Top 1 makes very fast query into very slow query

pdanes

SSCrazy Eights

Points: 8502
More actions
April 13, 2021 at 3:00 pm

#3870557
I have a very simple query:
```
SELECT  AkcesAutoID  FROM
( SELECT
P.AkcesAutoID, A.AkcesitPred, A.Akcesit, A.Rok
FROMdbo.Podrobnosti P
INNER JOINRO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter
INNER JOINdbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP

Order By PP.Rok, PP.AkcesitPred, PP.Akcesit
```
It returns 281 records, from a quite modest set of tables: Podrobnosti around 120,000 records, Akces around 60,000 and the fntStringsToTable TVF exactly one record, and does so in a fraction of a second. But when I change the query to return only the first record, and NO other changes, the query goes nuts.
```
SELECT Top 1 AkcesAutoID  FROM
( SELECT
P.AkcesAutoID, A.AkcesitPred, A.Akcesit, A.Rok
FROMdbo.Podrobnosti P
INNER JOINRO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter
INNER JOINdbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP

Order By PP.Rok, PP.AkcesitPred, PP.Akcesit
```
Execution time jumps to several minutes, and the execution plan is right out of the twilight zone - almost 600 MILLION records returned, cost percentages over one million percent, and so on. I've found a number of beefs with a similar theme on the net, but no definite solution. They all say essentially, "Try wording the query differently and see what happens."
I did try different things, and swapping the TVF join for a WHERE clause fixes the problem, but I also see many posts claiming that JOINs are generally preferred over WHERE clauses. And it doesn't make sense anyway. I already have the sort in place - the recordset built is identical in both versions. All I want is to return only the first record, rather than all 281 of them. Can anyone shed some light on this?
- This topic was modified 4 years, 8 months ago by pdanes.
Attachments:
You must be logged in to view attached files.

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101222 More actions · Answer 1

Yeah, seems like a bug in SQL optimizer. I'd make sure SQL didn't use a LOOP join (gack!) for the two main tables. I think the hint will also force the main tables to be joined before the Ltrs table. Typically EXISTS() is more efficient than JOIN, assuming only one row in Ltrs, which seems to be the case.

( SELECT
P.AkcesAutoID, A.AkcesitPred, A.Akcesit, A.Rok
FROMdbo.Podrobnosti P
        INNER HASH JOIN     dbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID
INNER JOINRO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter ) PP

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

pdanes SSCrazy Eights Points: 8502 More actions · Answer 2

Yep, that did it. Inner HASH Join, instead of Inner Join, and the query runs normally. The execution plan is still slightly more convoluted than without the Top clause, but the insanity is gone and the speed is back. Thank you - any idea what is going on, and how to avoid it in general?

This reply was modified 4 years, 8 months ago by pdanes.

pdanes SSCrazy Eights Points: 8502 More actions · Answer 3

It also doesn't matter where I put the HASH directive. Both of these work fine, and yield what looks to be the same execution plan:

INNER JOIN RO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter
INNER HASH JOINdbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP

INNER HASH JOIN RO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter
INNER JOIN dbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP

pdanes SSCrazy Eights Points: 8502 More actions · Answer 4

BTW, the TVF does not always have only one record. Usually one, but it is possible to have almost 50. However, that would be an extreme case, and probably an error in using the application, or one of my edge case tests. In actual service, it's mostly one, occasionally up to around four.

ScottPletcher SSC Guru Points: 101222 More actions · Answer 5

If it could ever have more than one, then it needs to be a JOIN, just like you have it.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

pdanes SSCrazy Eights Points: 8502 More actions · Answer 6

Well, I was thinking of replacing the entire JOIN clause with WHERE P.EvidenceLetter In('J') [or In('J','K','L')] and so on. I tried that in my initial tests, and it also made the problem go away, but I would have to change a fair amount of code that generates the dynamic SQL. Simply plopping in the HASH was a lot simpler.

But I have to wonder now, will I screw up some other queries that don't like the HASH directive? I have a bunch of testing ahead of me anyway, over some other development, so this is something I will be watching.

ScottPletcher SSC Guru Points: 101222 More actions · Answer 7

Yes, in theory could be less efficient in some cases than a LOOP join. But it's still reasonable performance. You'll never totally drop off a performance cliff going from LOOP to HASH, but you can vice versa (as you're seen).

Another way to do it, although it would then effect every join in the query and not just a specific join, is to use:

OPTION(HASH JOIN, MERGE JOIN)

MERGE joins are extremely efficient when applicable, so you don't want to prevent those. So this just prevents LOOP joins.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 8

Looking at the two plans, it seems that the TOP 1 version tries to avoid the sort. (With disastrous results.) Apparently it can read the correct sequence from the Akces table. I believe it is then doing a loop join to both the the Podrobnoti table and the fntStringsToTable function, assuming that it will get a "hit" fairly quickly. But it has to join the entirety of the the two tables before it can ever join to the function.

I know you already have a fix, but just out of curiosity, I would like to see what the table-valued function does. I assume it is a multiline table valued function function. At our shop, best practice is to avoid multiline table valued functions and to APPLY, not JOIN, inline table valued functions.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Steve Collins SSCrazy Points: 2241 More actions · Answer 9

The query could be simplified, no? Maybe something like this

select top(1) p.AkcesAutoID
from dbo.podrobnosti p
     join ro.fntStringsToTable('J') ltrs on ltrs.EvidenceLetter = p.EvidenceLetter
     join dbo.Akces a on p.AkcesAutoID = a.AkcesAutoID
order by a.Rok, a.Akcesitpred, a.Akcesit;

"I was thinking of replacing the entire JOIN clause with WHERE..." Nah, I wouldn't do it. A SELECT statement is a procedural sandwich which is evaluated in a fixed order (in most cases). FROM, ON, and JOIN are evaluated 1st, 2nd, 3rd. My tendency is to load up as much as possible in the FROM clause because it seems to make the rest of the query simpler and easier to get right. It's also somewhat more self-documenting imo

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

pdanes SSCrazy Eights Points: 8502 More actions · Answer 10

The Dixie Flatline wrote:

Looking at the two plans, it seems that the TOP 1 version tries to avoid the sort. (With disastrous results.) Apparently it can read the correct sequence from the Akces table. I believe it is then doing a loop join to both the the Podrobnoti table and the fntStringsToTable function, assuming that it will get a "hit" fairly quickly. But it has to join the entirety of the the two tables before it can ever join to the function.
I know you already have a fix, but just out of curiosity, I would like to see what the table-valued function does. I assume it is a multiline table valued function function. At our shop, best practice is to avoid multiline table valued functions and to APPLY, not JOIN, inline table valued functions.

Here you go. It's pretty trivial - just turns a comma-delimited string into a single-column table. The individual string elements are one or two letters, indicating a collection catalog. Mostly people work within a single catalog, but a few operations, mostly searches, can cross catalog boundaries.

CREATE FUNCTION [RO].[fntStringsToTable] (@LtrFiltr varchar(1000))
RETURNS 
@Ltrs TABLE(EvidenceLetter varchar(2) not null
primary key clustered (EvidenceLetter)
)

with schemabinding
AS
BEGIN
declare @CommaPos int
while LEN(@LtrFiltr) > 0
begin
set @CommaPos = CHARINDEX(',',@LtrFiltr,1)
if @CommaPos > 0
begin
insert @Ltrs (EvidenceLetter) Select left(@LtrFiltr,@CommaPos-1)
set @LtrFiltr = SUBSTRING(@LtrFiltr,@CommaPos+1,len(@LtrFiltr)-@CommaPos+1)
end
else
begin
insert @Ltrs (EvidenceLetter) Select @LtrFiltr
set @LtrFiltr = ''
end
end
return
END

pdanes SSCrazy Eights Points: 8502 More actions · Answer 11

ScottPletcher wrote:

Yes, in theory could be less efficient in some cases than a LOOP join. But it's still reasonable performance. You'll never totally drop off a performance cliff going from LOOP to HASH, but you can vice versa (as you're seen).
Another way to do it, although it would then effect every join in the query and not just a specific join, is to use:
OPTION(HASH JOIN, MERGE JOIN)
MERGE joins are extremely efficient when applicable, so you don't want to prevent those. So this just prevents LOOP joins.

Wow, didn't even know you could do that. I'm fairly competent with basic SQL, but there are so many variations, and new elements being added all the time. Makes my head spin trying to keep up. Thank you, I'll give that a try as well.

Steve Collins SSCrazy Points: 2241 More actions · Answer 12

You could maybe replace the whole WHILE LOOP'ing function with Eirikur's LEAD splitter. If this is SQL Server 2016+ you could use the built-in STRING_SPLIT itvf (since the order of the split string(s) doesn't seem to matter (which is another reason to leave in the FROM clause imo)). Something like this

select top(1) p.AkcesAutoID
from dbo.podrobnosti p
     join dbo.DelimitedSplit8K_LEAD('J') ds on p.EvidenceLetter=ds.Item
     join dbo.Akces a on p.AkcesAutoID=a.AkcesAutoID
order by a.Rok, a.Akcesitpred, a.Akcesit;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

pdanes SSCrazy Eights Points: 8502 More actions · Answer 13

Steve Collins wrote:

The query could be simplified, no? Maybe something like this
select top(1) p.AkcesAutoID
from dbo.podrobnosti p
     join ro.fntStringsToTable('J') ltrs on ltrs.EvidenceLetter = p.EvidenceLetter
     join dbo.Akces a on p.AkcesAutoID = a.AkcesAutoID
order by a.Rok, a.Akcesitpred, a.Akcesit;
"I was thinking of replacing the entire JOIN clause with WHERE..." Nah, I wouldn't do it. A SELECT statement is a procedural sandwich which is evaluated in a fixed order (in most cases). FROM, ON, and JOIN are evaluated 1st, 2nd, 3rd. My tendency is to load up as much as possible in the FROM clause because it seems to make the rest of the query simpler and easier to get right. It's also somewhat more self-documenting imo

It could be simplified, and I tried that when testing, before I posted the problem, but it didn't help much. The major bugaboo seems to be the join to the TVF. Anyway, this query is built by a series of functions - one that opens the select clause, another that generates the fields, another for the joins, another for inner Where, outer Where, and finally sort. It's the result of years of trying different approaches to developing and maintaining all the queries that feed the final user app. It's not the absolute peak in performance - the queries generated could usually be tweaked for better response, but the performance is more than adequate for the setup I'm working with, and it makes both maintenance and additional development much more manageable.

I originally trotted out the idea at a SQL in the City meeting, back when they were still being conducted as real meetings in London. Several heavyweights told me the idea stunk (although they were much more polite about it), but I kept at it and got it working pretty well. I probably couldn't get away with it in a heavily loaded environment, but I have the luxury of a machine that is loafing around 99% of the time, and has essentially unlimited disk space. As long as individual queries respond reasonably well, around 1 second or less (and almost all do - even with my less than optimal dynamic query generation, response time is usually instant), everyone is happy. Resource contention, blocking, deadlocks, memory pressure - all that sort of stuff simply doesn't exist for me.

A thought, though - what about putting the Where clause into the From, as a subquery:

SELECT  AkcesAutoID  FROM
( SELECT
P.AkcesAutoID, A.AkcesitPred, A.Akcesit, A.Rok
FROM(Select * From dbo.Podrobnosti Where EvidenceLetter In ('J')) P
INNER JOINdbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP
Order By PP.Rok, PP.AkcesitPred, PP.Akcesit

I'll give that a try at work tomorrow and see what happens.

pdanes SSCrazy Eights Points: 8502 More actions · Answer 14

Steve Collins wrote:

You could maybe replace the whole WHILE LOOP'ing function with Eirikur's LEAD splitter. If this is SQL Server 2016+ you could use the built-in STRING_SPLIT itvf (since the order of the split string(s) doesn't seem to matter (which is another reason to leave in the FROM clause imo)). Something like this
select top(1) p.AkcesAutoID
from dbo.podrobnosti p
     join dbo.DelimitedSplit8K_LEAD('J') ds on p.EvidenceLetter=ds.Item
     join dbo.Akces a on p.AkcesAutoID=a.AkcesAutoID
order by a.Rok, a.Akcesitpred, a.Akcesit;

I can't use STRING_SPLIT (yet), I'm still on 2014. I have a virtual 2019 machine online, but still waiting for the staff to sort out some connection bugs - I can experiment with it and talk to the instance with SSMS, but I can't get an ODBC connection, which is kind of important. The app is in MSACCESS, and ODBC is how it communicates with the backend. None of the staff are DB types, and I have no admin rights in the network (domain), so troubleshooting this kid of stuff is problematic. They don't know what I need, and I can't test things on my own.

But I don't think the performance of the TVF is really the issue. The problem is SQL Server is generating a shitty plan when I add the TOP 1 specifier. A faster function may run somewhat faster, but not nearly enough to make up for the abysmal performance hit from a bad execution plan.

Top 1 makes very fast query into very slow query

Attachments: