March 5, 2016 at 3:35 am
I'm using fulltext to search words within entities...
I created an extra table (fullText.modulesWords) with all the words the entities have, and the fulltext search runs against that table.
I was running a search and noticed a "strange", or maybe not so strange, behavior...
The 3 queries below are my tests... I used the 2nd one as my first approach... It was slow: 10 seconds and 400,000 reads... I started "breaking" the query apart and noticed that the fulltext alone was very fast, and the query without the fulltext was also very fast... So I went for the 3rd query posted: I created a subset with the fulltext ids and joined it with the rest of the query... But the results were the same... SQL is "smart" and converted the query into the same plan as the 2nd one...
So I went for another try... the first query... 70ms and 3,000 reads!
Can someone explain why this happens?!? The queries are logically identical...
If I execute the same queries without the wildcard in the search words, the 2nd and 3rd queries take 2 seconds and 130,000 reads... the first one 20ms and 2,500 reads.
Thanks,
Pedro
PS: If in the 2nd query I replace the CONTAINS with a LIKE operator and '%maria%', it takes 200ms and 100,000 reads.
DECLARE @ids TABLE (id BIGINT)
INSERT INTO @ids SELECT id FROM fullText.modulesWords WHERE idModule = 7 AND CONTAINS(words, '"maria*"')
;WITH e AS (
SELECT [idEntity] , ROW_NUMBER() OVER ( ORDER BY TRY_CONVERT(BIGINT, e.reference), [idEntity]) position
FROM realEstate.entities e
INNER JOIN entities.GetAgentAccessesByActions(4042,61,1) t1 ON e.idNetwork = t1.idNetwork AND e.idCompany = t1.idCompany AND e.idAgency = t1.idAgency
WHERE
e.idagent = e.idAgent AND
[idEntity] IN (SELECT id FROM @ids)
)
SELECT e.* FROM e
GO
;WITH e AS (
SELECT [idEntity] , ROW_NUMBER() OVER ( ORDER BY TRY_CONVERT(BIGINT, e.reference), [idEntity]) position
FROM realEstate.entities e
INNER JOIN entities.GetAgentAccessesByActions(4042,61,1) t1 ON e.idNetwork = t1.idNetwork AND e.idCompany = t1.idCompany AND e.idAgency = t1.idAgency
WHERE
e.idagent = e.idAgent AND
[idEntity] IN (SELECT id FROM fullText.modulesWords WHERE idModule = 7 AND CONTAINS(words, '"maria*"'))
)
SELECT e.* FROM e
GO
;WITH ids AS (
SELECT id FROM fullText.modulesWords WHERE idModule = 7 AND CONTAINS(words, '"maria*"')
), e AS (
SELECT [idEntity] , ROW_NUMBER() OVER ( ORDER BY TRY_CONVERT(BIGINT, e.reference), [idEntity]) position
FROM realEstate.entities e
INNER JOIN entities.GetAgentAccessesByActions(4042,61,1) t1 ON e.idNetwork = t1.idNetwork AND e.idCompany = t1.idCompany AND e.idAgency = t1.idAgency
WHERE
e.idagent = e.idAgent AND
[idEntity] IN (SELECT id FROM ids)
)
SELECT e.* FROM e
March 5, 2016 at 6:20 am
First of all, comparing fulltext with like is kind of an apples and oranges thing. So I will just ignore that last remark.
Based on what you tell us, my first guess would be that there is a significant difference between the estimated rowcount and the actual rowcount for the CONTAINS operator. And because of that, you get a plan optimized for the estimated rowcount when you have a single-query approach, but a plan optimized for the real rowcount if you dump the CONTAINS results in a temp table and then build the actual query on that - queries with a temporary table get recompiled when enough data has been inserted into the table, and then the actual rowcount will be used.
In your case you use a table variable, which is yet another matter. This will use a fixed rowcount of, depending on SQL Server version, 1 or 100 rows. When the real rowcount is completely different, you might shoot yourself not only in the foot but in at least two full legs with this query form. For search strings with a low number of hits, it will probably work - but are you sure that nobody will ever use a search string that has ten thousand hits?
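For illustration, the temp-table variant I mean would look roughly like this (untested sketch based on your first query; only the table variable is swapped for a temp table, and #ids is just a name I picked):
CREATE TABLE #ids (id BIGINT);
INSERT INTO #ids (id)
SELECT id FROM fullText.modulesWords WHERE idModule = 7 AND CONTAINS(words, '"maria*"');
-- A temp table has real statistics and triggers a recompile once it has been loaded,
-- so the SELECT that follows gets optimized for the actual number of matching ids.
-- Then run the same CTE/SELECT you already posted, only reading from #ids instead of @ids:
-- [idEntity] IN (SELECT id FROM #ids)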
EDIT: Forgot to add: the above is pure speculation based on what you write. If you can post the (actual!!) execution plans, I can verify this and perhaps tell you a bit more.
Posting a repro script might also help, but I guess that it's hard to write a short-enough repro script for an issue such as this.
March 5, 2016 at 6:36 am
Hugo Kornelis (3/5/2016)
First of all, comparing fulltext with like is kind of an apples and oranges thing. So I will just ignore that last remark.
Based on what you tell us, my first guess would be that there is a significant difference between the estimated rowcount and the actual rowcount for the CONTAINS operator. And because of that, you get a plan optimized for the estimated rowcount when you have a single-query approach, but a plan optimized for the real rowcount if you dump the CONTAINS results in a temp table and then build the actual query on that - queries with a temporary table get recompiled when enough data has been inserted into the table, and then the actual rowcount will be used.
In your case you use a table variable, which is yet another matter. This will use a fixed rowcount of, depending on SQL Server version, 1 or 100 rows. When the real rowcount is completely different, you might shoot yourself not only in the foot but in at least two full legs with this query form. For search strings with a low number of hits, it will probably work - but are you sure that nobody will ever use a search string that has ten thousand hits?
EDIT: Forgot to add: the above is pure speculation based on what you write. If you can post the (actual!!) execution plans, I can verify this and perhaps tell you a bit more.
Posting a repro script might also help, but I guess that it's hard to write a short-enough repro script for an issue such as this.
Thanks for the reply...
I just mentioned LIKE since it's faster than the CONTAINS in the second and third queries, and it shouldn't be...
The table variable is not an issue since we will limit the results to 100 records at most in production... if a user searches for a word and gets more than 100 results he has to be more specific about what he's searching for...
Here are the plans for the 3 queries...
Thanks,
Pedro
March 5, 2016 at 8:12 am
Thanks for posting the plans, Pedro!
Let's first look at the "fast" query 1. This one is actually quite terrible, and would probably go a lot faster if you use a temporary table instead of a table variable, and/or if you add OPTION(RECOMPILE) to the second query.
The first query does two independent searches, one for the fulltext match and another for the idModule match. For the fulltext match the estimate is quite okay (8042 estimated, 7767 actual), for the second it is spot on (56348.3 estimated, 56350 actual). After combining the two search results, the estimate is that 1359 rows will be left; this is not correct because in reality there are 5223 matches. This is the number of rows that gets inserted in the table variable.
As explained in my earlier post, the second execution plan is based on an estimate of 1 row in the table variable. A lot of stuff is done in parts of the plan that will be repeated once per row. The optimizer expects 1 iteration; in reality there are over 5,000 iterations. This may be the fastest plan you found, but I can tell you that it's not cheap, and there is an easy tuning opportunity here.
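To be concrete, the easy tuning opportunity is to let the final SELECT see the real contents of @ids. A minimal sketch (your first query unchanged, just a hint appended to the last statement; untested):
DECLARE @ids TABLE (id BIGINT)
INSERT INTO @ids SELECT id FROM fullText.modulesWords WHERE idModule = 7 AND CONTAINS(words, '"maria*"')
;WITH e AS (
SELECT [idEntity] , ROW_NUMBER() OVER ( ORDER BY TRY_CONVERT(BIGINT, e.reference), [idEntity]) position
FROM realEstate.entities e
INNER JOIN entities.GetAgentAccessesByActions(4042,61,1) t1 ON e.idNetwork = t1.idNetwork AND e.idCompany = t1.idCompany AND e.idAgency = t1.idAgency
WHERE
e.idagent = e.idAgent AND
[idEntity] IN (SELECT id FROM @ids)
)
SELECT e.* FROM e
OPTION (RECOMPILE) -- recompile this statement at run time, using the actual row count of @ids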
Now let's look at the other two plans, which appear to be identical (I did not check if they are indeed *exactly* the same, I assume that they are, though).
These plans approach the query very differently. They first access the agents, agencies, and agentsRelations tables in five different ways (four of them conditional based on a variable) and appear to combine the results using UNION; I assume that this logic is embedded in the entities.GetAgentAccessesByActions user-defined table-valued function (and by the way, kudos on making that an inline function - well done!). Without looking at the details of this part, I did look at the resulting rows - 3 actual versus 3.1 estimated, so no problems there.
For each of those three rows, the plan then attempts to find matching data in the entities table. And here is where it goes wrong. The optimizer assumes that this join will reduce the rowcount from 3 to 1 - so it assumes that two of the rows will have no match and hence be kicked out because of the INNER JOIN. (In fact, I think the optimizer expects no matches at all; 1 is the minimum estimate and SQL Server will only go below that for queries that simply cannot return rows, e.g. WHERE 1 = 0). However, in reality the three rows returned from the function find matches in a lot of entities; after the join the actual row count is 12,108.
The last thing that happens in the plan is to further filter based on the search in modulesWords. For each row from the intermediate result (remember, 1 expected; 12,108 actual), SQL Server first finds the row(s) with idModule = 7 and a matching id. For each of those rows, it will then evaluate the fulltext filter.
This plan is chosen because SQL Server expects that it will have to execute the fulltext logic 1 time (which you can read as "at most a few times" - estimates are never exact), but at run-time it turns out it had to do this over 12 thousand times. And yes, that will be slow.
I expect that even if you remove the IN clause with the fulltext (which will return the 12 thousand rows from the join between entities and the table-valued function), you will still get a slow query that does a huge amount of IO. And you might even have gotten a similarly slow query if you had used a temporary table in the first version. SQL Server always tries to trim down work as soon as possible. With the table variable it expected to have found a very fast way to get to just 1 row. If it had known that the table variable had more rows, it might have chosen to get back to the same plan used for the other two queries, because the optimizer thinks that starting with the function and then joining it to the entities table is a fast way to get to 1 row and reduce the remaining workload.
The root cause of all of this appears to be the incorrect estimate where SQL Server expects to find no matching rows in the entities table. The most probable explanation for this is statistics that are out of date or incorrect. Can you rebuild the statistics on all tables used in the query and in the table-valued function, preferably with fullscan? That by itself might already fix it.
If it doesn't, then the complex logic in the function somehow confuses the cardinality estimator enough that it assumes that values will be returned from the function that do not exist in the entities table. That one can be a lot harder to tackle. Switching between the "old" and the "new" cardinality estimator might help - so if rebuilding stats didn't help, I suggest trying to run the query with either OPTION(QUERYTRACEON 9481) or OPTION(QUERYTRACEON 2312) and see if that makes a difference.
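Something along these lines (untested; the table list below is just my guess from the queries you posted - adjust it to whatever the query and the function actually touch):
UPDATE STATISTICS realEstate.entities WITH FULLSCAN;
UPDATE STATISTICS fullText.modulesWords WITH FULLSCAN;
UPDATE STATISTICS entities.agentsRelations WITH FULLSCAN;
-- plus the agents and agencies tables used inside the function
-- If the statistics rebuild alone doesn't help, the cardinality estimator switch is just a hint on the outer SELECT, e.g.:
-- SELECT e.* FROM e OPTION (QUERYTRACEON 9481) -- force the pre-2014 ("legacy") cardinality estimator
-- SELECT e.* FROM e OPTION (QUERYTRACEON 2312) -- force the 2014 ("new") cardinality estimator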
If all of those have no effect, then it might become hard to find a good solution for this. Perhaps something can be gained by reviewing the logic of the table-valued function, so you might want to post it here, along with the CREATE TABLE scripts for all tables involved, including all constraints and all indexes.
March 5, 2016 at 9:39 am
Hugo Kornelis (3/5/2016)
Thanks for posting the plans, Pedro!
No, thank you for taking the time to analyse this and post...
I'll try all your suggestions on Monday, since right now I'm "out" :), and let you know what happened...
Meanwhile I made a change to the query by removing the iTVF and replacing it with a subquery (against the table the iTVF reads) in the WHERE clause...
EXISTS (SELECT 1 FROM entities.agentsRelations t1 WHERE e.idNetwork = t1.idNetwork AND e.idCompany = t1.idCompany AND e.idAgency = t1.idAgency AND t1.idAgent = 4042 )
entities.agentsRelations is a table that stores the "source" of information that agents can access (it says an agent can access data from company X and agency Y, or company Z and agency F, ...). A "direct" table access...
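Roughly, the query now looks like this (simplified, written from memory - the EXISTS replaces the join with the iTVF, everything else is as in the queries I posted before):
;WITH e AS (
SELECT [idEntity] , ROW_NUMBER() OVER ( ORDER BY TRY_CONVERT(BIGINT, e.reference), [idEntity]) position
FROM realEstate.entities e
WHERE
EXISTS (SELECT 1 FROM entities.agentsRelations t1 WHERE e.idNetwork = t1.idNetwork AND e.idCompany = t1.idCompany AND e.idAgency = t1.idAgency AND t1.idAgent = 4042) AND
[idEntity] IN (SELECT id FROM fullText.modulesWords WHERE idModule = 7 AND CONTAINS(words, '"maria*"'))
)
SELECT e.* FROM e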
But the times are still very very bad...
I'll try the trace flags and let you know how it went...
Once again thanks a lot,
Pedro
March 5, 2016 at 9:48 am
The simplification you made should make the plan a lot simpler, and should help the optimizer get a good estimate.
Before playing with the trace flags, please try updating statistics first!
March 7, 2016 at 4:40 am
Hi,
I updated the statistics and rebuilt the indexes, including the fulltext one, and it didn't improve much...
Then I changed the fullText.modulesWords table... This table had a field (words) that held all the words from an item... but it held all kinds of items from the database (entities, companies, agencies, properties, etc...), so I split it into as many tables as the ones I'd like to search (I created a fullText.entities, fullText.companies, ...).
From 10 seconds and 400,000 reads it went to 4 seconds and 180,000 reads... but that's still a lot...
So I tried trace flag 9481 and it was very very fast... 47ms and 4,000 reads...
Then I tried just the RECOMPILE option and it was still very fast, but slower than with the flag: 80ms and 74,000 reads...
Our server still has SQL 2014 SP1 with CU4 installed... I'll try to install CU5 this week to see if it solves the problem... otherwise I'll add the flag every time a full-text search is used...
Thanks,
Pedro
March 7, 2016 at 8:51 am
I just noticed that there's a global trace flag enabled: 4199...
Can this be causing the problem, since this trace flag has to do with the cardinality estimator?
Thanks,
Pedro
March 7, 2016 at 12:36 pm
PiMané (3/7/2016)
So I tried trace flag 9481 and it was very very fast... 47ms and 4,000 reads...
Then you should add that trace flag to the query.
Don't worry, it is a documented and supported trace flag, no hidden voodoo black magic here. This trace flag was introduced when the cardinality estimator received a major overhaul in SQL Server 2014. Since it is impossible to always be right with estimates, Microsoft realised that some queries could get worse estimates, and potentially worse plans because of that. Trace flag 9481 was created for just that situation - with the QUERYTRACEON syntax, you can use it to force the old cardinality estimator on a single query, if you find that the errors of the new estimator produce a worse plan than the errors of the old estimator.
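In case it helps, QUERYTRACEON is just a query hint on the statement itself; for example, on your modulesWords search it would look like this:
SELECT id
FROM fullText.modulesWords
WHERE idModule = 7
AND CONTAINS(words, '"maria*"')
OPTION (QUERYTRACEON 9481); -- use the pre-2014 cardinality estimator for this statement only
One caveat: as far as I know the QUERYTRACEON hint requires sysadmin permission, so check whether that is a problem for the logins that will run the query.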
PiMané (3/7/2016)
I just noticed that there's a global trace flag enabled: 4199...
Can this be causing the problem, since this trace flag has to do with the cardinality estimator?
Trace flag 4199 is not related to the cardinality estimator, but to the query optimizer (related, but not the same). It is used to enable all the many small changes and optimizations that have been introduced over the years. Until SQL Server 2014, Microsoft was hesitant to enable them by default because it's always possible that a change that is an improvement for most is actually detrimental to others, and MS was afraid that customers might get upset. So they disabled all those changes by default, and gave us trace flag 4199 to enable all the changes. (Starting with either SQL Server 2014 or SQL Server 2016 - not sure which - that behaviour is slightly altered; "older" improvements are now enabled by default and only the "newer" QO fixes are hidden behind TF 4199.)
Many people enable TF4199 by default on their servers and only disable it if they find that they suffer from a regression bug. There are also people who do the reverse, which is to only enable TF4199 if they find that it fixes a performance issue for them without causing other queries to slow down.
Changing the TF-4199 status on your system might or might not fix your full-text query. However, it is an instance-wide setting, so it can also affect performance of other queries; I would not change the setting without doing a full test cycle.
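If you want to verify what is currently enabled on the instance before and after testing, DBCC TRACESTATUS will show you:
DBCC TRACESTATUS(-1); -- list all trace flags that are enabled globally
DBCC TRACESTATUS(4199, -1); -- check the global status of 4199 specifically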
If the improvement you already got by using TF9481 makes the query "fast enough", then I suggest not bothering and focusing on more pressing matters.
March 7, 2016 at 3:17 pm
Thanks for your help....
I also played with parameterization, forced and simple, but no change there either.
I have it set to forced since we use a lot of dynamic SQL and use sp_executesql.
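(For reference, this is the database-level setting I toggled - the database name here is just a placeholder:)
ALTER DATABASE [MyDatabase] SET PARAMETERIZATION FORCED;
-- ALTER DATABASE [MyDatabase] SET PARAMETERIZATION SIMPLE;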
The server has enough memory, 64 GB, so no problem there... PLE is almost 300,000.
Once again thanks for all your help
Pedro