March 8, 2010 at 2:08 am
Paul White (3/7/2010)
A few constructive criticisms, Barry: Running a full-text search with a variable as a parameter can be problematic. You might have noticed that the full-text engine can provide very good quality estimates to the optimizer when a literal string is used, even if the full-text query is complex.
Nope. I've never noticed that. Of course that's probably because I've never used FTS before. Ever. 🙂 Seriously. (not directly anyway)
My focus was just on the security & injection issues, and I was solely working on finding the safest and most secure way to do this, especially given the corporate sensitivity around SQL Injection these days. I always try to put safety before performance, which was easy for me in this case, because I know next to nothing about FTS performance. 😀
When a variable is used to provide the search terms, the optimizer resorts to a guess - which might be the ubiquitous 10,000 rows - I don't recall off-hand.
If the query optimizer uses a guess, it might mistakenly produce a plan that is driven by some other input, and ends up calling the full-text query many, many times. I have seen this happen on production systems.
In any case, since the full-text search is generally the most expensive part of a plan, we usually want to ensure that it is performed only once. This can be done by using CONTAINSTABLE rather than CONTAINS, and forcing the order of the joins.
Alternatively, run the FT CONTAINSTABLE query first and save the KEY values returned in a table (usually a table variable rather than a #temp table). The default cardinality estimate of '1' for table variables is normally sufficient for SQL Server to choose it as the driving table in the main query.
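To illustrate the pattern, here's a minimal sketch. The dbo.Documents table, its columns, and the variable names are hypothetical (not from Lou's post), and the table is assumed to have a full-text index on its body column with doc_id as the full-text key. Note the search terms are held in an NVARCHAR variable throughout (see the conversion note below):

DECLARE @SearchTerms NVARCHAR(4000);
SET @SearchTerms = N'"sql server" AND "injection"';

DECLARE @Keys TABLE (doc_key INTEGER PRIMARY KEY, ft_rank INTEGER NOT NULL);

-- Run the expensive full-text query exactly once, saving the KEY values
INSERT @Keys (doc_key, ft_rank)
SELECT FT.[KEY], FT.[RANK]
FROM CONTAINSTABLE(dbo.Documents, body, @SearchTerms) AS FT;

-- The table variable's low cardinality estimate encourages the optimizer
-- to pick it as the driving table for the main query
SELECT D.doc_id, D.title, K.ft_rank
FROM @Keys AS K
JOIN dbo.Documents AS D
    ON D.doc_id = K.doc_key;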
All good stuff to know Paul, thanks. In fact, what I'd suggest, since you clearly know a lot more about FTS and FTS performance than I do, is this: could you rewrite my proc in a more performant way? The one proviso, of course, is that the user-supplied text needs to stay isolated in a variable (or temp table, or whatever) and must not be injected into the actual SQL command stream. (What I actually spent most of my time on with this question was confirming that the "conditional-text" was not some kind of back-door into the command execution stream.)
I know it would help me out if I ever have to help a customer with this again and it would probably be better for anyone reading this later on to have a proper example from someone who knew what they were doing. Thanks!
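To be explicit about the isolation I mean, here is a sketch only; dbo.Documents, its columns, and @UserText are hypothetical names, not Lou's actual schema. The user's text only ever travels as a parameter value, so whatever it contains is treated as data, never as T-SQL (a malformed search string can still raise a full-text syntax error, but it cannot inject commands):

CREATE PROCEDURE dbo.SearchDocuments
    @UserText NVARCHAR(4000)
AS
BEGIN
    SET NOCOUNT ON;

    -- @UserText is passed straight to CONTAINS; it is never concatenated
    -- into a command string, so there is no injection path
    SELECT D.doc_id, D.title
    FROM dbo.Documents AS D
    WHERE CONTAINS(D.body, @UserText);
END;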
COUNT(*) OVER (PARTITION BY -1) is a little redundant: COUNT(*) OVER () is equivalent.
Thanks Paul, I couldn't remember what the right syntax was for that, so I was testing with -1 to make sure that it didn't do that old ORDER BY thing of grabbing the Nth column instead.
... COUNT(*) OVER produces a plan which features one of those stacked Table-Spool arrangements reminiscent of recursive CTEs (in fact they run in a different mode, but you see my point). It can be more efficient to use ROW_NUMBER twice, as shown in the following demonstration code: ...
Yeah, a couple of things here... I've seen this one go many different ways, depending on all kinds of things, especially the presence or absence of indexes and ORDER BYs. And I've seen that reversed-ROW_NUMBER trick produce bad plans so many times that I try to avoid it now.
Given that I didn't have Lou's table defs or data, and couldn't really test FTS or its effect on these tricks, I decided to keep it simple. In retrospect, I probably should have just kept a standalone SELECT COUNT(*).
USE tempdb;
GO
IF OBJECT_ID(N'dbo.Sample', N'U') IS NOT NULL
    DROP TABLE dbo.Sample;
GO
CREATE TABLE dbo.Sample
(
    row_id INTEGER IDENTITY PRIMARY KEY,
    group_id INTEGER NOT NULL,
    padding CHAR(200) NOT NULL DEFAULT (SPACE(200))
);
GO
-- Load 50,000 rows spread across 25 groups
WITH Numbers (n)
AS (
    SELECT TOP (50000)
        ROW_NUMBER() OVER (ORDER BY (SELECT 0))
    FROM master.sys.columns C1,
         master.sys.columns C2,
         master.sys.columns C3
)
INSERT dbo.Sample
    (group_id)
SELECT n % 25 + 1
FROM Numbers;
GO
DECLARE @BitBucket INTEGER;
SET STATISTICS IO, TIME ON;

-- Test 1: COUNT(*) OVER () - produces the stacked Table Spool plan
SELECT @BitBucket = COUNT(*) OVER ()
FROM dbo.Sample;

-- Test 2: double ROW_NUMBER - a single scan, no worktable
SELECT @BitBucket =
    -1 +
    ROW_NUMBER() OVER (ORDER BY row_id ASC) +
    ROW_NUMBER() OVER (ORDER BY row_id DESC)
FROM dbo.Sample;

SET STATISTICS IO, TIME OFF;
DROP TABLE dbo.Sample;
Results:
-- COUNT(*) OVER ()
Table 'Worktable'. Scan count 3, logical reads 100781, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Sample'. Scan count 1, logical reads 1358, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 219 ms, elapsed time = 224 ms.
-- Double ROW_NUMBER()
Table 'Sample'. Scan count 1, logical reads 1358, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 78 ms, elapsed time = 190 ms.
Mehh. These tests seem to me to be both too narrow and too different from the actual question to be reliable. In fact, one of my general concerns with the performance demonstrations that we do here at SSC is a huge bias toward extremely narrow row widths, which does not seem representative of what I see at my customers' sites. And the effective "width" of the rows definitely can produce very different plans.
The other thing that concerns me with this test is the use of "@BitBucket = ..." in the SELECTs. I assume that this is to try to factor out the variable stall time of displaying a bunch of rows? That's good, but it introduces two other problems. First, it adds the overhead of assigning to a variable 50,000 times. True, it's probably less than the cost/time to display 50,000 rows, and less random, but it's still something, and how much is a question.
The second thing is that it's a different kind of statement altogether. Instead of a data-selection query, it's now a variable-assignment query, and I just do not trust it to always produce comparable performance, or even necessarily the same plan. That's not to mention that, performance-wise, there is a lot of stuff (overhead and scalar calcs, it seems to me) that the QP either doesn't tell us about, or applies a very broad brush to in terms of costing.
What I like to do in these cases is to wrap what I call an "opaque filter" or a "blind filter" around the outermost select, one that eliminates most or all of the output rows. By opaque or blind, I mean a WHERE clause that the compiler can neither use an index on nor know anything about ahead of time.
Done correctly, this pretty much forces the optimizer to construct the same plan that it would have without the filter, and then to apply the filter just before output (because that's where the fewest rows are, and it doesn't require another scan, seek, or join).
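To make that concrete, here's a minimal sketch against Paul's dbo.Sample table above (@NoMatch is just a name I'm using for illustration). Because cnt is computed by the window aggregate, the predicate can't be pushed below it, so the plan does all the original work and then discards the rows just before output:

DECLARE @NoMatch INTEGER;
SET @NoMatch = -1; -- a value that no row can produce

SELECT Q.row_id, Q.cnt
FROM (
    SELECT S.row_id,
           cnt = COUNT(*) OVER ()
    FROM dbo.Sample AS S
) AS Q
WHERE Q.cnt = @NoMatch; -- opaque: no index, and the value is unknown at compile time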
Finally, the search criteria string is NVARCHAR, so should be passed as such to avoid an implicit conversion.
Is NVarchar an FTS thing? I had looked for it in the stuff that Lou posted, but I didn't see it. Sorry...
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 8, 2010 at 5:36 am
Hey Barry,
I take your point about the double-ROW_NUMBER. As always, the important thing is to test.
As far as re-writing your excellent effort is concerned - thanks, but I think I will pass. Hopefully some of the points I mentioned will be useful to 'Lou' anyway, as general as they are.
Paul
March 9, 2010 at 11:22 pm
Just wanted to say thanks again to both of you.
Unfortunately, Paul, most of the stuff in your original post is way over my very SQL-challenged head. I'll have to read through it again and then look up all the stuff you talked about to try to get a handle on it.
But right now, I'm very happy that my search is working within a parameterized query rather than via building the query dynamically in my code and running it locally.
Hope you both have a great week 🙂
March 10, 2010 at 1:10 am
Glad we could help. 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]