Advice on an OR statement

  • The PK on my table is PREFIX, DOCMENT and SEQNO. This is the clustered index.

    If I run this statement after clearing the SQL buffer (DBCC DROPCLEANBUFFERS) etc, it will take over 15 seconds to return the 4 fields.

    Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where

    (Prefix='C' and Document='STCR124171' and seqno=1) or

    (Prefix='C' and Document='STCR124172' and seqno=2)

    If I split the query into 2, it returns the same results instantaeneously.

    Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where

    Prefix='C' and Document='STCR124171' and seqno=1

    Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where

    Prefix='C' and Document='STCR124172' and seqno=2

    Both execution plans show it is using a clustered index seek. The first query does use parallelism but even if switch this off with (OPTION (maxdop 1)) on the query, the results are the same. Both execution plans show exactly the same except there is 2 for the 'split' results.

    After the query has been run once, it then also comes back instantly that is why I was clearing the buffer each time. Why is there a 40x increase in the completion time on what would esseentially be the same query?

    Anybody any ideas?

    If any more info is required, I can easily provide this if it is relevant.

    Thanks.

  • Also, if the SEQNO is the same value on the 2 clauses, it returns the results instantly too. If they are different, the delay is there. < This is really confusing!

    The order of the clustered index in PREFIX, DOCUMENT & SEQNO, all ascending.

    I have also just rebuilt the entire index yet the problem still exists.

  • It all boils down to the way SQL Server estimates cardinality and cost when an OR is present.

    Are you using SQL Server 2008? The optimizer frequently manages this type of query much better in 2008.

    One solution is to rewrite the query without the OR, using a UNION ALL.

    SELECT Document, rc

    FROM [AP].[dbo].[ILines]

    WHERE Prefix = 'C'

    AND Document = 'STCR124171'

    AND seqno=1

    UNION ALL

    SELECT Document, rc

    FROM [AP].[dbo].[ILines]

    WHERE Prefix = 'C'

    AND Document = 'STCR124172'

    AND seqno=2;

    You might also try the new FORCESEEK hint.

  • Thanks for the reply.

    No, this test was on a SQL2000 server (using 2008 Managemeent studio). I'll move the data to a 2008 SQL sever to see if the results are returned in the same manner. Although, this won;t be a complete solution as we have hundreds of customers still using 2000 and 2005, very few have actually moved to 2008 at this point.

    If the union returns the results quick, like the 2 seperate statements, I will ask our developers to change the code to work in this manner.

    As stated previously, what confuses me is if the SEQNO is the same on both clauses, it works fine.

    If this has been fixed in 2008, then fair enough but should still be implemented into pevious systems. If it hasn't it seems like a 'real' issue to me as the optimizer doesn't seem to be optimizing anything.

    Thanks for the reply. If anyone else has any further suggestions, feel free to let me know.

  • DrJogalog (3/31/2010)


    The PK on my table is PREFIX, DOCMENT and SEQNO. This is the clustered index.

    If I run this statement after clearing the SQL buffer (DBCC DROPCLEANBUFFERS) etc, it will take over 15 seconds to return the 4 fields.

    Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where

    (Prefix='C' and Document='STCR124171' and seqno=1) or

    (Prefix='C' and Document='STCR124172' and seqno=2)

    If I split the query into 2, it returns the same results instantaeneously.

    Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where

    Prefix='C' and Document='STCR124171' and seqno=1

    Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where

    Prefix='C' and Document='STCR124172' and seqno=2

    Both execution plans show it is using a clustered index seek. The first query does use parallelism but even if switch this off with (OPTION (maxdop 1))on the query, the results are the same. Both execution plans show exactly the same except there is 2 for the 'split' results.

    After the query has been run once, it then also comes back instantly that is why I was clearing the buffer each time. Why is there a 40x increase in the completion time on what would esseentially be the same query?

    Anybody any ideas?

    If any more info is required, I can easily provide this if it is relevant.

    Thanks.

    In addition to what Paul suggests (using a UNION), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (3/31/2010)


    In addition to what Paul suggests (using a UNION ALL), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉

    Observant readers will have noticed that the code I posted dispensed with that hint 😎

  • In addition to what Paul suggests (using a UNION), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉

    Thanks for the reply.

    Our devlopers have a passion for writing 'no lock's' in their statements. The data returned in these queries should not actually change once written so there is no real potential to return inaccurate data in this example. We had quite a few issues with deadlocking a fair while back and the solution seemed to be to use 'no lock' clauses on every statement they could. I personally use them very sparingly, depending on what I am doing, but at this point, its not my choice.

    You might also try the new FORCESEEK hint.

    Im not at work at the moment but I know the execution plans showed that both queries were already using the clustered index seek. There was no difference in the execution plan itself, just the time it took to return the results. Although I will still give this a try too.

    Thanks.

  • DrJogalog (3/31/2010)


    As stated previously, what confuses me is if the SEQNO is the same on both clauses, it works fine.

    Could you post the exact definition (CREATE INDEX statements) for *all* the indexes on the table please?

    It will probably help explain why this is happening.

  • Comments removed as no longer valid.

  • Ok, there wasn't as many as I thought.

    The primary key is PREFIX, DOCUMENT, SEQNO in that order, all ascending.

    CREATE INDEX [Corder] ON [dbo].[ILines]([COrder]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [Part] ON [dbo].[ILines]([Part]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [PG] ON [dbo].[ILines]([PG]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [RC] ON [dbo].[ILines]([Rc]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [Supp] ON [dbo].[ILines]([Supp]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [BinCard] ON [dbo].[ILines]([StkPart], [Bincard]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [Today] ON [dbo].[ILines]([Today]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [Credit] ON [dbo].[ILines]([Credit]) ON [PRIMARY]

    GO

    CREATE INDEX [Invoice] ON [dbo].[ILines]([Invoice]) ON [PRIMARY]

    GO

    CREATE INDEX [BOrder] ON [dbo].[ILines]([BOrder]) ON [PRIMARY]

    GO

  • Paul White NZ (3/31/2010)


    WayneS (3/31/2010)


    In addition to what Paul suggests (using a UNION ALL), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉

    Observant readers will have noticed that the code I posted dispensed with that hint 😎

    Paul, I apologize. I did notice that, and meant to mention it, but it slipped past me.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • it will use only PK. Union all could give improved performance. Please post the results

  • Thanks for the index definitions - though in this case, they did not help. Never mind.

    If your query plan shows a single seek instead of two, it is likely that the SQL Server 2000 optimizer is combining the two searches into a single range seek with a predicate. The range seek will not be very selective, so the predicate has to be evaluated on a large number of rows. Because the seek portion selects a relatively large number of rows (before the predicate), a large number of data pages must be scanned. The effect is made more pronounced by the fact that the clustered index is the widest possible index - it includes all the data at the leaf level.

    My guess is that your plan shows a seek on (Prefix = 'C') only, with the predicate evaluating the remaining conditions. Obviously, you have a lot of records where Prefix = 'C', and this is what causes the large number of physical reads. The fillfactor of the recently rebuilt clustered index will also make the scan less efficient, and the presence or absence of read-ahead reads will also make a difference.

    Here is a test rig I set up to see if I could reproduce your problem. I tested this on 2005 and 2008, since I do not have an instance of SQL Server 2000. Both are correctly optimized to two seeks and a concatenation (exactly as the UNION ALL form of the query), and therefore complete in milliseconds, even with a cleared buffer pool.

    -- For demonstration purposes only

    USE tempdb;

    GO

    -- Drop the test table if it exists

    IF OBJECT_ID(N'dbo.Test', N'U')

    IS NOT NULL

    DROP TABLE dbo.Test;

    GO

    -- Create the test table

    -- Padding added to widen the rows

    CREATE TABLE dbo.Test

    (

    prefix CHAR(1) NOT NULL,

    document CHAR(10) NOT NULL,

    seq_no INTEGER NOT NULL,

    rc INTEGER NOT NULL,

    padding CHAR(1000) NOT NULL DEFAULT ''

    );

    GO

    -- Add 100,000 random-ish rows

    INSERT dbo.Test

    (prefix, document, seq_no, rc)

    SELECT TOP (100000)

    CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65),

    'STCR' + RIGHT(1000000 + CHECKSUM(NEWID()) % 1000000, 6),

    CHECKSUM(NEWID()) % 5 + 1,

    CHECKSUM(NEWID())

    FROM master.sys.columns C1,

    master.sys.columns C2,

    master.sys.columns C3;

    GO

    -- Remove any duplicates

    -- (should be very few)

    WITH CTE

    AS (

    SELECT *,

    rn = ROW_NUMBER() OVER(

    PARTITION BY prefix, document, seq_no

    ORDER BY (SELECT 0))

    FROM dbo.Test

    )

    DELETE CTE

    WHERE rn > 1;

    GO

    -- Ensure our target rows exist

    INSERT dbo.Test

    (prefix, document, seq_no, rc)

    SELECT 'C', 'STCR124171', 1, 12345678

    WHERE NOT EXISTS

    (

    SELECT *

    FROM dbo.Test T2

    WHERE T2.prefix = 'C'

    AND T2.document = 'STCR124171'

    AND seq_no = 1

    );

    INSERT dbo.Test

    (prefix, document, seq_no, rc)

    SELECT 'C', 'STCR124172', 2, 12345678

    WHERE NOT EXISTS

    (

    SELECT *

    FROM dbo.Test T2

    WHERE T2.prefix = 'C'

    AND T2.document = 'STCR124172'

    AND seq_no = 2

    );

    GO

    -- Add the primary key

    ALTER TABLE dbo.Test

    ADD PRIMARY KEY CLUSTERED

    (prefix, document, seq_no)

    WITH (FILLFACTOR = 80);

    GO

    -- Clear the buffer pool

    CHECKPOINT;

    DBCC DROPCLEANBUFFERS;

    GO

    -- Test the query

    SELECT Document, rc

    FROM dbo.Test

    WHERE (prefix = 'C' AND document = 'STCR124171' AND seq_no = 1)

    OR (prefix = 'C' AND document = 'STCR124172' AND seq_no = 2);

    -- Tidy up

    DROP TABLE dbo.Test;

    This is the (actual) execution plan:

  • Another thing I should mention: the current clustered index is not a great choice for the clustered index.

    The key is not outrageously wide (at 15 bytes), but inserts are likely to be spread across the index, resulting in either a sub-optimal page density (due to a low fillfactor), or high levels and page splitting and therefore high fragmentation levels. Fragmentation can have a very detrimental effect on range scan performance. The key is included in every non-clustered index (it is the row locator), and this adds 15 bytes to every row of every index too.

    I would be tempted to change the primary key to NONCLUSTERED, and create a clustered index on a narrow column which monotonically increases and never changes. An INTEGER IDENTITY column is convenient for this, but by no means the only option.

    An updated test rig to demonstrate this follows.

    -- For demonstration purposes only

    USE tempdb;

    GO

    -- Drop the test table if it exists

    IF OBJECT_ID(N'dbo.Test', N'U')

    IS NOT NULL

    DROP TABLE dbo.Test;

    GO

    -- Create the test table

    -- Padding added to widen the rows

    CREATE TABLE dbo.Test

    (

    row_id INTEGER IDENTITY NOT NULL

    UNIQUE CLUSTERED

    WITH (FILLFACTOR = 100),

    prefix CHAR(1) NOT NULL,

    document CHAR(10) NOT NULL,

    seq_no INTEGER NOT NULL,

    rc INTEGER NOT NULL,

    padding CHAR(1000) NOT NULL DEFAULT ''

    );

    GO

    -- Add 100,000 random-ish rows

    INSERT dbo.Test

    (prefix, document, seq_no, rc)

    SELECT TOP (100000)

    CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65),

    'STCR' + RIGHT(1000000 + CHECKSUM(NEWID()) % 1000000, 6),

    CHECKSUM(NEWID()) % 5 + 1,

    CHECKSUM(NEWID())

    FROM master.sys.columns C1,

    master.sys.columns C2,

    master.sys.columns C3;

    GO

    -- Remove any duplicates

    -- (should be very few)

    WITH CTE

    AS (

    SELECT *,

    rn = ROW_NUMBER() OVER(

    PARTITION BY prefix, document, seq_no

    ORDER BY (SELECT 0))

    FROM dbo.Test

    )

    DELETE CTE

    WHERE rn > 1;

    GO

    -- Ensure our target rows exist

    INSERT dbo.Test

    (prefix, document, seq_no, rc)

    SELECT 'C', 'STCR124171', 1, 12345678

    WHERE NOT EXISTS

    (

    SELECT *

    FROM dbo.Test T2

    WHERE T2.prefix = 'C'

    AND T2.document = 'STCR124171'

    AND seq_no = 1

    );

    INSERT dbo.Test

    (prefix, document, seq_no, rc)

    SELECT 'C', 'STCR124172', 2, 12345678

    WHERE NOT EXISTS

    (

    SELECT *

    FROM dbo.Test T2

    WHERE T2.prefix = 'C'

    AND T2.document = 'STCR124172'

    AND seq_no = 2

    );

    GO

    -- Add the primary key

    ALTER TABLE dbo.Test

    ADD PRIMARY KEY NONCLUSTERED

    (prefix, document, seq_no)

    WITH (FILLFACTOR = 80);

    GO

    -- Clear the buffer pool

    CHECKPOINT;

    DBCC DROPCLEANBUFFERS;

    GO

    -- Test the query

    SELECT Document, rc

    FROM dbo.Test

    WHERE (prefix = 'C' AND document = 'STCR124171' AND seq_no = 1)

    OR (prefix = 'C' AND document = 'STCR124172' AND seq_no = 2);

    -- Tidy up

    DROP TABLE dbo.Test;

    The query plan in this case is:

  • Thanks for the detailed analysis Paul, I understand why the query is slow from your explanation.

    Moving the data to a SQL2008 server stops the speed issue. The updated optimizer must be able to deal with this type of query a little better.

    Using the UNION ALL clause returns the data at the same speed as the 2 separate statements but in the same format as the OR statement so, it looks like this is the option we will take.

    Here are the execution plans

    CHECKPOINT

    DBCC DROPCLEANBUFFERS

    GO

    SELECTDocument, rc

    FROM[ILines]

    WHERE(Prefix='C' and Document='STCR124171' and seqno=2)

    OR(Prefix='C' and Document='STCR124172' and seqno=1)

    OPTION (maxdop 1))

    The maxdop command is there to keep the comparison the same across the queries.

    That took over 15 seconds to complete.

    CHECKPOINT

    DBCC DROPCLEANBUFFERS

    GO

    SELECTDocument, rc

    FROM[ILines]

    WHEREPrefix='C'

    ANDDocument='STCR124171'

    ANDseqno=2

    SELECTDocument, rc

    FROM[ILines]

    WHEREPrefix='C'

    ANDDocument='STCR124172'

    ANDseqno=1

    That takes milliseconds.

    CHECKPOINT

    DBCC DROPCLEANBUFFERS

    GO

    SELECTDocument, rc

    FROM[ILines]

    WHEREPrefix='C'

    ANDDocument='STCR124171'

    ANDseqno=2

    UNION ALL

    SELECTDocument, rc

    FROM[ILines]

    WHEREPrefix='C'

    ANDDocument='STCR124172'

    ANDseqno=1

    That takes milliseonds again but returns 1 set of data as in the original OR query.

    With regard to the clustered index on this table, we are fully aware of the issues that it causes. This has been pointed out quite a while ago by ourselves and external consultants (as well as the 'no lock' clauses too). Again, we can only advise, but do not make the decisions to allow these fundamental flaws to be changed. These databases were designed many years ago when the development team knew very little about SQL design and structure. Newer tables that are created are designed with speed in mind so the indexes are controlled in a different manner. We are starting to have quite a few issues with speed of the queries at the moment so it is inevitable that a BIG change will be on the cards soon. Either that or we grind to a halt eventually! lol

    I appreciate ALL your help and will be a regular visitor to this site now.

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply