Dropping primary key improves performance

  • I am continually plagued by performance problems. When I display the estimated execution plan, the estimated cost and number of rows are usually way off, the plans it chooses are often really poor, and it often suggests new indexes that don't help.

    Right now I have a specific recurring issue I'd like to throw out there. I have a 500,000 row table with a two-column clustered primary key, let's call them A and B. I have some queries that join to this table on those two columns and they take an hour to run, returning about 25000 rows. SQL Server suggests I need a new nonclustered index on column B, and include column A. When I drop the primary key entirely, and don't create a new index, the query runs in a few seconds.

    First, why would it suggest that index, and second, why does a primary key kill the query performance? I've already tried rebuilding indexes and updating stats with no improvement. I also tried creating that suggested index and it didn't help; it just came up with another suggested index that doesn't help either.

    We're on 2008 R2. Thanks.

  • Have you tried updating the statistics on that table?

  • i'd like to see the actual query and an execution plan;

    you might have a non-Sargable query, functions in the Where clause, a catch all query containing a lot of OR statements, or something else that is preventing a decent plan and performance.

    i have lots of queries joining multiple tables with tens of millions of rows, and they don't perform poorly.

    if you can provide more details, i'm willing to bet we can offer some solid guidance.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks, yes, I updated statistics. In the interest of being able to show it conveniently here I reduced the query down to a very simple example. The original was more complicated and had a where clause of just a single column equal to a constant that has no effect on the issue.

    I've attached two plan images. The first one is without a primary key; you will see a full table scan on the EKPO table. The second with a primary key (on columns EBELN and EBELP) has a clustered index scan. From what I see here I don't think the access of the EKPO table is the issue; you will see that the access to the MRP_RESULT_FACT table changed. This is what I see all the time -- a change is made to a table or query and triggers the optimizer to choose a completely different plan, changing things that had nothing to do with the actual change that was made.

    SELECT

    PURCH_DOC_FACT.PURCH_DOC_ID

    ,PURCH_DOC_FACT.PURCH_LINE_ITEM_ID

    ,PURCH_DOC_FACT.MATERIAL_ID

    ,PURCH_DOC_FACT.PURCH_DOC_DATE

    ,EKPO.ADPRI

    ,MRP_RESULT_FACT.MRP_DATE

    FROM

    PURCH_DOC_FACT

    INNER JOIN EKPO ON (PURCH_DOC_FACT.PURCH_DOC_ID = EKPO.EBELN

    AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = EKPO.EBELP)

    LEFT OUTER JOIN MRP_RESULT_FACT ON (PURCH_DOC_FACT.PURCH_DOC_ID = MRP_RESULT_FACT.MRP_ELEM_NUM

    AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = MRP_RESULT_FACT.MRP_ELEM_LINE_NUM)

  • smithhead (3/16/2016)


    Thanks, yes, I updated statistics. In the interest of being able to show it conveniently here I reduced the query down to a very simple example. The original was more complicated and had a where clause of just a single column equal to a constant that has no effect on the issue.

    I've attached two plan images. The first one is without a primary key; you will see a full table scan on the EKPO table. The second with a primary key (on columns EBELN and EBELP) has a clustered index scan. From what I see here I don't think the access of the EKPO table is the issue; you will see that the access to the MRP_RESULT_FACT table changed. This is what I see all the time -- a change is made to a table or query and triggers the optimizer to choose a completely different plan, changing things that had nothing to do with the actual change that was made.

    SELECT

    PURCH_DOC_FACT.PURCH_DOC_ID

    ,PURCH_DOC_FACT.PURCH_LINE_ITEM_ID

    ,PURCH_DOC_FACT.MATERIAL_ID

    ,PURCH_DOC_FACT.PURCH_DOC_DATE

    ,EKPO.ADPRI

    ,MRP_RESULT_FACT.MRP_DATE

    FROM

    PURCH_DOC_FACT

    INNER JOIN EKPO ON (PURCH_DOC_FACT.PURCH_DOC_ID = EKPO.EBELN

    AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = EKPO.EBELP)

    LEFT OUTER JOIN MRP_RESULT_FACT ON (PURCH_DOC_FACT.PURCH_DOC_ID = MRP_RESULT_FACT.MRP_ELEM_NUM

    AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = MRP_RESULT_FACT.MRP_ELEM_LINE_NUM)

    Any chance of you posting the actual execution plans by saving the actual plans to file instead of snipping pictures? It would be interesting to see how many scans are actually being done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What about the other tables in the query?

    What indexes do they have?

    Are there any FK constraints between these tables?

    _____________
    Code for TallyGenerator

  • There's no warning on the result operator to back this up (if the warning string is too long to display in the "hover over" property sheet, it can still be seen in the right-click property sheet), but that pattern of compute scalars, one for each table, between read and join, from a query which has eccentric performance characteristics, is often synonymous with implicit conversions.

    What are the data types of the join columns in each table?

    Use table aliasing to reduce noise in your queries:

    SELECT

    pd.PURCH_DOC_ID

    ,pd.PURCH_LINE_ITEM_ID

    ,pd.MATERIAL_ID

    ,pd.PURCH_DOC_DATE

    ,EKPO.ADPRI

    ,r.MRP_DATE

    FROM PURCH_DOC_FACT pd

    INNER JOIN EKPO

    ON (pd.PURCH_DOC_ID = EKPO.EBELN

    AND pd.PURCH_LINE_ITEM_ID = EKPO.EBELP)

    LEFT OUTER JOIN MRP_RESULT_FACT r

    ON (pd.PURCH_DOC_ID = r.MRP_ELEM_NUM

    AND pd.PURCH_LINE_ITEM_ID = r.MRP_ELEM_LINE_NUM)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks everyone. I can post more details later, but here are some quick comments/questions:

    Good to know about compute scalars, I was wondering what that meant.

    What do you mean by "warning on the result operator"?

    There are no foreign keys in this database. (I didn't create this database.)

    I generally use table aliasing; this query was auto-generated by a BI tool (but with a little cleanup by myself).

    At any rate, I very much appreciate your expertise and will spend some more time looking at those things, but my main goal with this post is more general. I've used other databases for over 20 years and have some tuning experience, but I am relatively new to SQL Server and have never been so frustrated. It continually makes horrible plan choices, and query or table changes often trigger a strategy change that has nothing whatsoever to do with the change that was made. Its estimates are often way off, but supposedly we rebuild indexes and update statistics every day (I am not the DBA). Could there be something wrong with our configuration or management, or is the optimizer for 2008R2 just really bad? (We are on a path to upgrade in a few months so maybe that will help some?)

    Thanks.

  • smithhead (3/17/2016)


    Thanks everyone. I can post more details later, but here are some quick comments/questions:

    Good to know about compute scalars, I was wondering what that meant.

    What do you mean by "warning on the result operator"?

    There are no foreign keys in this database. (I didn't create this database.)

    I generally use table aliasing; this query was auto-generated by a BI tool (but with a little cleanup by myself).

    At any rate, I very much appreciate your expertise and will spend some more time looking at those things, but my main goal with this post is more general. I've used other databases for over 20 years and have some tuning experience, but I am relatively new to SQL Server and have never been so frustrated. It continually makes horrible plan choices, and query or table changes often trigger a strategy change that has nothing whatsoever to do with the change that was made. Its estimates are often way off, but supposedly we rebuild indexes and update statistics every day (I am not the DBA). Could there be something wrong with our configuration or management, or is the optimizer for 2008R2 just really bad? (We are on a path to upgrade in a few months so maybe that will help some?)

    Thanks.

    Any chance you could post up one of those execution plans? As a .sqlplan attachment? There are plenty of questions here which could be answered with just a quick peek...

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • What is the data type of the keys?

  • You need to review the indexes from scratch. That is, analyze SQL's index usage stats, missing index stats and index operational stats. The longer time period you have these stats for the better, as they will better reflect the actual usage of the table and its indexes.

    Hopefully you have another copy of the db, either live or as a backup, where you haven't dropped the existing pk yet, so you can still see all the stats on that index too. When an index is dropped, the corresponding stats are also dropped (except perhaps missing index stats, that one depends).

    By analyzing the stats you can determine:

    1) the best clustered index on every table -- this is the most critical factor in performance.

    2) any nonclus index(es) you still need, with included column(s).

    There is an iterative process. But it gets better as it goes along.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • smithhead (3/16/2016)


    Thanks, yes, I updated statistics. In the interest of being able to show it conveniently here I reduced the query down to a very simple example. The original was more complicated and had a where clause of just a single column equal to a constant that has no effect on the issue.

    I've attached two plan images. The first one is without a primary key; you will see a full table scan on the EKPO table. The second with a primary key (on columns EBELN and EBELP) has a clustered index scan. From what I see here I don't think the access of the EKPO table is the issue; you will see that the access to the MRP_RESULT_FACT table changed. This is what I see all the time -- a change is made to a table or query and triggers the optimizer to choose a completely different plan, changing things that had nothing to do with the actual change that was made.

    SELECT

    PURCH_DOC_FACT.PURCH_DOC_ID

    ,PURCH_DOC_FACT.PURCH_LINE_ITEM_ID

    ,PURCH_DOC_FACT.MATERIAL_ID

    ,PURCH_DOC_FACT.PURCH_DOC_DATE

    ,EKPO.ADPRI

    ,MRP_RESULT_FACT.MRP_DATE

    FROM

    PURCH_DOC_FACT

    INNER JOIN EKPO ON (PURCH_DOC_FACT.PURCH_DOC_ID = EKPO.EBELN

    AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = EKPO.EBELP)

    LEFT OUTER JOIN MRP_RESULT_FACT ON (PURCH_DOC_FACT.PURCH_DOC_ID = MRP_RESULT_FACT.MRP_ELEM_NUM

    AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = MRP_RESULT_FACT.MRP_ELEM_LINE_NUM)

    Just an FYI, A table scan on a heap (no clustered index) and a clustered index scan are actually the same thing, a table scan.

  • Lynn Pettis (3/17/2016)


    smithhead (3/16/2016)


    Thanks, yes, I updated statistics. In the interest of being able to show it conveniently here I reduced the query down to a very simple example. The original was more complicated and had a where clause of just a single column equal to a constant that has no effect on the issue.

    I've attached two plan images. The first one is without a primary key; you will see a full table scan on the EKPO table. The second with a primary key (on columns EBELN and EBELP) has a clustered index scan. From what I see here I don't think the access of the EKPO table is the issue; you will see that the access to the MRP_RESULT_FACT table changed. This is what I see all the time -- a change is made to a table or query and triggers the optimizer to choose a completely different plan, changing things that had nothing to do with the actual change that was made.

    SELECT

    PURCH_DOC_FACT.PURCH_DOC_ID

    ,PURCH_DOC_FACT.PURCH_LINE_ITEM_ID

    ,PURCH_DOC_FACT.MATERIAL_ID

    ,PURCH_DOC_FACT.PURCH_DOC_DATE

    ,EKPO.ADPRI

    ,MRP_RESULT_FACT.MRP_DATE

    FROM

    PURCH_DOC_FACT

    INNER JOIN EKPO ON (PURCH_DOC_FACT.PURCH_DOC_ID = EKPO.EBELN

    AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = EKPO.EBELP)

    LEFT OUTER JOIN MRP_RESULT_FACT ON (PURCH_DOC_FACT.PURCH_DOC_ID = MRP_RESULT_FACT.MRP_ELEM_NUM

    AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = MRP_RESULT_FACT.MRP_ELEM_LINE_NUM)

    Just an FYI, A table scan on a heap (no clustered index) and a clustered index scan are actually the same thing, a table scan.

    Almost, except you won't get a merge join from a table scan without a sort first.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • That's not really a lot of records. I'm in the data type mismatch of your join fields camp, but also, how much memory does your server have and what else is it doing? Could your instance be starved for memory?

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply