Tasty Performance scenario to feast on, looking for general feedback and theories only

  • Looking for some ideas on a performance scenario we are seeing in production. I can't make any changes to production for a while, and I haven't found a way to repro in non-PROD, so just looking for feedback and ideas right now.

    First I'll give some background on the setup, then a description of the problem, then my only theory so far

    BACKGROUND

    -----------

    We have a tall-skinny table that we use as a caching mechanism in our application. It's only 2 columns wide and we have 1 unique index on ColumnA asc, columnB asc.

    As far as the data shape and DML behavior goes, here are some notes:

    - The first column is a unique identifier for a cache entry, NVARCHAR(50), and the second column holds the actual values for the cache entry, NVARCHAR(30). The combination of GUID/Value is unique.

    - Cardinality on the first column is low, and on the second column it is very high.

    - For the majority of the day, the table contains zero rows.

    - For 2 hours, between midnight and 2am, the business processes which use this cache table come to life and slam this table with inserts and deletes (no updates). One cache entry (ColumnA - GUID) can contain between a couple hundred thousand and a couple million entries (ColumnB - Value). Volume and activity do vary from night to night, but are generally heavy during these 2 hours.

    - I call it the accordion table. It can grow from zero to 100 million rows and then back to zero within a very small amount of time.

    - The sub-select queries used to populate the table are static and predictable; nothing very complex.

    - Most of the time the insert/delete queries all perform well

    PROBLEM

    --------

    Every 3-4 weeks we get a hanging query that never finishes. This is a query that runs every day, and normally it runs in less than a minute. On the days we see the problem, I've let it run for over 6 hours and it hasn't completed. The minute I kill it and run it again, it completes in the sub-minute time frame again.

    THEORY

    ------

    I think the problem could be due to contention triggered by auto statistics in this scenario: SQL Server can't keep up with constantly having to recompute stats for the index while the heavy insert/delete activity is going on, and at some point the query picks up a very bad plan and hangs. Has anyone else ever seen something similar caused by auto-stats?
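    One way to sanity-check this theory during the load window would be to watch how stale the index stats are getting. This is only a sketch; the table and index names are assumed to be the dbo.M_C_CONTACT / M_C_CONTACT_U1 pair from the DDL in this thread, and rowmodctr comes from the legacy sys.sysindexes view, which still works on 2005/2008:

    ```sql
    -- When were the stats behind the index last updated, and how many
    -- rows have been modified since then?
    SELECT  s.name AS stats_name,
            STATS_DATE(s.object_id, s.stats_id) AS last_updated,
            si.rowmodctr AS rows_modified_since_update
    FROM    sys.stats AS s
    JOIN    sys.sysindexes AS si
            ON si.id = s.object_id
           AND si.indid = s.stats_id
    WHERE   s.object_id = OBJECT_ID('dbo.M_C_CONTACT');
    ```

    If rows_modified_since_update is huge on the nights the query hangs, that would at least be consistent with the auto-stats theory.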

    QUESTIONS

    ----------

    - Any way to pull the execution plan being used by a currently running query?

    - Would it be advisable to turn off auto-stats for my index based on the described scenario above?

    - Possibly unrelated question: from my understanding of fill factor, I think a low fill factor would be preferable for the index on the table described above. Would there potentially be a huge performance gain if I lowered the FF from the current 95% to, say, 70%?

    I'm working to somehow reproduce this, but in the meantime, I was hoping I could get some general feedback on this kind of scenario.

  • M_E_K (1/18/2013)



    Lots going on...

    - Any way to pull the execution plan being used by a currently running query?

    You can pull the estimated execution plans from the cache using sys.dm_exec_text_query_plan. On 2005, IIRC, to get actual execution plans I think your only option is to use Profiler/Trace.
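    A sketch of that approach: join sys.dm_exec_requests (currently executing requests) to the sys.dm_exec_query_plan DMF to get the cached plan XML for whatever is running right now:

    ```sql
    -- Cached (estimated) plans for requests that are executing right now.
    SELECT  r.session_id,
            r.status,
            r.wait_type,
            qp.query_plan        -- plan as XML; click it in SSMS to view
    FROM    sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
    WHERE   r.session_id > 50        -- skip system sessions
      AND   r.session_id <> @@SPID;  -- skip this query itself
    ```

    The wait_type column is worth a look too: a "hung" query that is actually blocked or waiting will show it there.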

    - Would it be advisable to turn off auto-stats for my index based on the described scenario above?

    I am guessing that it would be a bad idea in your situation given how much activity you have on the table in such a short amount of time. If you turned off auto-stats I think you would still need to run some kind of update during your processing window else bad plans would likely ensue possibly making things much worse.
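    If auto-stats were turned off, the manual update would look something like this, run periodically during the midnight load window (table and index names assumed from the DDL in this thread):

    ```sql
    -- Manual stats refresh on the one index, with a full scan.
    UPDATE STATISTICS dbo.M_C_CONTACT M_C_CONTACT_U1 WITH FULLSCAN;
    ```

    Scheduling that reliably against a table churning through millions of rows is itself a problem, which is part of why disabling auto-stats here seems risky.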

    - Possibly unrelated question: From my understanding of Fill Factor, I think it would be preferable to have a low fill factor for the index on the table described above, but would there potentially be a huge performance gain if I lowered the FF from current 95% to say 70%?

    It will depend on a lot of other factors. Is the unique index on the table clustered? Are the unique identifiers in the NVARCHAR(50) issued in a sequence or are they effectively random? Why is the data type NVARCHAR(50) and not UNIQUEIDENTIFIER? I would like to see the complete DDL for the table in question including the index and anything else on that table. Feel free to change the table, index and column names if you want in case that might expose sensitive info or your identity. A few rows of sample data that might appear in the table could be helpful as well, so we can see what the data going into the NVARCHAR(50) looks like.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi opc.three,

    Thanks for the feedback so far. Really appreciate your time.

    Actual execution plan of a currently running query is what I'm looking to get, I'll take a look at profiler, SST options.

    In one of your responses, you said: "I'm guessing that it would be a bad idea",

    Did you mean to say: "wouldn't be a bad idea"?

    Answers to your follow up questions on FF:

    - It is not clustered

    - The GUIDs are generated/managed by the application. Here are some examples:

    b3d4a540-0c4e-1000-8029-a571dc530000

    b3d4a541-0c4e-1000-8029-a571dc530000

    b3d4a542-0c4e-1000-8029-a571dc530000

    b3d4a543-0c4e-1000-8029-a571dc530000

    b3d4a544-0c4e-1000-8029-a571dc530000

    b3d4a545-0c4e-1000-8029-a571dc530000

    b3d4a546-0c4e-1000-8029-a571dc530000

    Here are some examples of the QUALIFIED IDs:

    1-12TVTL1

    1-1CDYQZQ

    1-HDPSS4

    Here is DDL for table and index:

    CREATE TABLE [dbo].[M_C_CONTACT] (
        [GUID] [nvarchar](50) NOT NULL,
        [QUALIFIED_ID] [nvarchar](30) NOT NULL
    ) ON [PRIMARY]

    CREATE UNIQUE NONCLUSTERED INDEX [M_C_CONTACT_U1] ON [dbo].[M_C_CONTACT]
    (
        [GUID] ASC,
        [QUALIFIED_ID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]

  • M_E_K (1/22/2013)


    Hi opc.three,

    Thanks for the feedback so far. Really appreciate your time.

    Actual execution plan of a currently running query is what I'm looking to get, I'll take a look at profiler, SST options.

    In one of your responses, you said: "I'm guessing that it would be a bad idea",

    Did you mean to say: "wouldn't be a bad idea"?

    No, I meant that it would be a bad idea. With that many data changes in the table, without auto-stats enabled the stats will become stale in a hurry. This means that the execution plan generated when the table had no or very few rows could be reused when the table had 100K or millions of rows, and that could make for some very bad performance.


    I would consider changing the data type of the GUID column to UNIQUEIDENTIFIER, which is 16 bytes wide. A 36-character GUID string in an NVARCHAR column takes 72 bytes of data plus 2 bytes of variable-length overhead, so you would save 58 bytes per row.

    As for your indexes, I would also try changing M_C_CONTACT_U1 to clustered and see how you do with fragmentation. Re: the GUIDs and how they might affect the FILLFACTOR, are they generated in sequence or are they effectively random?
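    A sketch of what that change could look like, reusing the existing index definition. DROP_EXISTING = ON lets you convert the nonclustered index to clustered without a separate drop/create; this rebuilds the whole table, so run it in a quiet window:

    ```sql
    -- Rebuild the existing unique index as the clustered index.
    CREATE UNIQUE CLUSTERED INDEX [M_C_CONTACT_U1] ON [dbo].[M_C_CONTACT]
    (
        [GUID] ASC,
        [QUALIFIED_ID] ASC
    ) WITH (DROP_EXISTING = ON, FILLFACTOR = 95) ON [PRIMARY];
    ```

    Whether 95 is the right fill factor after the change depends on the sequence question above.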

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato
