Simple query maxing CPU

  • Good afternoon everyone,

    I've got a simple query that should be no problem at all but which maxes out the CPU and causes the server to lock up. Here's the query:

    ;WITH e_Dupe AS
    (
        SELECT mcrn, mpidm
        FROM Registration_loaderr
        GROUP BY mcrn, mpidm
        HAVING COUNT(*) > 1
    )
    SELECT *
    FROM Registration_loaderr le
    INNER JOIN e_Dupe ed
        ON le.mcrn = ed.mcrn
        AND le.mpidm = ed.mpidm
    WHERE le.error_check = '142285'

    The table definition is:

    CREATE TABLE [dbo].[Registration_loaderr]
    (
        [mcrn] [varchar](41) NULL,
        [mpidm] [int] NULL,
        [credit_hrs] [numeric](6, 0) NULL,
        [grade_alpha] [varchar](25) NULL,
        [grade_numeric] [numeric](5, 2) NULL,
        [grade_numeric_possible] [numeric](5, 2) NULL,
        [credits_earned] [numeric](5, 2) NULL,
        [GPA_Credits] [numeric](5, 2) NULL,
        [GPA_QP] [numeric](5, 2) NULL,
        [Early_College] [char](1) NULL,
        [Dual_Credit] [char](1) NULL,
        [Error_check] [varchar](50) NULL,
        [D_Acad_yr] [char](7) NULL,
        [D_TERM] [char](3) NULL,
        [D_IPEDS] [int] NULL,
        [D_CRN] [char](6) NULL
    ) ON [PRIMARY]
    GO

    As this table stages ETL data, there are no indexes on it. There are roughly 900k rows in the table, and the given query should return 0 records. If the CTE is excluded, the query runs in 2 seconds and returns ~122k records. All rows have non-null mcrn and mpidm values, and the CTE itself returns zero rows. Other simple queries hitting this table cause the same CPU issue. In the past these simple queries ran with zero issues, but suddenly they max out the CPU. The only recent change has been the addition of four persisted computed columns; I removed those columns and the issue persists. Attached is the execution plan.

    I don't believe this is a T-SQL issue, I think something is wrong with the table. Any suggestions on what issues I should look for?

  • What do these give you?

    USE [OSBE_ILEADS];
    GO

    EXEC sp_spaceused Registration_loaderr;

    SELECT index_id, avg_fragmentation_in_percent, page_count, avg_page_space_used_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Registration_loaderr'), NULL, NULL, 'SAMPLED');

    The SQL Guy @ blogspot

    @SeanPearceSQL

    About Me

    Here are the results from those two queries:

    name                  rows    reserved   data       index_size  unused
    Registration_loaderr  881866  266888 KB  250784 KB  8 KB        16096 KB

    index_id  avg_fragmentation_in_percent  page_count  avg_page_space_used_in_percent
    0         0                             31300       49.3581912527798

  • Hi,

    Your pages are only using 49% of the available space. This is a particular behaviour of heaps that have frequent deletes. I have blogged about this behaviour.

    Heaps of Trouble

    Try building a clustered index on the table and see if that improves your query time.
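    To make that concrete, here is a sketch of two ways to compact those half-empty heap pages (the index name is illustrative, and whether mcrn/mpidm is an acceptable clustered key is for you to decide):

    ```sql
    -- Option 1: build a clustered index. This rewrites the heap into a
    -- B-tree, which also reclaims the mostly-empty pages.
    CREATE CLUSTERED INDEX CIX_Registration_loaderr
        ON dbo.Registration_loaderr (mcrn, mpidm);

    -- Option 2 (SQL Server 2008+): if you'd rather keep the table as a heap,
    -- a rebuild compacts the pages without adding an index.
    ALTER TABLE dbo.Registration_loaderr REBUILD;
    ```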

    The SQL Guy @ blogspot

    @SeanPearceSQL

    About Me

    Indexes are definitely a MUST here or everywhere.

    One other thing: would the query below return the same data?

    ;WITH e_Dupe AS
    (
        SELECT mcrn, mpidm
        FROM Registration_loaderr
        WHERE error_check = '142285'
        GROUP BY mcrn, mpidm
        HAVING COUNT(*) > 1
    )
    SELECT *
    FROM Registration_loaderr le
    INNER JOIN e_Dupe ed
        ON le.mcrn = ed.mcrn
        AND le.mpidm = ed.mpidm

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Adding a clustered index fixed it. After adding the clustered index on those two columns the query returns in 0 seconds. Here are the new results from those two queries:

    name                  rows    reserved  data      index_size  unused
    Registration_loaderr  881866  93704 KB  92392 KB  656 KB      656 KB

    index_id  avg_fragmentation_in_percent  page_count  avg_page_space_used_in_percent
    1         0.0606113083383843            11549       99.2834198171485

    Notice the table went from 266 MB to 93 MB and the page count went from 31k to 11k. I didn't know heaps reacted that way to deletes. It's really good to know, thank you.

    Bhuvnesh, that would certainly return the same results and would probably be a better way of writing the query.

    Thank you both for your help! It's greatly appreciated!

  • ElijahE (10/24/2013)


    Adding a clustered index fixed it.

    Thank you both for your help! It's greatly appreciated!

    You're welcome. I'm glad to assist.

    ElijahE (10/24/2013)


    Bhuvnesh, that would certainly return the same results and would probably be a better way of writing the query.

    There is a logical difference between the two though.

    Your query checks for rows duplicating mcrn & mpidm then returns any of those with an error_check of '142285'.

    The second query checks for rows duplicating mcrn & mpidm with an error_check of '142285'.

    With the following data the results will be different:

    DECLARE @T TABLE
    (
        [mcrn] [varchar](41) NULL,
        [mpidm] [int] NULL,
        [Error_check] [varchar](50) NULL
    );

    INSERT INTO @T VALUES
        ('Row1', 1, '142285'),
        ('Row1', 1, '0'),
        ('Row1', 1, '0');
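    Adapting the two query shapes to this sample table (my adaptation; run it in the same batch as the DECLARE above) shows the difference:

    ```sql
    -- Shape 1 (the original): find duplicates first, filter after the join.
    -- (Row1, 1) appears 3 times, so it is a duplicate; the WHERE then keeps
    -- the single '142285' row, so this returns 1 row.
    ;WITH e_Dupe AS
    (
        SELECT mcrn, mpidm FROM @T
        GROUP BY mcrn, mpidm
        HAVING COUNT(*) > 1
    )
    SELECT t.*
    FROM @T t
    INNER JOIN e_Dupe ed ON t.mcrn = ed.mcrn AND t.mpidm = ed.mpidm
    WHERE t.Error_check = '142285';

    -- Shape 2 (filter inside the CTE): only one row has '142285', so
    -- COUNT(*) > 1 never fires, the CTE is empty, and this returns 0 rows.
    ;WITH e_Dupe AS
    (
        SELECT mcrn, mpidm FROM @T
        WHERE Error_check = '142285'
        GROUP BY mcrn, mpidm
        HAVING COUNT(*) > 1
    )
    SELECT t.*
    FROM @T t
    INNER JOIN e_Dupe ed ON t.mcrn = ed.mcrn AND t.mpidm = ed.mpidm;
    ```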

    The SQL Guy @ blogspot

    @SeanPearceSQL

    About Me

  • The mcrn field is a grouping of multiple fields into one (not my design). The information stored in the error_check field is also included in the mcrn field. That's why the queries would return the same data in this specific case.

  • ElijahE (10/24/2013)


    Adding a clustered index fixed it. After adding the clustered index on those two columns the query returns in 0 seconds.

    It will; actually, it must. Try to create indexes that support your queries.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Bhuvnesh (10/24/2013)


    Indexes are definitely a MUST here or everywhere.

    Careful now... I'd have to say, "It Depends". There are quite a few places where adding an index is actually the worst thing you could do even for SELECTs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SscLover (11/7/2013)


    hi Jeff,

    adding an index is actually the worst thing you could do even for SELECTs.

    could you please enlighten us with some test samples or links would be appreciated.

    The paraphrasing you did on what I said makes it sound very dangerous. Here's what I actually said... (I've bolded the part you left out)...

    *There are quite a few places where* adding an index is actually the worst thing you could do even for SELECTs.

    I don't have a demonstrable example, mostly because I never thought of building one, but there are many examples you could Google. One has to do with Table Scans (Clustered Index Scans, in most cases) vs. Index Seeks. I've seen many a query at work where someone added a new index to replace a CI Scan with an Index Seek, and it was absolutely the worst thing that could be done, because the single CI Scan was about 40 times more efficient than the 100,000 Index Seeks that occurred after the new index was added.

    My point is that I don't want anyone to think that "Indexes are definitely a MUST here or everywhere" is the solution in all, or even most, cases, although I do agree that the presence of a proper Clustered Index is usually (but not always) a given. "It Depends", and someone must sit down and do some serious analysis and testing. Just adding a bunch of indexes to "solve" problems increases INSERT/UPDATE/DELETE latency (it can actually produce timeouts for the front end), can make the optimizer work harder to figure out which index is most appropriate, can severely bloat the database and the resulting backups (I have an audit table with 8GB of data and 14GB of indexes that I'm currently working on), can drastically increase nightly maintenance times (most NCIs fragment a lot because they are not in temporal order), and the unnecessary bloat also slows down restores in the event of a DR situation.
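    One way to do that analysis on the table from this thread is to compare logical reads with and without forcing the candidate index. This is a sketch; the index IX_error_check is hypothetical and stands in for whatever nonclustered index you are evaluating:

    ```sql
    -- Compare the optimizer's plan against a forced nonclustered-index plan.
    -- If the index doesn't cover the query, every matching row costs a key
    -- lookup; at ~122k matching rows the scan can easily be cheaper.
    SET STATISTICS IO ON;

    SELECT *                                   -- optimizer's choice (often a scan)
    FROM dbo.Registration_loaderr
    WHERE error_check = '142285';

    SELECT *                                   -- force the hypothetical index
    FROM dbo.Registration_loaderr WITH (INDEX (IX_error_check))
    WHERE error_check = '142285';

    SET STATISTICS IO OFF;
    ```

    Whichever statement reports fewer logical reads under STATISTICS IO is usually the cheaper plan for that data distribution.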

    --Jeff Moden



  • Ah! THERE it is! I remembered that I had an article out there somewhere where I had documented that the example code actually ran twice as slow in the presence of what seemed like the right index to use and I found it. It's in the article at the following link. You can certainly setup the test data from the article and give it a try yourself with the understanding that index requirements can and do vary from machine to machine if you consider available memory, number of processors, MAXDOP settings, etc.

    http://www.sqlservercentral.com/articles/T-SQL/94570/

    The quote from the article is...

    Also notice that we didn’t add any indexes. It turns out that adding an index to the EmployeeID column would actually make the upcoming code run about twice as slow even though we’d get a MERGE JOIN out of it. Do NOT add an index to this interim table! If you intend to keep this table with the Nested Sets calculations (the Left and Right Bowers, etc) in it, add the indexes you'll need AFTER we get done with the rest of the steps below.

    Like I said, "It Depends", and no one should ever take on the attitude that "Indexes are definitely a MUST here or everywhere." Adding indexes without analysis and testing can be the worst thing to do.

    --Jeff Moden


