Stuck tracking down CXPACKET issue

  • We have a server that often has the following happen:

    1) It breaks a SPID down and assignes hundreds of threads to the SPID (not shown in sample below)

    2) All threads for the SIPD go into CXPACKET wait for an extended amount of time

    So we have two issues, how to keep a complex query from eating up all workers, and how to find out what is putting these queries on hold for extended amount of time (this one was on hold for 1560 seconds until I killed the SPID.

    SpidecidAge Secondsstatuswait_type

    125211560suspendedCXPACKET

    125221560suspendedCXPACKET

    125231560suspendedCXPACKET

    125241560suspendedCXPACKET

    125251560suspendedCXPACKET

    125261560suspendedCXPACKET

    125271560suspendedCXPACKET

    125281560suspendedCXPACKET

    125291560suspendedCXPACKET

    125301560suspendedCXPACKET

    125311560suspendedCXPACKET

    125321560suspendedCXPACKET

    125331560suspendedCXPACKET

    125341560suspendedCXPACKET

    125351560suspendedCXPACKET

    125361560suspendedCXPACKET

    125371560suspendedCXPACKET

    125381560suspendedCXPACKET

    125391560suspendedCXPACKET

    125401560suspendedCXPACKET

    125411560suspendedCXPACKET

    125421560suspendedCXPACKET

    125431560suspendedCXPACKET

    125441560suspendedCXPACKET

    125451560suspendedCXPACKET

    125461560suspendedCXPACKET

    125471560suspendedCXPACKET

    125481560suspendedCXPACKET

  • You can set MAXDOP to half the number of cores on your server if the instance is the only instance on the server.

    Fewer if more than one instance is on your server.

    That will prevent processes from taking over your instance.

  • Currently our Cost of THreshold for Parallelism is 5 and the Max Degree of Parallelism is 0.

    We have 24 cores.

    Does that change your suggestion any?

  • You could set maxdop to 12 or 8 and see if that helps.

  • Mind you, this is only a band-aid.

    You really should identify the query that is causing the issue and tune it.

  • dwilliscp (11/5/2013)


    Currently our Cost of THreshold for Parallelism is 5

    you also need to increase value of threshold to avoid early parallelism occurence

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • This morning when it failed there were only 7 SIPDs running on the server, we can subtract two... one was the base Trace and the second was database mail.

    So that leaves just 5. My capturing of processes.. showing wait.. is a SQL statement. So how would I modify the following to capture the execution plan, without creating a trace to capture all execution plans? Note I have this running every 15min during the two hours that we have the most failures due to query timeout. Side note only the Jobs created parrallel processing.. 49 threads, 25 threads, and 49 threads. (Of course in this case I can get an estimated execution plan by selecting the job.)

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    insert into zt_wait_Hist(

    Capture_DT

    , [Spid]

    , ecid

    , start_time

    , [Age Seconds]

    , nt_username

    , [status]

    , wait_type

    , [Individual Query]

    , [Parent Query]

    , [program_name]

    , Hostname

    , nt_domain

    )

    SELECT

    GETDATE() as Capture_DT

    , er.session_Id AS [Spid]

    , sp.ecid

    , er.start_time

    , DATEDIFF(SS,er.start_time,GETDATE()) as [Age Seconds]

    , sp.nt_username

    , er.status

    , er.wait_type

    , SUBSTRING (qt.text, (er.statement_start_offset/2) + 1,

    ((CASE WHEN er.statement_end_offset = -1

    THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2

    ELSE er.statement_end_offset

    END - er.statement_start_offset)/2) + 1) AS [Individual Query]

    , qt.text AS [Parent Query]

    , sp.program_name

    , sp.Hostname

    , sp.nt_domain

    FROM sys.dm_exec_requests er

    INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid

    CROSS APPLY sys.dm_exec_sql_text(er.sql_handle)as qt

    WHERE session_Id > 50

    AND session_Id NOT IN (@@SPID)

    ORDER BY session_Id, ecid

  • If I just take the Individual Query and run an estimated plan

    Plan A: it does not show any suggested index's .. you start with a Clustered Index Scan (cost 74%), move to Sort (20%) > Parallelism (6%) > Select (0%)

    This is the one that failed to execute in 20min.

    No blocking showed up in the trace...

    Estimated rows 28M, estimated cost 1,878, cashed plan 24b

    All SPIDs of these SPIDs were executing Select statements.. the writes were going on in the SSAS cubes. (for the jobs) and the users were only doing selects.

  • You know ... I am begining to wonder if the problem lies in SSAS, I have passed this mornings information off to the team that handles our cubes.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply