• Rob,

    To summarize your system here is what I have gathered so far:

    RamRaid QuantityBreak down

    Lab32GBRaid 1-04OS,Trans,DWH,Logs

    Prod24GBRaid 1-04OS,Trans,DWH,Logs

    Things we don’t know

    •Is it on a SAN

    •Processor Types and Cores Identical?

    •Maximum Memory

    •Minimum Memory

    •Max DOP setting on each system

    What I understand of your problem is this:

    In Lab you can run four simultaneous processes and the event translation takes around 50-60ms. In production there are 8 simultaneous processes which take nearly 10x the time (550ms to 650ms). There also appears to be a generally high CPU utilization of 50% in production; backups are also causing contention with I/O when they kick off.

    Based on that:

    SAN:

    I’m asking about your SAN because if indeed you are on a SAN your I/O stats could be lying to you. I believe SQLIO is based off WMI events and Perfmon counters. The metrics you have may or may not be accurate.

    Question:

    Are these disks on a SAN or no?

    RAM:

    The ram is different from production to LAB; in this case your lab has more. If your production server comes under pressure from too little RAM it can start paging out and causing I/O issues, CPU issues etc. So for instance with 24 GB in place maybe you have 20GB as your max and 8GB as your minimum setting. This would generally be ok provided that you don’t have other external programs running on this server which operate out of SQL Servers reserved memory pool. This would include items from within SQL such as CLR’s and Linked server calls.

    When SQL comes under pressure it will try flipping out what’s in memory but only from its reserved pool; any other memory being freed relies on the OS to release the memory. Again this will include linked server calls and CLR’s etc. In earlier editions of SQL; such as 2005 I sometimes have found it necessary to lock MIN and MAX memory in place with the same values as it couldn’t release the RAM in a timely fashion; but that is neither here nor there.

    Question:

    What are you minimum and maximum memory settings on your production server? Are there any other external programs running on the server? Are you using CLR’s extensively?

    Settings can be gathered via the GUI or via sp_configure.

    --Allow all options to be visable

    exec sp_configure 'show advanced options',1;

    GO

    --Execute this now as opposed to the next restart of the server

    Reconfigure with override;

    go

    --View all the settings

    exec sp_configure;

    go

    MaxDOP:

    Depending on how many cores you have and if this process is going parallel this setting matters a great deal. If you have a MAXDOP of 0 and this is going parallel then even a single processing event could be using all the cores on the system.

    Question:

    How many cores are on your system and what is the Maximum Degree of parallelism that is currently set?

    See sp_configure above.

    --Taken from http://msdn.microsoft.com/en-us/magazine/cc135978.aspx

    SELECT TOP 10

    [Wait type] = wait_type,

    [Wait time (s)] = wait_time_ms / 1000,

    [% waiting] = CONVERT(DECIMAL(12,2), wait_time_ms * 100.0

    / SUM(wait_time_ms) OVER())

    FROM sys.dm_os_wait_stats

    WHERE wait_type NOT LIKE '%SLEEP%'

    ORDER BY wait_time_ms DESC;

    *Apologies I forget where I got this from; if someone knows the author or URL I’ll add it

    Clustered Indexes:

    If you’re missing clustered indexes / Primary Keys it can cause your CPU to go nuts as SQL tries to satisfy a query across a heap. It’s worth taking a looks at the db’s in question as well as any others on the system to see if this is the case.

    --Missing Primary Keys:

    SELECT

    t.TABLE_CATALOG

    ,t.TABLE_SCHEMA

    ,t.TABLE_NAME

    ,t.TABLE_TYPE

    FROM

    INFORMATION_SCHEMA.TABLES t

    LEFT JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk

    ON t.TABLE_CATALOG = pk.TABLE_CATALOG

    AND t.TABLE_NAME = pk.TABLE_NAME

    AND t.TABLE_SCHEMA = pk.TABLE_SCHEMA

    AND pk.CONSTRAINT_TYPE = 'PRIMARY KEY'

    WHERE

    pk.TABLE_NAME IS NULL

    AND t.TABLE_TYPE='BASE TABLE'

    ORDER BY

    t.TABLE_CATALOG

    ,t.TABLE_SCHEMA

    ,t.TABLE_NAME

    --Missing Clustered Indexes

    SELECT name

    FROM sys.objects

    WHERE type = 'U'

    AND object_id NOT IN

    (SELECT object_id FROM sys.indexes WHERE index_id = 1)

    *Apologies I forget where I got this from; if someone knows the author or URL I’ll add it

    Missing indexes:

    This can also be a point of contention as your using statics to satisfy queries. It might be worth looking into it to see if you can optimize those tables and queries causing high CPU.

    --Count of missing indexes

    SELECT

    DatabaseName = DB_NAME(database_id)

    , count(*) as [Number Indexes Missing]

    FROM sys.dm_db_missing_index_details

    GROUP BY DB_NAME(database_id)

    ORDER BY 2 DESC;

    --Top ten missing indexes based on usage since last restart

    SELECT TOP 10

    [Total Cost] = ROUND(avg_total_user_cost * avg_user_impact * (user_seeks + user_scans),0)

    , avg_user_impact

    , TableName = statement

    , [EqualityUsage] = equality_columns

    , [InequalityUsage] = inequality_columns

    , [Include Cloumns] = included_columns

    FROM sys.dm_db_missing_index_groups g

    INNER JOIN sys.dm_db_missing_index_group_stats s

    ON s.group_handle = g.index_group_handle

    INNER JOIN sys.dm_db_missing_index_details d

    ON d.index_handle = g.index_handle

    ORDER BY [Total Cost] DESC;

    *Apologies I forget where I got this from; if someone knows the author or URL I’ll add it

    If all this doesn’t point you in the right direction or give you a better idea of what could be going on you have another couple of options that are more time consuming; which is why I left them at the end.

    Install the trial version of SQLSentry or SQLMonitor ; monitor your lab environment and production and see what is going on. Each of these products will provide metrics and a handy GUI for you to help better analyze your environment.

    Finally and least appealing to me (because it’s the most time consuming) would be to install SQL 2012 somewhere and hook up the Distributed Reply feature; this would allow you to capture and then reply what is going on in your production environment directly in your lab. I hope this helps.

    Teal Canady

    Sr. DBA

    Lieberman Research Worldwide

    *Please excuse any spelling or grammatical mistakes; I'm just a DBA: )