VM v Physical - Parallel Plan Problem

  • Hi all, hoping one of the bright sparks on here can point me in the right direction.

    I have 4 servers. 2 Physical and 2 VM.

    Running SQL 2008R2SP2.

    Physical I have 32Cores on Enterprise, and also 16 on Standard

    VM 8 Cores on Standard.

    Now, the juicy bit.

    I have a query (I know, shocking). Anyway, on the 2 physical machines this gets a lovely parallel plan out of the box, runs in under 10 minutes. With 'High Performance' power settings that drops to around 3, but I digress.

    On the VM boxes there is a bit of an issue. It won't pick a parallel plan. Stats are updated, indexes rebuilt, all to no avail.

    What has made a difference though, is changing the query from:

    select <massive field list> into #mytemptable from <11 way table join>

    to

    select * into #mytemptable from (select <massive field list> from <11 way table join> )x

    Now, in my exploration of this I have found all sorts of information - stuff like vm settings to pass through numa, os power settings, hyperthreading etc. But for some reason without this simple query rejig it won't take. Yes, I've fiddled the cost thresholds, the maxdops etc, with no results. The cost for the query without parallelism is so far past the cost threshold -talking 10's thousands here, not just 5.

    With the parallel plan on the vm it runs in about 5m, without it's about 25m, so it's quite key... but refactoring all the code is not going to be an easy job, and the physical boxes handle it anyway - even maxdopped down to 4.

    Anyone have any idea why this would be the case? Even tried slapping the new CU on to see if that cleared it, but no joy.

  • Same query, but different data sets right? Which means radically different statistics? It helps explain why you might see differences in the plans selected. But, it all comes back to the costing model. There are mechanisms you can try to force parallelism on the plan. Adam Machanic and Paul White make a number of suggestions. You can read about them here. Hopefully that will prove helpful.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ah, no, I guess I wasn't clear.

    It's the same data set on each server. Hence the confusion 🙂

  • The differences between Standard and Enterprise could be part of it. There are options within the optimizer that are different there. But, it probably comes back to the available memory and available processors and the costing engine. If the costs are estimated higher for a parallel plan, it's off the table. Changing the query changed the costing.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • That's what I thought Grant - but then once the standard was on the new physical box, it still works.

    Trimming the physical box to numa 0 only has made it behave the same way, which is interesting. Ta.

  • Physical on maxdop 5 or less behaves the same as the vm. 6 or up parallel.

    But only NUMA 0 (8 cores) also ends up not parallel.

    Anyway, found it all a bit curious.

  • I'll bet if you compare costs by forcing plans to parallel using the trace flag, you'll see that, for whatever reason, the plans are higher in the systems where it's not choosing them. The thing about the costs is, they're all just calculations based on canned mechanisms. It doesn't actually measure the system it's on. It just looks at the memory and processors available and supplies the existing formula. Variations in the environments based on those settings, not the actual machines and their performance, could explain it. Not that it makes sense, but that it's explainable.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Yes, I don't doubt it.

    Any idea why using a single Numa node with 8 cores would throw away the parallel plan, whereas using both nodes, but limited to maxdop 6 on the server would use parallel?

  • Short answer, costing. Has to be something in the calculation engine that makes that determination. Not something I know. Adam might. Paul probably would.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Further fiddling suggests it is happy to parallelize even with only one core per numa node... I always understood it preferred to keep it on a single node.

  • Thanks for the input, found it a bit puzzling tbh.

  • Looking into this in more detail over at http://msdn.microsoft.com/en-us/library/gg415714.aspx

    This is a whitepaper Understanding and Controlling Parallel Query Processing in SQL Server

    Writers: Don Pinto, Eric Hanson

    Guessing it will leave me a little better informed than I was when I started this!

  • So... to summarise my findings:

    Server - Physical, 2x8CPU, 128GB Ram

    Parallelized:

    MAXDOP 5 or above

    and

    (

    Numa ALL Auto

    or

    Numa 1 or more Cores from BOTH nodes

    )

    Single thread:

    MaxDop 4 (or less)

    or

    Single Numa Node (either - even with 8 Cores)

    (note NUMA set in CPU affinity sql server settings)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply