Issue with processor affinity mask settings

  • Hi everyone,

    I’m facing a bit of a strange problem with processor affinity mask settings. I recently installed two new SQL 2005 servers with SP3 (9.0.4035) at a customer site. Server one is for development and has just one CPU and 1GB of RAM. Server two will become the production server and has 4 CPUs and 4GB of RAM.

    On both machines a linked server has been created to an Oracle database. The linked servers are configured exactly the same. Each night a reporting database gets filled with data from the Oracle machine.

    When I use the default settings for processor affinity (automatic) on both machines, the load takes about 20 minutes on the development server and 1 hour and 20 minutes on the production server.

    So I started to play around with the affinity mask settings, and it turned out that as long as I don’t check the boxes to automatically manage the processor or I/O affinity mask, the load takes about 24 minutes on the production server. That’s still slower than the dev machine, but much faster than with the default settings.

    It doesn’t matter whether I select only a few of the CPUs or all of them.
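    For reference, the checkboxes in the SSMS dialog map onto the "affinity mask" and "affinity I/O mask" options in sp_configure, so the same toggling can be scripted. A minimal sketch (the mask values here are illustrative, not recommendations):

    ```sql
    -- Sketch: script the same affinity settings the SSMS dialog exposes.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- 0 = let SQL Server manage processor scheduling automatically
    EXEC sp_configure 'affinity mask', 0;
    RECONFIGURE;

    -- Bitmask: 15 (binary 1111) pins the schedulers to CPUs 0-3 explicitly
    EXEC sp_configure 'affinity mask', 15;
    RECONFIGURE;
    ```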

    Has anyone else seen this behaviour? I’m also wondering what will be the impact on other tasks. Since the server is not in production yet I only have very limited performance figures so far.

    [font="Verdana"]Markus Bohse[/font]

    Is the CPU consumed on PROD when the task is executed (measured with Perfmon) significantly greater than on the DEV environment? If both CPUs are from the same vendor, it may be worth checking with the hardware vendor.

    Also, are these servers both 32-bit?

    Are there any differences in the sp_configure settings related to hardware between the two servers, apart from the affinity mask setting you toyed with?

    -PUZZLED ONE 🙂

    Cheers, Satish 🙂

  • Both machines are actually virtual machines running on the same type of hardware. The OS is Windows 2008 Standard 32-bit. I don't have the Perfmon data for CPU usage since the job runs at night, but I can set up a log for tonight.

    All other configuration settings are default.

    What puzzles me is that even when I check all the individual boxes for all processors for cpu and IO affinity in SSMS, it's faster than simply checking the manage automatically option.

    [font="Verdana"]Markus Bohse[/font]

    While you are logging it tonight, here's something to chew on if you haven't already read it.

    http://blogs.solidq.com/EN/rdyess/Lists/Posts/Post.aspx?ID=13

    The Microsoft article in fact advises users not to change these settings, but it does mention that they need to change in some circumstances, which is explained well in the blog post above.

    http://support.microsoft.com/kb/319942

    -:-)

    Cheers, Satish 🙂

  • Hi TECHBABU,

    Thanks for your replies. The articles are useful, but they don't really offer an explanation for what is happening on my machine. In the meantime I was able to do some more testing, including Perfmon counters. Luckily we are still not live on that machine, so I could reschedule the transfer and run it (partially) a couple of times.

    The pattern is as follows:

    Whenever "Automatically set processor affinity mask for all processors" is enabled, the overall CPU usage is around 20%: one CPU is at 45%, another at 20%, and the last two are very low. And the data-loading process is slow.

    With any other configuration, one CPU is at around 90%, the other CPUs are very low, and the data load is fast.

    So it seems like the "Automatically set processor affinity mask for all processors" setting is causing the slow performance.
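    For anyone who wants to cross-check this kind of imbalance from inside SQL Server rather than Perfmon, a quick sketch using the sys.dm_os_schedulers DMV (available since SQL 2005) shows which schedulers are online and how tasks are spread across them:

    ```sql
    -- Sketch: inspect scheduler status and task distribution.
    -- VISIBLE ONLINE schedulers are the ones executing user requests.
    SELECT scheduler_id,
           cpu_id,
           status,
           current_tasks_count,
           runnable_tasks_count
    FROM sys.dm_os_schedulers
    WHERE status LIKE 'VISIBLE%';
    ```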

    [font="Verdana"]Markus Bohse[/font]

  • Maybe the Oracle ODBC or OLE DB connector you are using is not multi-processor aware and that is causing issues? I'd try a different connector. Just my limited educated guess.

  • I'm wondering if you have looked at I/O affinity masks?

    Also, are you doing the data pulls at the same time? Could they be conflicting? Does one start before the other (perhaps the one that performs well)?

    You mention linked-server retrieval; have you looked at an SSIS package to do it? It would almost certainly be faster. My experience is that linked servers SUCK for retrieval of large amounts of data but work pretty well for discrete queries. If you are pulling, say, 1M records, an SSIS package would likely be able to use Fast Load; even on a slow server, it's not hard to get 10K rows a second inserted into a table.

    CEWII
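    On a related note, when the pull has to stay on a linked server, pushing the query text to the remote side with OPENQUERY often helps, because Oracle then filters the rows before they cross the wire; with four-part naming, predicate pushdown depends on what the OLE DB provider can do. A hypothetical sketch (the linked server ORACLE_LNK and table SCOTT.ORDERS are made-up names):

    ```sql
    -- Four-part naming: pushdown depends on the provider, so large remote
    -- tables can end up streamed locally before the WHERE clause is applied.
    SELECT *
    FROM ORACLE_LNK..SCOTT.ORDERS
    WHERE ORDER_DATE >= '2009-01-01';

    -- OPENQUERY ships the statement to Oracle verbatim; only matching rows return.
    SELECT *
    FROM OPENQUERY(ORACLE_LNK,
        'SELECT * FROM SCOTT.ORDERS WHERE ORDER_DATE >= DATE ''2009-01-01''');
    ```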

  • Andrew Maxwell (7/7/2009)


    Maybe the Oracle ODBC or OLE DB connector you are using is not multi-processor aware and that is causing issues? I'd try a different connector. Just my limited educated guess.

    Since we are using the same Oracle client driver on both servers, I don't think that's the issue.

    But I must also say that my project at this specific customer is finished, so I won't be able to do any more testing.

    [font="Verdana"]Markus Bohse[/font]

  • Elliott (7/7/2009)


    I'm wondering if you have looked at I/O affinity masks?

    Also, are you doing the data pulls at the same time? Could they be conflicting? Does one start before the other (perhaps the one that performs well)?

    You mention linked-server retrieval; have you looked at an SSIS package to do it? It would almost certainly be faster. My experience is that linked servers SUCK for retrieval of large amounts of data but work pretty well for discrete queries. If you are pulling, say, 1M records, an SSIS package would likely be able to use Fast Load; even on a slow server, it's not hard to get 10K rows a second inserted into a table.

    CEWII

    Yes, I did look at the I/O affinity masks, and I did try different settings, as you can see in one of my previous postings. The two loading processes were scheduled at different times, and I even switched the schedules, but it didn't make any difference.

    SSIS would probably be a good alternative, but the reason I started this thread was not to find the fastest way for the ETL process, but to find out why one server is so much slower than the other even though on paper it has much more hardware resources.

    Anyway, I'm no longer working at this specific customer site, so I suggest we leave this topic as unsolved.

    [font="Verdana"]Markus Bohse[/font]

  • No problem man.

    CEWII

    I'm not sure if this is exactly the same issue, but I have just installed 64-bit SQL Server 2005 on a Dell box with a single physical Nehalem Xeon X5570 CPU with HT (so 8 logical cores are shown to the OS), running Windows Server 2008 Standard edition.

    I noticed the queries were not running as fast as I expected (around the same speed as on a previous-generation Xeon box) and, after trying everything else I could think of, I experimented with the affinity mask settings. I found that CPU-bound queries (which do not seem to run in parallel anyway) complete about twice as fast when the CPU-bound part is scheduled on logical core 0 as when it is scheduled on any other core. This is true no matter what the affinity mask is set to, but the only way to ensure they are scheduled on core 0 is to set the processor affinity mask to that core only.

    Obviously this doesn't scale well, but I don't have much experience with this CPU / system setup, so if anyone knows what I'm doing wrong, or how to address this, I'd really appreciate it, as most of the queries for the main system running on this box are CPU bound.

    Thanks.
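    For reference, pinning SQL Server to core 0 only corresponds to setting bit 0 of the affinity mask, i.e. a mask value of 1; the SSMS checkbox does the same thing. An illustrative sketch:

    ```sql
    -- Sketch: restrict all schedulers to CPU 0 (binary 00000001 = value 1).
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'affinity mask', 1;
    RECONFIGURE;
    ```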

  • MarkusB (4/28/2009)


    Both machines are actually virtual machines running on the same type of hardware. The OS is Windows 2008 Standard 32-bit. I don't have the Perfmon data for CPU usage since the job runs at night, but I can set up a log for tonight.

    All other configuration settings are default.

    What puzzles me is that even when I check all the individual boxes for all processors for cpu and IO affinity in SSMS, it's faster than simply checking the manage automatically option.

    The issue you are experiencing appears to be related to how VMware allocates resources to virtual machines (VMs).

    From experience, and from how I understand VMware works (I am using ESX 3.5 on 16-core servers), the VMware host shares its resources with its VMs in time slices measured in milliseconds. Thus, VMs only get resources when the VMware host is able to allocate/schedule resources to them.

    In addition, the VM host must be able to allocate all of the resources requested by a particular VM at once.

    For example, if a VM has one CPU, the VM host will allocate one available CPU resource to that VM, 20 ms (the default) at a time.

    If a VM has 4 CPUs, then the VM host has to allocate 4 available CPU resources at the same time for this VM. Of course, this is harder to accomplish if there are other VMs running on the same host: the VM host has to schedule 4 of its CPUs for the 4-CPU VM while other VMs are waiting for resources.

    Thus, 4-CPU VMs can appear to run slower than single-CPU VMs, depending on how VMware is configured and on the number of CPUs on the VMware host.

    If you do not need a 4-CPU VM, a dual-CPU VM should suffice and is easier for VMware to schedule.

    Also, NOTE: when using Windows Perfmon, the CPU data is useless because of the way VMware slices its CPU resources for allocation to its VMs; Windows has to (unknowingly) wait for CPU resources from the VMware host. I think this will be changing in the next version of VMware.

    I hope this explanation helps.

