New Memory Issues on new 64-bit Server

  • I recently migrated our SQL server from a 32-bit Windows Std 2003 with only 4GB of memory to a brand new 64-bit server running Windows Server 2008 Std on 32GB of ram.

    Keep in mind though, that while the OS is 64-bit, I am still just running SQL Server 2005 32-bit.

    The old server ran okay, but I convinced management to let me upgrade the box because it would be better off with some faster disk and more memory. After doing so, the server ran like a top for about 6 days, and then started complaining about insufficient memory, and then eventually shut itself down due to issues with TempDB.

    I should preface this post with a disclaimer that I fiddled with the -g startup flag on day 1 (setting it to 1024MB) because we had trouble getting some CLR assemblies to load. I am suspicious that this has something to do with my issue and it bothers me to no end that I did not have to do this on my old 32-bit server but I did on my new 64-bit server.

    I will try to provide all the pertinent information I can without too much noise:

    Microsoft SQL Server 2005 - 9.00.4053.00 (AWE enabled)

    Although I have 32GB of RAM on the server, I have forced Max Server Memory to only 20GB, just to make certain that there is plenty of room before I start dialing it in.

    I see in the SQL error log that the first indication of memory pressure is:

    AppDomain 9 (MyDBName.dbo[runtime].8) is marked for unload due to memory pressure.

    This message occurred right at the beginning of a scheduled (every 6hrs) server-wide tran log backup.

    6 hours later, just prior to the next scheduled tran log backup, I see this message,

    Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 1048576

    followed by a dump from DBCC MEMORYSTATUS. I will spare you the full dump and just provide it as an attachment. To summarize, AWE was using 20GB as expected and none of the memory clerks seem to be using an exorbitant amount of memory (8MB at most).

    After that the tran log backups continued but this time each preceded with the message,

    Downgrading backup log buffers from 1024K to 64K

    This appears to be in response to memory pressure and is the server's attempt to complete the operation with what little resources it has.

    After that for the next couple of hours there were several queries that failed with the message,

    Error: 701, Severity: 17, State: 123.

    There is insufficient system memory to run this query.

    Then at one point in the error log I see, Server is running low on virtual address space or machine is running low on virtual memory. Reserved memory used 1 times since startup. Cancel query and re-run, decrease server load, or cancel other applications.

    After that a handful of other seemingly random errors with replication and authentication start happening, I think at this point the server was just in a downward spiral and losing it's mind because of the tight squeeze on memory.

    Finally, I see this message, just before SQL shut itself down to recover,

    During undoing of a logged operation in database 'tempdb', an error occurred at log record ID (123:43525:83).

    Given that the server eventually shut down because of an error in the TempDB log file, I would typically be looking at disk issues, but since everything leading up to this points to a memory pressure issue, I'm just writing it off as a side effect of the core memory problem.

    I have Googled my brains out and learned way more about MemToLeave, VAS, memory clerks, and App domains then I ever cared to, and I still am no closer to understanding what went wrong and what I can do to prevent it from happening again. I am about ready to open a case with Microsoft which should provide you with a sense of how befuddled I am.

    Any advice would be greatly appreciated.

  • bobmclaren (5/24/2010)


    I recently migrated our SQL server from a 32-bit Windows Std 2003 with only 4GB of memory to a brand new 64-bit server running Windows Server 2008 Std on 32GB of ram.

    Keep in mind though, that while the OS is 64-bit, I am still just running SQL Server 2005 32-bit.

    The old server ran okay, but I convinced management to let me upgrade the box because it would be better off with some faster disk and more memory. After doing so, the server ran like a top for about 6 days, and then started complaining about insufficient memory, and then eventually shut itself down due to issues with TempDB.

    I should preface this post with a disclaimer that I fiddled with the -g startup flag on day 1 (setting it to 1024MB) because we had trouble getting some CLR assemblies to load. I am suspicious that this has something to do with my issue and it bothers me to no end that I did not have to do this on my old 32-bit server but I did on my new 64-bit server.

    I will try to provide all the pertinent information I can without too much noise:

    Microsoft SQL Server 2005 - 9.00.4053.00 (AWE enabled)

    Although I have 32GB of RAM on the server, I have forced Max Server Memory to only 20GB, just to make certain that there is plenty of room before I start dialing it in.

    I see in the SQL error log that the first indication of memory pressure is:

    AppDomain 9 (MyDBName.dbo[runtime].8) is marked for unload due to memory pressure.

    This message occurred right at the beginning of a scheduled (every 6hrs) server-wide tran log backup.

    6 hours later, just prior to the next scheduled tran log backup, I see this message,

    Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 1048576

    followed by a dump from DBCC MEMORYSTATUS. I will spare you the full dump and just provide it as an attachment. To summarize, AWE was using 20GB as expected and none of the memory clerks seem to be using an exorbitant amount of memory (8MB at most).

    After that the tran log backups continued but this time each preceded with the message,

    Downgrading backup log buffers from 1024K to 64K

    This appears to be in response to memory pressure and is the server's attempt to complete the operation with what little resources it has.

    After that for the next couple of hours there were several queries that failed with the message,

    Error: 701, Severity: 17, State: 123.

    There is insufficient system memory to run this query.

    Then at one point in the error log I see, Server is running low on virtual address space or machine is running low on virtual memory. Reserved memory used 1 times since startup. Cancel query and re-run, decrease server load, or cancel other applications.

    After that a handful of other seemingly random errors with replication and authentication start happening, I think at this point the server was just in a downward spiral and losing it's mind because of the tight squeeze on memory.

    Finally, I see this message, just before SQL shut itself down to recover,

    During undoing of a logged operation in database 'tempdb', an error occurred at log record ID (123:43525:83).

    Given that the server eventually shut down because of an error in the TempDB log file, I would typically be looking at disk issues, but since everything leading up to this points to a memory pressure issue, I'm just writing it off as a side effect of the core memory problem.

    I have Googled my brains out and learned way more about MemToLeave, VAS, memory clerks, and App domains then I ever cared to, and I still am no closer to understanding what went wrong and what I can do to prevent it from happening again. I am about ready to open a case with Microsoft which should provide you with a sense of how befuddled I am.

    Any advice would be greatly appreciated.

    1) Is AWE enabled in SQL Server and /PAE added to boot.ini?

    2) Why in the world are you running 32 bit sql server? There are SOOOOO many issues that go away completely with 64bit memory management.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • 1) The /PAE switch is not supported (or needed I guess) on a 64-bit OS.

    2) Following the migration guide I was reading for moving the SQL service to new hardware, it was recommended that the target SQL server have the "exact same edition and build number" of the source server. This made sense to me since I was copying everything over including my master db and all the other system databases. I didn't know if it would blow up if you fire up a 64-bit SQL 2005 server and attach it to the master db of a 32-bit SQL 2005 server.

  • set awe enabled to 1

    then set your memory

  • My Apologies,

    By memory i mean Memory to be utilized by SQL Server 2005/2008 . if you didnot set AWE, by default it is going to use the same default set memory by MS

  • AWE is enabled and I have the SQL Max Server Memory at 20GB. (I have 32GB available)

    Another interesting observation is that I started tracking my available memory (using the sys.dm_os_virtual_address_dump DMV) and it is steadily decreasing over time. I estimate I will run out of virtual address space in 3 days if this continues unchecked.

    I'll have to see if I can figure out where this memory leak is.

  • Have you tried granting the user account that is running SQL Server the "lock pages in memory" right? This is a windows setting so it shouldn't matter whether you're running 32 bit or 64 bit SQL Server.

    "Beliefs" get in the way of learning.

  • The service account does indeed already have the "lock pages in memory" right assigned to it.

    UPDATE:

    I have opened a support case with MS support and they have me running some traces and perf counters. At this time we don't know what it is that consumed up all the memory, but at least if it happens again we will have more intel.

    I will follow up with the resolution, assuming we get one.

  • Hi,

    Dont know these will help or not , but try to create more tempdb database, anyhow that is good on bases of performance issue ... ,Can say just give it a try ...

  • one small thing what is the log file size and are you using replication (merge) , Do check the server name , some days before faced a problem in a client side there server name was different whereas SQL Server was pointing to some old server name (which was renamed but not configured in sql server ) .

    Hope it help .

  • So after nearly 3 months I am following up to say that in the end, the only resolution was to uninstall the 32-bit and install 64-bit (which by the way was no fun in our 24/7 shop).

    After MS Support gathered all the logs they could and dug through my memory dumps, they concluded that the reason I was running out of memory was related to CLR, and the extra memory allocations needed for each processor core running on a 32-bit instance. I had CLR loaded, and I have 12 cores on this machine, so it stood to reason. But I was told that tweaking my affinity mask to reduce the number of cores would not help because these allocations occur outside the SQL process. Later on I managed to disable CLR and unfortunately I was still crashing every 6 days so honestly I don't know how accurate their assesment was.

    Rather than re-open the MS support case and go through the whole process again I decided it would be less painful to simply migrate from SQL 2005 32-bit to SQL 2005 64-bit, and I'm happy I did.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply