Can SAN Read+Write Cache cause out of sync db calls?

  • I have a SQL Server database connected to a Dell/EMC CX300 SAN Drive array system, we bought a SAN package from dell that is supposed to be certified for RDBMS systems (batteries protecting data loss on power failure, etc.).

    Anyway, I'm experiencing a problem which may be related to using read and write cache on the SAN, and I was wondering if anyone had any thoughts on it.

    Basically, I have two processes.

    1) One process is coming from a message delivery application, which calls a stored procedure to get a new message record from an outbound message queue (and move that message to a partitioned outbound queue table) all within a transaction.

    2) The other process is updating the message record in the partitioned message queue with a delivery acknowledgement, using the message queueID passed in the first processes to the message delivery application.

    The problem is that the second process is coming back with an error indicating that the message queue ID does not exist.

    This according to the logical flow of the processes should never be the case, as the transmission application (process 1) should never even get the record, no less send it out and get an acknowledgement back for it (process 2), until the record is already moved to the partitioned queue, committed, and able to be read.

    Anyway, one of the dba's on our team is blaming the SAN, saying that because we are using write and read cache on the SAN, that the transaction is being committed to the DB engine through the SAN's write cache, but the delivery acknowledgement query (process 2) is getting information from the SANs READ cache which doesn't have the changes yet, since the SANs write cache has not yet flushed to disk.

    Does that seem accurate/possible to anyone? Is data written to a SAN using WRITE cache susceptible to old reads from the SANs read cache, if the read request comes fast enough, so that it is before the write cache is flushed?

    I would think that a SAN certified for RDBMS usage would protect against this (hopefully by updating read cache as well as write cache before committing to the server application).

    Any help would be appreciated.

  • Hi,

    Sorry, but this would not be caused by SAN cache and seems more to do with transaction logic. Regardless of whether the data resides in SAN cache or on-disk, the behavior is more to do with how message queue is being accessed and overall transaction logic. I would suggest you review the overall transaction logic, as from my understanding the following is happening:

    1. Process 1 call a stored procedure to Retrieve (SELECT) a new message from Outbound Queue

    2. Process 1 moves retrived message to a different partitioned table (INSERT\DELETE?)

    3. Process 2 attempts to Update Outbound Partitioned Queue (UPDATE) based on messageID, but unable to locate this messageID

    As you mentioned, a SAN is no more than a storage place and Cache no more than memory. SQL Serve prefers accessing data from cache, as opposed to storage, thus no change from a non-SAN configuration.

    Cheers,

    Phillip Cox

  • Thank you. Yes, you have the logic correct. And I share your thoughts on this. However, my colleague seems to think that because the SAN has two separate cache's, a READ cache and a WRITE cache, remember this is SAN cache not system memory, That when the SAN is confirming the transaction is complete (written to disk) to the operating system (even though it has not actually been flushed to disk yet), that the SAN read cache, being separate from the SAN write cache, has not yet been updated with the WRITE changes, and is possibly being queried and returning the false (or old) data.

    So the question is, are SANs setup that way such that you can get a query response from the READ cache that does not reflect committed WRITE transactions not yet flushed to disk still in the WRITE cache? Does anyone know if this is possible? This would not seem very logical to me. (again we are using a Dell/EMC CX300 fibre SAN).

  • aspersage (10/30/2007)


    So the question is, are SANs setup that way such that you can get a query response from the READ cache that does not reflect committed WRITE transactions not yet flushed to disk still in the WRITE cache? Does anyone know if this is possible? This would not seem very logical to me. (again we are using a Dell/EMC CX300 fibre SAN).

    I'm not sure about SAN config, as I have little storage experience.

    It is possible, under certain conditions, but should not occur. If it does happen, you will see an entry in the error log with SQL complaining about a stale read. It's a fairly serious error condition.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    Thanks for update, but I have worked with EMC CX300 and DX3000 over the years and what your collegue is stating is the cache is not mission critical, which this device is designed to do. I have never in my years of working with any mission critical SAN device experienced the type of issue you are going through. What you may be experiencing is LUN contention, whereas you have multiple LUNS mapped to a RAID Group and each LUN is contending with other LUNS for disk access. Ideally, you should have a one-to-one mapping of LUNS to RAID Group, but most customers try to leverage the one to many option, which is ok in a file sharing application, but terrible for database applications. This type of configuration may result in cache misses, which your system behaviour seems indicate. You will need to ensure the balance between storage processors and LUN's has been implemented within your infrastructure to reduce likelyhoold of read\write latencies.

    Here are some suggested test to see if SAN configuration has not been done optimally:

    1. Disable cache usage for all LUNS hosting database workloads

    2. Ask SAN team to provide output of SAN configuration and study LUN to Storage Processor mapping (should be 2KB or 4KB)

    3. Review following SAN configuration settings:

    Cache Page Size = Number of kilobytes stored in one cache page (ideally 2KB or 4KB for DB apps)

    Cache Flush Watermarks = This defines when processor flushes its write cache to disk (high = 80% and low = 60%)

    So to answer your question, its not the actual SAN hardware, but the SAN Configuration, which may be the culprit of your problems and once you disable all caching and test application, this will provide an answer. Finally, regardless of the cause, always ensure your transactions are designed with good logic.

    If you provide SAN configuration details, I can advise if its been setup optimally. If so, I'll provide email address once you let me know.

    Hope this helps,

    Phillip Cox

  • I've attempted to attach to the post a snapshot of a report generated from our SAN containing Configuration information.

    I appreciate your time!

    Thank you very much.

  • Just a question--does the account running SQL have the priv lock pages in memory? I recently ran into something similar and had failed to do that. Change it solved my issue. It was a 64-bit environment, though.

  • aspersage (10/29/2007)


    Anyway, one of the dba's on our team is blaming the SAN, saying that because we are using write and read cache on the SAN, that the transaction is being committed to the DB engine through the SAN's write cache, but the delivery acknowledgement query (process 2) is getting information from the SANs READ cache which doesn't have the changes yet, since the SANs write cache has not yet flushed to disk.

    No way

    Sounds like the second process is looking for the data before the first process commits it.

  • ...the SQL Account has administrator priv., so yes I would believe it can lock pages in memory.

    .. and Rob, I would agree with you that the 2nd process is most likely looking for results before the first one commits, the problem is that it occurs infrequently (like one out of 10,000 records), so it's difficult to troubleshoot the logic, it appears on the surface that it shouldn't ever be able to occur, as the application shouldn't even get the record until the first request is committed. Thus precluding an acknowledgement of that record before the initial grab is committed. But, I still suspect in high volumes, it's possible.

    So, I am just trying to get feedback to see if anyone has heard of the this issue resulting from a SAN READ/WRITE cache mis-match issue.

  • With a single SAN head (e.g. just one CX300) I don't think I'd waste a lot of time looking at the SAN, much more likely that you've got a bug, error, missed commit statement, etc. buried somewhere down in your code. The read/write cache suggestion might make sense in a clustered environment but I can't see it happening on a single head as described.

  • aspersage (11/1/2007)


    So, I am just trying to get feedback to see if anyone has heard of the this issue resulting from a SAN READ/WRITE cache mis-match issue.

    If you were getting cache misreads, you'd be getting an error like the following.

    Error 830 Severity 10

    stale page (a page read returned a log sequence number (LSN) ([old LSN] hat is older than the last one written ([later LSN))

    If you're not seeing that error, it's not the cache.

    Besides, since you've got a change to the table's data shortly before reading it again, it's very unlikely that the read will be done from disk, since the data is in SQL's data cache.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hey,

    We must remember, SAN cache is completely different from host cache, as once SAN generates an I/O completion event, as far as SQL Server is concerned, the data has been committed, but can still be held within SAN cache awaiting flush to disk. Incorrect SAN configuration can exhibit this type of behaviour, which is why I am suggesting both the read and write caches are disabled for the LUN's supporting the application database that is having the issue, if possible to see if problem still surfaces.

    Which LUN's are supporting affected database? The majority of LUN's are mapped to SPA, which doesn't look balanced (e.g. skewed to one SP) and possible contention issues for SQL Server page retrieval may exist. Please have your SAN team review setting, but quick test would be to disable all caching on LUN's involved in hosting database files.

    Thank you,

    Phillip Cox

  • I can try turning off the cache to see if the problem goes away. but I am not getting the error mentioned above -->

    "Error 830 Severity 10

    stale page (a page read returned a log sequence number (LSN) ( ))"

    Unfortunately, I have no SAN team. It's me, and our Systems Admin (whom's first SAN is the one we are working with). Do you know if the cache can be disabled (and then re-enabled) without having to loose connection to the servers?

    Thank you all for your feedback so far, I appreciate it!

    I'll also in the meantime be very carefully reviewing the procedures involved to see if I can find a logic error somewhere.

  • Hey,

    The error wouldn't get raised, as SQL Server is concerned, the I/O is completed, but in reality in cache on SAN.

    By disabling the storage system LUN cache property, this will prove whether caching is possible service chain bottleneck. As far as I am aware, within Navishpere management tool, you can modify cache property of concerned LUN's and change will have no impact on LUN data. Please check documentation to confirm.

    If you manage to disable LUN cache for LUN's supporting database files, please make sure you backup prior to change. If you have a development or test environment, I would suggest you test application there first.

    Thanks,

    Phillip Cox

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply