TDE and absolutely ridiculous amounts of corruption?

  • I've searched around quite a bit and can't find anyone else with this problem, but ever since we implemented TDE (we have a separate certificate for each instance, but not for each database) we have seen corruption pop up ALL OVER THE PLACE. This is especially the case with ETL servers: after even one night of being encrypted, they seem to corrupt beyond repair. Has anyone else run into this? The corruption happens across multiple servers and multiple different setups.
    The majority of the corruption is in indexes, but sometimes it's also in row data.

    Windows Server 2012 with SQL Server 2012 and 2016.
    Windows Server 2008 with SQL Server 2012.

  • You might be hitting this issue: "FIX: TDE-enabled database backup with compression causes database corruption in SQL Server"

    Alex S
  • I did look at that, but I don't believe this is the case. The reason is that yesterday I took a backup of the unencrypted database, restored it under a new name, then encrypted it and ran a DBCC immediately after, and it was already corrupted.

    We are investigating the possibility that this might be related to VMware. I did restore it to another box and encrypt it, and it was fine, but if I take it and put it on another instance on the same box, it gets corrupted.
    Both boxes are on the same SQL Server version and the same Windows version. I'll update this when I have more info.

    -Edit: Also, we are on SQL Server 2016 SP2 CU4 on this box, so that bug should already have been fixed in this instance.
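
    For anyone wanting to repeat the test described above, the sequence might look roughly like the sketch below. All names are hypothetical placeholders (SourceDb, SourceDb_TdeTest, TdeCert, the file paths, and the logical file names, which RESTORE FILELISTONLY can list for a real backup). COPY_ONLY, CHECKSUM, and the baseline CHECKDB before encryption are assumptions added here, not steps mentioned in the post; the baseline simply helps show whether the copy was clean before TDE was turned on. Backup compression is deliberately not used.

```sql
-- Sketch only: every database, certificate, path, and logical file name below is a placeholder.

-- 1. Back up the unencrypted source database.
BACKUP DATABASE SourceDb
    TO DISK = N'D:\Backup\SourceDb.bak'
    WITH COPY_ONLY, CHECKSUM, INIT;

-- 2. Restore it under a new name on the instance being tested.
RESTORE DATABASE SourceDb_TdeTest
    FROM DISK = N'D:\Backup\SourceDb.bak'
    WITH CHECKSUM,
         MOVE N'SourceDb'     TO N'D:\Data\SourceDb_TdeTest.mdf',
         MOVE N'SourceDb_log' TO N'L:\Log\SourceDb_TdeTest.ldf';

-- 3. Baseline check before encryption (an added step, not from the post).
DBCC CHECKDB (SourceDb_TdeTest) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO  -- GO is a batch separator for SSMS/sqlcmd, not a T-SQL statement

-- 4. Enable TDE using the instance's existing server certificate in master.
USE SourceDb_TdeTest;
GO
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TdeCert;
GO
ALTER DATABASE SourceDb_TdeTest SET ENCRYPTION ON;
GO
```

    Running DBCC CHECKDB again once the encryption scan has finished then gives a before/after comparison on the same restored copy.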

  • oogibah - Thursday, January 24, 2019 1:20 PM

    I did look at that, but I don't believe this is the case. The reason is that yesterday I took a backup of the unencrypted database, restored it under a new name, then encrypted it and ran a DBCC immediately after, and it was already corrupted.

    We are investigating the possibility that this might be related to VMware. I did restore it to another box and encrypt it, and it was fine, but if I take it and put it on another instance on the same box, it gets corrupted.
    Both boxes are on the same SQL Server version and the same Windows version. I'll update this when I have more info.

    -Edit: Also, we are on SQL Server 2016 SP2 CU4 on this box, so that bug should already have been fixed in this instance.

    Are you saying that you took a backup from the VMware box, restored it on another box outside VMware, and it wasn't corrupted, but when you restore it on another box inside VMware it gets corrupted?

    Are the VMware machines and the other box using the same SAN?

    edit:typo

  • No, sorry, both servers are in VMware. They are both Windows Server 2016 with SQL Server 2016 SP2 installed; one is on CU3 and one is on CU4. On the CU4 one, if I encrypt with TDE the database becomes corrupted immediately, i.e. I wait for the encryption to complete, run DBCC CHECKDB, and there is corruption all over the place. On the other box I was able to encrypt it with TDE and it didn't corrupt, immediately at least. Who knows, though: sometimes the corruption shows up after a couple of days, and this database isn't really active on the test instance like it is in production, so it's hardly apples to apples. The only other difference my infrastructure guy saw between the two boxes is that the one that corrupted immediately was on VMware Tools version 9, and the one that didn't corrupt immediately was on version 10. However, we have other servers in the environment on VMware Tools version 10 that occasionally get corrupted.

    We've probably had 30 corrupt databases since implementing TDE over the last 5 months. For the most part, the ones that get corrupted also seem to be the larger databases.
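
    The "wait for the encryption to complete, then run DBCC CHECKDB" step above can be scripted along these lines. This is a sketch under assumptions: SourceDb_TdeTest is a placeholder database name and the 30-second polling interval is arbitrary. It relies on sys.dm_database_encryption_keys, where encryption_state 2 means the encryption scan is in progress and 3 means encrypted.

```sql
-- Sketch only: SourceDb_TdeTest is a placeholder database name.

-- Poll while the TDE encryption scan is still running (encryption_state = 2).
WHILE EXISTS (
    SELECT 1
    FROM sys.dm_database_encryption_keys
    WHERE database_id = DB_ID(N'SourceDb_TdeTest')
      AND encryption_state = 2
)
BEGIN
    WAITFOR DELAY '00:00:30';  -- arbitrary polling interval
END;

-- Confirm the final state; 3 means the database is fully encrypted.
SELECT DB_NAME(database_id) AS database_name,
       encryption_state,
       percent_complete
FROM sys.dm_database_encryption_keys
WHERE database_id = DB_ID(N'SourceDb_TdeTest');

-- Full consistency check, reporting every error and suppressing informational messages.
DBCC CHECKDB (SourceDb_TdeTest) WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

    Checking only after the state reaches 3 rules out the possibility that the check ran against a database that was still mid-scan.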

  • oogibah - Friday, January 25, 2019 8:50 AM

    No, sorry, both servers are in VMware. They are both Windows Server 2016 with SQL Server 2016 SP2 installed; one is on CU3 and one is on CU4. On the CU4 one, if I encrypt with TDE the database becomes corrupted immediately, i.e. I wait for the encryption to complete, run DBCC CHECKDB, and there is corruption all over the place. On the other box I was able to encrypt it with TDE and it didn't corrupt, immediately at least. Who knows, though: sometimes the corruption shows up after a couple of days, and this database isn't really active on the test instance like it is in production, so it's hardly apples to apples. The only other difference my infrastructure guy saw between the two boxes is that the one that corrupted immediately was on VMware Tools version 9, and the one that didn't corrupt immediately was on version 10. However, we have other servers in the environment on VMware Tools version 10 that occasionally get corrupted.

    We've probably had 30 corrupt databases since implementing TDE over the last 5 months. For the most part, the ones that get corrupted also seem to be the larger databases.

    Which VMware ESXi version are you using?
    What about your storage? Is it SAN or local?
    Are all the VMs stored in the same SAN?
    Are both databases stored in the same datastore?

    Side note:
    I'm a little bit rusty on VMware; it's been a while since I used it.

  • Alejandro Santana - Friday, January 25, 2019 9:01 AM

    Which VMware ESXi version are you using? 6.5, build 8935087.
    What about your storage? Is it SAN or local? Storage is local on each host, but it is hyperconverged and considered a SAN (hyperconvergence is handled by Cisco HyperFlex).
    Are all the VMs stored in the same SAN? All VMs and their disks are stored on the same SAN.
    Are both databases stored in the same datastore? Yes.

    Side note:
    I'm a little bit rusty on VMware; it's been a while since I used it.

    Verified with infrastructure above. Also, I checked the database in test, and after 24 hours there is still no corruption on that one. I've also restored an encrypted database to a second instance on the original box that had corruption. When I did this yesterday it also corrupted immediately after encrypting it; however, I can play with this instance more as it's not being used.

    So, just to clarify and prevent confusion:
    Box 1: 2 instances have the database; both corrupted immediately after encryption. VMware Tools 9.
    Box 2: 1 instance has the database; the database has been encrypted and hasn't corrupted after 24 hours. VMware Tools 10.

    We updated Box 1 to VMware Tools 10 last night, so I'm re-encrypting the second instance on Box 1 this morning and will report back with the results. I don't anticipate that VMware Tools REALLY has anything to do with it, though; it's just the most obvious difference we can see.

    -Update: after the VMware Tools 10 upgrade, DBCC on the Box 1 instance shows the database still corrupts immediately after being encrypted, as suspected.

  • oogibah - Friday, January 25, 2019 9:12 AM

    Verified with infrastructure above. Also, I checked the database in test, and after 24 hours there is still no corruption on that one. I've also restored an encrypted database to a second instance on the original box that had corruption. When I did this yesterday it also corrupted immediately after encrypting it; however, I can play with this instance more as it's not being used.

    So, just to clarify and prevent confusion:
    Box 1: 2 instances have the database; both corrupted immediately after encryption. VMware Tools 9.
    Box 2: 1 instance has the database; the database has been encrypted and hasn't corrupted after 24 hours. VMware Tools 10.

    We updated Box 1 to VMware Tools 10 last night, so I'm re-encrypting the second instance on Box 1 this morning and will report back with the results. I don't anticipate that VMware Tools REALLY has anything to do with it, though; it's just the most obvious difference we can see.

    -Update: after the VMware Tools 10 upgrade, DBCC on the Box 1 instance shows the database still corrupts immediately after being encrypted, as suspected.

    Thanks for the information.

    I will try to build a lab at home matching the infrastructure specs you just gave me, and I'll let you know the specs of the lab I end up working with. If anything turns up, I'll let you know.

  • Thank you sir, very much appreciate the assistance here.

  • Are both the working and non-working VMs on the same physical host? You've probably already checked, but I'm wondering if it could be a fault on the host affecting the VM.

  • They are not on the same host, as one is production and the other is a test environment. However, all hosts are mirrored.

  • I would open a case with MS here. I haven't seen anyone get corruption just from enabling TDE, but there might be some bug, and that would worry me. VMware is used in lots of places, including at MS, so I wouldn't guess that is the issue.

  • Yeah, that was my thought. Neither of these products is uncommon, so it's pretty nuts that I'm the only one I can find experiencing it.

  • Just an update: we may have found the issue in the SCSI disk controller. We changed it from LSI Logic SAS to VMware Paravirtual, which is the recommended setup for SQL Server, and the database didn't corrupt immediately after we made this change. I'm going to check on it every day for a couple of days, and I'll update this if it is indeed fixed.
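
    A daily check like the one described could be as small as the sketch below. It is an illustration, not the poster's actual procedure: SourceDb_TdeTest is a placeholder database name, and the suspect_pages query is an added suggestion. msdb.dbo.suspect_pages records pages that hit 823/824 errors, bad checksums, or torn pages (event_type 1, 2, and 3), so new rows appearing after the controller change would be an early warning between full checks.

```sql
-- Sketch only: SourceDb_TdeTest is a placeholder database name.

-- Full consistency check of the test database.
DBCC CHECKDB (SourceDb_TdeTest) WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Pages the instance has flagged as damaged, newest first.
-- event_type: 1 = 823/824 error, 2 = bad checksum, 3 = torn page.
SELECT DB_NAME(database_id) AS database_name,
       file_id,
       page_id,
       event_type,
       error_count,
       last_update_date
FROM msdb.dbo.suspect_pages
WHERE event_type IN (1, 2, 3)
ORDER BY last_update_date DESC;
```

    Comparing last_update_date against the time of the controller change separates old damage from anything new.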

  • oogibah - Tuesday, January 29, 2019 12:03 PM

    Just an update: we may have found the issue in the SCSI disk controller. We changed it from LSI Logic SAS to VMware Paravirtual, which is the recommended setup for SQL Server, and the database didn't corrupt immediately after we made this change. I'm going to check on it every day for a couple of days, and I'll update this if it is indeed fixed.

    That makes sense. Thanks a lot for posting back - please keep us updated. Hopefully this is it and resolves your headaches. 

    Sue

