It surprises me how often I see people posting questions about what type of encryption to implement for credit card data. If you are processing credit cards yourself, and storing the data, you need to comply with the PIC regulations that exist. Here's a good place to get started: https://www.pcisecuritystandards.org/.
Actually the place you need to start is with your bank or processing company. They should be able to guide you in what requirements need to be met for safe data storage.
However if you're running some service and perhaps trying to store a credit card for a customer to make it easy to charge them over and over, that doesn't mean you don't need to comply. You are holding financial information about a customer and if something happens to the data, you're at fault. Your company could be liable, and possibly even you personally if you make the recommendation to build something yourself.
Good security isn't magical, and it isn't secret. It involves you using well known algorithms, protecting the keys, and following best practices. There are some great encryption technologies in SQL Server 2005/SQL Server 2008, but don't just implement them without learning a few things about what best practices are and how these technologies work.
Encryption is a tough subject, but it’s one that’s fascinated me for years. I downloaded PGP when it came out nearly two decades ago, and learned about the intricacies of symmetric and asymmetric keys. I thought it was fun, even when I couldn’t convince many people to consider using it.
Last year at PASS I ran into Michael Coles, who I’ve known for a few years and had just finished tech reviewing a book for. He was planning on a new book and asked me if I’d be interested in doing another one, this one on encryption. I jumped at the chance, and sometime this past spring started getting chapters.
I knew a lot, but had to learn a lot as well. I spent a lot of time Googling various terms and double checking historical references as well as algorithms and terms. I think I did a good job, and Michael and his co-author, Rodney Landrum, did a fantastic job.
I found a package on my desk this morning, and opened it to find my copy of the book. It’s now out, and available in bookstores, including Amazon.
So check it out, and learn a bit about this very important topic for DBAs. Not a lot of companies use encryption, but the ease of administration has grown with SQL Server 2008 and more companies are using it every day. I expect to see it start coming up in interviews soon.
I've been tech editing Michael Cole's new book on Encryption in SQL Server, which is due out in August or September of this year. It's an exciting project for me because I was very disappointed in the extreme dearth of information in Books Online for encryption. I totally understand why people struggle with this or don't use it, but the vendor skimping on details is inexcusable.
In any case, I noticed early on that there was no facility for backing up symmetric keys. I got a few comments from Microsoft about "why would you since they're in the database backup" that did not make me feel better. Lots of things are in the database backup, but that doesn't mean we don't need to transfer them to other systems or other databases. Or that we just want a back up.
In answering a post for someone on encryption, I mentioned to the poster that they needed to look at a database backup to get a copy of their symmetric key, which they had deleted. Then Michael chimed in that this wasn't necessarily true, and I'm betting that thread lead this this blog post: "Cloning" Symmetric Keys. I haven't asked Michael about it since he's a busy guy, and I know he's been traveling, but that's my guess.
In the blog, Michael says that with the same seed values, SQL Server will create the same symmetric key. If you have the same key source and identity values, you get the same key. Which is good and bad. It's good in that you can reproduce the key if you have issues or corruption.
Bad because someone else can do this as well if they know what seed values you used. That means they can decrypt your data.
So apparently the seed values are just as important to protect as the keys themselves.
One of the new features with SQL Server 2008 is the addition of extensible key management, allowing hardware devices to be integrated into the encryption/decryption process for the keys securing your data. There are various vendors that are building Hardware Security Modules (HSMs) that can connect to your SQL Server instance.
The author of the book I'm tech editing set up a SafeNet Luna HSM device on a machine he has and gave me access through a VM. It's an interesting device, and while I haven't seen it, it allows you to offload some of the encryption/decryption processing from your SQL Server to the device. This should speed up your encryption, though I haven't had a real world workload to really test it. And I hate running things over a VM remotely.
The architecture allows for these HSM devices to plug right into the SQL Server encryption stack. You create asymmetric and symmetric keys (depending on which your device supports) the same way, adding a "WITH PROVIDER" clause to your CREATE statements.
The provider is how an EKM system hooks into SQL Server. Essentially you are registering your device within SQL Server as a way to handle cryptographic functions, kind of like you used to register DLLs so you could implement an extended stored procedure. In this case there is a new CREATE statement, the CREATE CRYPTOGRAPHIC PROVIDER which registers the DLL for your EKM system. This is provided by the manufacturer and doesn't get you access to the device. There will be a device specific procedure for logging into the device. Once that is complete, you'll have to set up a credential and grant a login rights to the credentials for other individual logins to use the HSM device.
Why go through this?
Speed is one since these devices are optimized for encryption/decryption routines, but also because your server's CPU isn't being spent with these math-intensive operations. These devices also can have their own backup routines for keys, so you can possibly prevent sysadmins from being able to access data.
One thing I'd say is to go slowly and carefully if you implement one of these devices. Actually, you would never implement one of these devices. The chance of failure or problems is too great. You would always want two of these, keeping one in another location for DR purposes, but perhaps somewhat close in the event of routine errors.
This is an interesting addition to the encryption and security framework of SQL Server. I expect to see more vendors build HSM devices in the near future.
One of the interesting things about Transparent Data Encryption is that it tries to ensure that your data, whenever it is written to disk, is protected. That means that in addition to your data files (mdf, ndf), the log file is also encrypted.
Does this mean you need to enable TDE when you create a database?
You can, but it's not required. If you do this later, how does TDE impact your log?
The rough overview is that the system starts to encrypt the log, and tempdb, from that point onward. Any data that gets written after that time is encrypted, however existing data in the log is not encrypted.
It's not likely that you'll be storing information in a log that you need to worry about and then later implementing encryption, but you ought to be aware of this fact, especially if you are encrypting for some regulatory reason. The safe thing is to run a log backup after you've enabled encryption to be sure that everything is completely encrypted.
A hash is a computation that transforms one set of data into another (hopefully smaller) set of data. So a hash on your 2,000 character blog post should generate a smaller, 10-20 byte value. In doing that, obviously there are many more possible 2,000 character sets of data then there are 10 or 20 byte sets of data. Most of those don't make sense since they would be random scrambling of character values, but they still exist.
When two or more large set sets of data were to generate the same 20byte hash, we have a collision. And collisions are bad.
Why?
The short answer is that you then can't be sure that the source data the 2,000 character strings are equivalent. Their hashes are, but they aren't. They could be, and that's one thing that you are usually using a has to check. For example, one common use of hashing is to detect if data has changed. You might calculate a hash from your application, and one from the database and compare them. If they're the same, you assume nothing has changed.
Why do you do this? It is often much less resource intensive than comparing all fields, or sending a large amount of data to (or from) the server. You could more easily send a 20 byte hash than a 2,000 byte chunk of data.
The HASHBYTES function in SQL Server can be used to generate hashes with a few algorithms, and SHA-1 is the best one to use. It has a low chance of collisions and is considered fairly secure by the cryptographic community. Not great, and it's been superseded by SHA-2, but it should work for most data detection changes.
However there are a few other functions have been supported for some time. There are CHECKSUM, BINARY_CHECKSUM, and CHECKSUM_AGG that exist. You shouldn't use these because of a high chance of collisions. I'll repeat that.
Don't use these!
If you are detecting data changes, consider this code:
select 'LE' 'String', checksum('le') 'CheckSumHash'
It returns
Now, consider that you run this:
select 'AAAAAAAAAAAAAAAALE' 'String', checksum('AAAAAAAAAAAAAAAALE') 'CheckSumHash'
you get this:
Not exactly what you were expecting!
The issue is that there are changes of collisions with these functions, perhaps too high to take the chance of two values being incorrectly compared. I would be very careful about using these functions, and if you do, be sure that you note this potential issue to all support people. If the application appears to be missing changes because of a checksum, be sure you do not have any type of collision taking place.
You have a few options for protecting your SQL Server data at rest. So far I haven't seen anything thing protects data in memory, and I'm not sure we ever will. After all, at some point it needs to be processed, joined, etc., and that can't happen if it's well encrypted.
As I've been working through the encryption book I'm tech editing, I was thinking that Transparent Data Encryption (TDE) was the only native way to protect your data files. It had slipped me mind that there are two other Windows solutions, which will work with SQL Server.
The Encrypting File System (EFS) and Bitlocker are both part of Windows, an can be used to secure your data files at rest.
EFS requires that you take databases offline and then encrypt the files (or folders) before bringing them back online. Bitlocker will encrypt an entire volume on your Windows machine, and while you can still use Windows while it's working, it will be slow.
For SQL Server should you use these?
The answer is always it depends. Both of these will throw a hit on your system, and if that's a problem, you have to weight the issues of that against the protection you get. For a server or workstation, I'm not sure it's a big deal. For laptops I'd recommend Bitlocker as a solution since you really should protect all your files, not just SQL Server files.
I know many people might feel if they protect their data files they've covered the important things, but I'm not sure I agree. I've seen too many people cut and paste, export, or copy files for importing onto their machines.
And often this is the data you might really want to be sure is protected.
As I'm digging more into SQL Server 2008 encryption I'm learning some interesting things about how SQL Server handles encryption. Recently I was doing some research on the HASHBYTES function, which performs a one-way hash on data. This is useful in terms of validating secure passwords or determining is some data has changed. You do this by comparing the hash values, without necessarily needing to compare the actual data.
As I was reading about this function and researching the SHA-1 algorithm (one of a couple that can be used), I was surprised to find out that while the SHA-1 algorithm can handle 2^64 - 1 bytes of data, and SQL Server's text 2^32-1 bytes, this function is limited to 8,000 bytes of data.
I could understand back when we had 2k page sizes, or even the first evolution of 8k pages that not all functions would handle more text data, but isn't time in SQL Server 2008 that all functions that can operate on text data can handle more than 8k worth of data?
It seems silly these days, with more and more data being stored in SQL Server, and larger and larger sizes, wouldn't there be the need for functions to work with more than 8k of data?
Now HASHBYTES works on varbinary data, so character data gets converted (or needs to be), but there are definitely places where you might want to use this with binary data as well. Consider someone storing audio files in SQL Server, which definitely might exceed 8kb. Imagine that the end of a recording is changed, something beyond the 8kb point. If two hashes were compared, they would be the same since this function truncates the data at 8kb.
What if someone wanted to use hashing to audit this data for changes?
It's time that SQL Server remove the 8k limits for data throughout the product. I'm hoping that SQL 11, or SQL 12 for sure, will do this. There are ways around this now (work in 8k chunks), but that's cumbersome and unnecessary. If we can do it with a UDF and a series of calls, can't the engine do it easier?
In the meantime, if you use hashing functions, be aware of this limitation.
I've been tech editing a book about SQL Server encryption and this series of blog posts looks at some interesting things I've found when going through the book.
The Service Master Key (SMK) in SQL Server is the root of all encryption operations. It's the key that's used to encrypt all other keys below it, or at least encrypt those keys that are used to encrypt something else.
Surprisingly, there's no CREATE SERVICE MASTER KEY command. There is an ALTER SERVICE MASTER KEY command, which is used to work with this key.
Instead, the Service Master key is automatically created when you enable certain operations. For example, if you create a Database Master Key (DMK), if there is no SMK, it is automatically generated. It is then used to encrypt the DMK on the instance. If you enable Transparent Data Encryption (TDE), then an SMK is created if it does not exist. If you create a linked server, an SMK is needed to encrypt the password.
There are a few other operations as well that create the SMK, but the important thing for you to understand is that this key is at the root of all encryption, and needs to be protected. That means backup of this key, and secure storage. There is a Backup Service Master Key command, and you need to make sure you use this and store a copy of this somewhere that you can get to it in a DR situation. Putting it on the root of the instance host might not be a secure place, and honestly I think you need some secure place that you centralize all of your keys so that they are available along with, but separate from, the backups.
Inside the instance, or on the host, the SMK is protected by the Windows DPAPI mechanism, which should be secure. If it's not, then likely nothing is secure in the computer.
Other posts:
We're DBAs, right? We're concerned with data, with stability, and perhaps most importantly, backing up and restoring our data. We expect to be able to do this, and at a fairly granular level. I can back up a table or a view (through bcp), a stored procedure (scripting), a file group, or a database.
As I was checking facts in the encryption book I stumbled upon this caveat. You can't back up an asymmetric key!
There's not even a backup asymmetric key command in the T-SQL reference!
I was stunned and looked all around to try and figure this out. After all, I can create an asymmetric key from a file. You use some other program to create the key files, like sn.exe from the Visual Studio resource kit, but you apparently can't get this key back out of SQL Server once it's in there.
It is in a database backup and restore, so that's good, but if I wanted to use this key in another database, or perhaps more importantly, I wanted to put the public key in one database, say an order entry system where I take credit cards in and encrypt them with this key. I don't have the private key, but I move that key to a second system, which the order entry people don't have access to. I then transfer the encrypted data between systems and can decrypt it with the private key on the second system.
If I've created my asymmetric keys in SQL Server, I can't easily do this. Can't do it for DR if I need to get data decrypted, just have to be extra sure I have a good database backup and I can use it to decrypt the data.
There's a backup certificate command, a backup master key command, and even a backup service master key command (though strangely no create service master key command), but no backup asymmetric key command.
I think this is an oversight in SQL Server encryption (as is the lack of CREATE SERVICE MASTER KEY) and ought to be corrected. Hopefully it will in the future. In the meantime, I would not recommend you creating asymmetric keys in SQL Server. Instead, create them in Visual Studio and then protect the files.
Transparent Data Encryption (TDE) sounded like a really cool feature when I first heard of it. Encrypt your database without any changes to your application. It seemed too good to be true.
And it might be.
It's another feature that is well marketed, but really addresses only certain issues. In this case it addresses the issues of getting to your database on disk, and in some cases, preventing access to the data even if you have physical access to the server.
That's pretty cool. Someone could have access to your server, say in some data center, and if you've secured SQL Server, even if they were a local administrator, you potentially could have your data safe from access. The physical files are encrypted, so they can't be read.
Or can they?
As I dig through various documentation, trying to understand how it works for the book I'm tech editing, it seems that most things are covered. The log is encrypted, tempdb encrypted, mirrored databases encrypted, it's a good system.
However I did find one potential place where it could be bypassed. Now this wouldn't be for your everyday bad guy. We'd need a smart hacker, but I bet there's one out there.
First, the data is decrypted when it hits the buffer pool, so in memory it's decrypted, allowing indexes and other normal SQL Server operations to work with the data. However this means there are two places where this data can get to disk.
I don't know how big an issue this is, but I think a smart hacker could cause either of these things easier than poking through memory and reading what's going on. And if they could do this after somehow ensuring critical data was in memory, so through some normal interface to your application, they could potentially get to the data.
It's not a big hole, but it's a hole. I'm not sure how you get around something like this until you get a whole SQL Server OS that's a specialized version of Windows and encrypts everything.
In the meantime, I wouldn't worry about these holes. I'd worry more about keeping your keys backed up and safe!
I've been tech editing an encryption book for SQL Server. It's long overdue, and I think it will be a nice addition to any DBA's library. It also addresses a complicated subject that Books Online has inadequately covered. In fact, they've done an incredibly poor job with just documenting encryption, much less trying to help explain how to use it.
If you're not familiar with asymmetric keys, they are a set of keys that are paired. One is used to encrypt data and the other is used to decrypt data. Often these are referred to as the public key and private key. The public key you disclose to other people and keep the private one, well, private. If someone encrypts data with your public key, only you can decrypt it with the private key. Not even the person that encrypted the data and decrypt it!
The reverse happens when you encrypt something with your private key. You can't decrypt it with that key, though you can decrypt it with your public key, which presumably you have access to. However the fact that your public key decrypts something proves you encrypted it, which is how a digital signature works.
SQL Server 2008 supports two types of asymmetric keys: standard asymmetric keys, and certificates. In case you were not aware, those certificates that you buy for your SSL web sites, or for other security functions, are asymmetric keys. The difference is that certificates have some other meta data, like expiration dates, that make them very handy.
One downside of asymmetric encryption is that it's processor intensive, and it takes time. In fact, it's much, much longer to encrypt (or decrypt) data with an asymmetric key, so they aren't supposed to be used to encrypt data.
Instead you use them to encrypt symmetric keys, and then use those keys for encrypting/decrypting your data.
This is an interesting book, being written by MVP Michael Coles, and I'm looking forward to digging in further.