Compression

  • I don't understand the question, and I think it's not even a question at all.

    So we have two databases "with size of 50GB". Is that the size of the data file(s)? The log file(s)? Data and log combined? The allocated size? Or the used size? Measured in what way?

    Further, as I believe was pointed out already, it depends on the contents of the database. If I fill a 50GB database by repeating REPLICATE('a', 4000) over and over again, it will compress a lot better than, for instance, a database that stores a bunch of JPEG files in varbinary(max) columns.
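    The contrast above can be illustrated outside SQL Server with any general-purpose compressor. A minimal Python sketch, using zlib purely as a stand-in (SQL Server's backup-compression algorithm is not zlib, and the exact ratios will differ):

```python
import os
import zlib

# Highly repetitive payload, analogous to filling pages with REPLICATE('a', 4000).
repetitive = b"a" * 4_000_000

# Incompressible payload, analogous to JPEG files stored in varbinary(max)
# columns: JPEG data is already compressed, so it looks like random bytes.
incompressible = os.urandom(4_000_000)

for name, data in [("repetitive", repetitive), ("incompressible", incompressible)]:
    compressed = zlib.compress(data, level=6)
    print(f"{name}: {len(data)} -> {len(compressed)} bytes "
          f"({len(compressed) / len(data):.1%})")
```

    The repetitive payload shrinks to a tiny fraction of its size, while the random payload stays essentially the same size (or grows slightly), which is why "a 50GB database" alone says nothing about its backup size.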

    What interests me a lot more (and maybe someone has already done the research?) is this: given two databases, with identical content but one in compressed tables and one in uncompressed tables - how will the size of their backups compare, either when using or when not using backup compression? (This, too, will probably depend on the actual content of the database, but I believe that in this case it may be possible to draw some generic conclusions).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • For the case where both compressions use the same algorithm (probably a reasonable assumption if SQL Server is handling both - definitely not safe to assume if a third-party backup system is in use): database A consists of data of size Da; when backed up, this is compressed to size Da' and a header of size Ha is added. Database B consists of already-compressed data of size Db' and a header of size Hb; when backed up, little changes - the data stays at roughly size Db', the header compresses to size Hb', and a new backup header of size Hb2 is added. In other words, the backup is usually a small increase in size.
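    The Da/Db bookkeeping above can be demonstrated with any general-purpose compressor. A minimal Python sketch, with zlib standing in for both the data-compression and backup-compression steps (an assumption for illustration only - SQL Server's actual algorithms are different):

```python
import random
import zlib

random.seed(0)
# Moderately compressible payload: one million bytes drawn from a small alphabet.
data = bytes(random.choice(b"abcdefgh") for _ in range(1_000_000))

once = zlib.compress(data)    # database-level compression: Da -> Da'
twice = zlib.compress(once)   # backup compression applied on top of it

print(f"raw: {len(data)}, compressed once: {len(once)}, twice: {len(twice)}")
```

    The first pass gives a substantial reduction; the second pass operates on near-random bytes, so it recovers almost nothing and may even grow slightly once its own header and checksum are added - matching the "usually a small increase in size" conclusion above.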

    If the algorithms used are different then it may be possible to gain further compression but that will depend heavily on the content.

  • L' Eomot Inversé (2/1/2014)


    Nadrek (1/30/2014)


    The actual correct answer is D) If the schema and contents are identical in the uncompressed and compressed databases except for compression, then the compressed backup of the uncompressed database will be no larger than the compressed backup of the compressed database.

    That's a very strange case. You are looking at the case where the schema and content are identical in the two databases, and the two databases are the same size although one is compressed and the other is not - in other words, these two databases are actually two copies of one database which is not compressible by the algorithms SQL Server uses for data compression. Obviously, if the two databases are identical, the two compressed backups will also be identical, but the identical-database case you have used to suggest a new option D is just not something that ought to be considered - it's a reasonable expectation that a question about two databases isn't referring to two identical copies of the same database.

    I've worked in strange places, and often dealt with multiple copies of exactly the same database, some of which had SQL Server index compression enabled and some of which did not. Additionally, on a theory question, I would argue that edge cases are important to understand.

  • Elementary, dear Watson!

    😀

  • I have never checked it, but logically it seems it can only go that way.

  • Learned something here - the difference between a compressed database and a compressed backup. Thank you. Just to add something: from my experience, backup compression depends on the database structure and contents.

    “When I hear somebody sigh, ‘Life is hard,’ I am always tempted to ask, ‘Compared to what?’” - Sydney Harris

  • Igor Micev (1/30/2014)


    patricklambin (1/30/2014)


    Hello John Mitchell,

    Thanks for your reply, which confirms what I was thinking, especially about LOB and maybe FILESTREAM (and what about FileTables?).

    Maybe the creator of the question could tell us the amount of memory he/she used for the tests. My own tests were done with 16 GB, but with 64 or 128 GB (or more) the results might be different. I am retired (67 years old), so I do not have the same requirements for memory, disks, or processors as a big company. My tests may therefore not reflect the usual state of affairs in the "work" world.

    A little general reproach for the creators of this Question of the Day: maybe it would be useful to indicate the version(s) of SQL Server concerned by the question. It is not the first time I have seen a possible problem in the "good answer". Even on the forums (MSDN/TechNet/SQL Server Central), it is difficult to obtain from a poster a full view of his/her SQL Server (full edition, physical configuration, ...).

    Have a nice day

    Hi all, and thanks for the comments.

    I was working on a system with 32GB of RAM, 24 cores, and enough disk space.

    First I applied data compression (clustered + nonclustered indexes) to a database, reducing its size from about 123GB to 50GB. Then I made a backup of that database. Later I had to make a backup of another database, which was also 50GB in size but wasn't compressed. That's when I noticed the difference in the backup sizes for the two databases (given in the answer of this QotD).

    A good remark is the case of databases with LOB data. Honestly, I didn't have that in mind; it's difficult to find such a scenario. However, the reference does not mention the case of a database with LOB data.

    Regards,

    IgorMi

    Thanks for your question. The best answer should probably have been "it depends", since it depends on whether it is possible to compress the data. In the most common scenario it is possible, but if you are using a columnstore index, the data will already be as compressed as it can be. Another thing that affects the compression is how full the database is: you could create a 50GB database and use only a fraction of it, and an empty database compresses very well. 🙂
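    The last point - that allocated but unused, zero-filled pages compress almost completely - can be seen with the same kind of sketch (zlib again standing in for the backup-compression algorithm, as an illustrative assumption):

```python
import zlib

page = b"\x00" * 8192       # an 8 KB page, the SQL Server page size
empty_file = page * 1000    # ~8 MB of allocated-but-unused pages

compressed = zlib.compress(empty_file)
print(f"{len(empty_file)} bytes of empty pages -> {len(compressed)} bytes")
```

    The zero-filled pages shrink to a negligible fraction of the allocated size, so a mostly empty 50GB database can produce a tiny compressed backup.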

    /Håkan Winther
    MCITP:Database Developer 2008
    MCTS: SQL Server 2008, Implementation and Maintenance
    MCSE: Data Platform

  • The option does not seem correct. The compression ratio will depend upon the kind of data contained in the tables. If the tables have LOB data, the compression percentage will be lower; in other cases, it will be higher.

    Thanks.
