Compression

  • Mark

    I'm not sure what your point is. The question is about compressed databases (not compressed disks), as was my reply. I apologise if I didn't word mine clearly.

    I took the question, as I think most did, on the balance of probabilities. If a database is compressed, it probably means all rows or pages are. And even if only some are, you're still likely to get more backup compression on a database that hasn't had any rows or pages compressed.

    When you say "crapshoot", do you mean a poor question, and not a risky enterprise, unpredictable venture or gamble, as a dictionary I've just looked it up in defines it? (I hope I don't sound pedantic there - I'm just checking that you mean what I think you mean.)

    John

  • I see this as a "it depends" situation, like pretty much everything in the DB world. My experience is with a MS-Dynamics AX database, creating copies for a test system (where I have to compress the DB to fit onto the test system's limited disk). I do this compress/backup routine about every 1-2 months and I get similar results every time.

    Uncompressed: online 450GB, backed up with compression 46GB

    With Page compression: online 67GB, backed up with compression 34GB

    So from the standpoint of my testing, the right answer is been "compressed backup of the compressed DB is smaller than the uncompressed DB".

    "It depends" is never the wrong answer... It's just the answer that decision makers hate to hear.

  • The question requires you to make some assumptions (the compressed database is wholly compressed, and the data is basically the same in both databases, for instance).

    The answer is based on a test carried out by the questioner, so is hardly scientific.

    It would also be reasonable to assume that data compression on the DB and in the backups are broadly equivalent in effectiveness and hence compression ratio achieved. Therefore it is reasonable to think that the two backups would be similar in size (note we do not have a definition of similar in the question).

    Made me think though so thanks.

  • Upon first reading the question, my reaction was also "Well, it obviously depends on information not given in the question."

    Then I realized that wasn't an answer, and went with the most likely to be correct, which turned out to be the expected answer.

    It definitely wasn't a "perfect" question since it forced you to make several assumptions, but I actually don't mind the occasional imperfect question. After all, in our jobs as DBAs, we always work with precisely defined conditions of success, and the absolutely correct option is always available to us...oh, wait, that's not quite right 🙂

  • Hello IgorMy,

    My wife has come back from Canada this morning ( 9:00 AM Paris time ) with a gift which could close that discussion as , at a first glance , I was going wrong.

    That's a book :

    Microsoft SQL Server 2012 Internals from Kalen Delaney, Bob Beauchemin, Paul S. Randall , Jonathan ( he knows that I have never succeeded to spell his name correctly ) , Cuningham ( sorry for him I have forgotten his 1st name ).l

    I have begun to read it and I found an explanation at the bottom of the page 443 in the part Backup compression ( Chapter 8 ).t

    If you have it , I would suggest you to have a look at this page : after a first read , I am thinking that you are right as it is explained that the compression of a backup is acting in a different way that the row or page compression used for the compression of a database.

    To understand everything , I will have to read from the beginning up to the page 443 and I will need time... ( as the English language is not often my friend ). But I go on to think that the explanations are not evident to understand ( especially that I had up to now only the version related to SQL Server 2005 ==> many changes have appeared and I will have to modify some of my SMO programs I have written for SQL Server 2005 , 2008 , 2008 R2 and 2012. For 2014 , I give up for the moment and it is a version too near in time compared 2012 )

    If you have some time to waste ( no sure as time is money ) , I would suggest you to build your "50 GB" to extract around half of rows from your 123 GB database. Maybe you should obtain the same structure of the database ( logical ) but with similar data. So the comparison should be more valuable ( I am not thinking that your test is not clever , but I faced a similar problem , not of compression , when I had to compare speed and storage problems between 2 versions of OS with each a different version for the database engine a long time ago more exactly 25 and I was alone as system manager, DBA and developer : so if I was doing an error , I was the only guilty man... an useful school which obliges you to be strict in your tests..).

    I think that I will post when I will finish the whole book ( but 900 pages.. )

    Have a nice day

  • I'd be interested to know what your rationale was for going for the answer you did.

    Can you explain please? I just want to know whether my assumptions were different. Did I misss something obvious?

    Thanks

  • @ipounder,

    Well, my line of reasoning was as follows:

    1) Backup compression of an already compressed database yields very little data reduction, as has been pointed out.

    2) One of the databases was 50 GB AFTER data compression (whether row or page, we don't know), while the other was 50 GB with no compression.

    3) To use the economists' favorite phrase, ceteris paribus this means the database with no compression yet applied will be smaller after backup compression, since the already compressed database will receive very little further data reduction (point 1).

    As has been pointed out earlier in this thread, this still requires some assumptions about the type of data in the uncompressed database, since if the data is not amenable to compression, the backups might end up being of similar size.

    All else being equal, though, in the majority of cases the compressed backup of the as-yet uncompressed data will be smaller.

  • ipounder (1/30/2014)


    The question requires you to make some assumptions (the compressed database is wholly compressed, and the data is basically the same in both databases, for instance).

    The answer is based on a test carried out by the questioner, so is hardly scientific.

    It would also be reasonable to assume that data compression on the DB and in the backups are broadly equivalent in effectiveness and hence compression ratio achieved. Therefore it is reasonable to think that the two backups would be similar in size (note we do not have a definition of similar in the question).

    Made me think though so thanks.

    The actual correct answer is D) If the schema and contents are identical in the uncompressed and compressed databases except for compression, then the compressed backup of the uncompressed database will be no larger than the compressed backup of the compressed database.

    It depends very much on the type of data - if it's 100% nonrepeating GUIDs or nonrepeating JPG's in binary columns, then none of it compresses well, and the backups are about the same size. If the data is highly compressible, then yes, the OP's answer is correct.

    Note that database compression and backup compression are NOT broadly equivalent in effectiveness - "database compression" is done either at an 8kB page level, or at a row level, which is more limited than the streaming or larger-block compression available to a backup process.

  • Thanks for the explanation.

  • Thanks John Mitchell for the explanation

  • simple math isn't it :Wow:

    nice question , thanks for sharing

  • Thank you for the post, very interesting one. IgorMi.

    (

    my current state of knowledge on compression is still fresh [as-in - experiencing for the first time]. I just want to thank you each individual for all your replies which has given a great deal of insights to understand it better.

    @ThomasAbraham - Thanks for the link)

    )

    -//-edit; fixed some typos

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • Well I went for the answer that the OP was clearly wanting rather than the correct answer i.e.

    D) Any of the above.

    In most cases where the data in both databases is similarly compreesible then the backup of B will be smaller.

    However if database B is non compressible and the compression algorithm used for the database compression and the backup compression is the same then both backups will be similar in size.

    And a more contrived possibility - If database A is highly compressible but has been compressed using an inefficient algorithm while database B is non compressible then the backup of database A will be the smaller.

  • Transparent Data Encryption is another factor that can greatly reduce the efficacy of backup compression.

    - Jeff

  • It's just common sense, plus getting ones fingers to it the intended option. I failed on the second point, which make we feel very careless.

    Nadrek (1/30/2014)


    The actual correct answer is D) If the schema and contents are identical in the uncompressed and compressed databases except for compression, then the compressed backup of the uncompressed database will be no larger than the compressed backup of the compressed database.

    That's a very strange case. You are looking at the case where the schema and content are identical in the two databases, and the two databases are the same size although one is compressed and the other is not - in other words, these two databases are actually two coipes of one database which is not compressible by the algorithms used by SQL Server for data compresion. Obviously if the two databases are identical the two compressed backups will also be identical, but the case where the two databses are identical that you have used to suggest a new option D as the answer is just not something that ought to be considered - it's a reasonable expectation that a question about two databases isn't referring to two identical copies of the same database.

    Tom

Viewing 15 posts - 16 through 30 (of 37 total)

You must be logged in to reply to this topic. Login to reply