Datatype performance - in theory there should be a difference, but in practice is there?

  • pcd_au (8/2/2016)


    Frankly, if the query is going to take 3 minutes to run and 4 nanoseconds to compile, who cares about the compilation.

    In that case, indeed, who cares (though it's more like 3 ms to compile, not nanoseconds). Now what about the query that runs 20 times a second? Do you want a 3 ms overhead on every single execution?
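    If you want to see the two costs separately, SET STATISTICS TIME reports parse/compile time and execution time independently. A minimal sketch (DBCC FREEPROCCACHE is included only to force a fresh compile; keep it away from production):

        -- Show compile cost vs. run cost for a query.
        SET STATISTICS TIME ON;

        DBCC FREEPROCCACHE;  -- clears the plan cache to force a compile; TEST SYSTEMS ONLY

        SELECT COUNT(*) FROM sys.objects;  -- first run: non-zero "parse and compile time"
        SELECT COUNT(*) FROM sys.objects;  -- second run: plan is cached, compile time ~0 ms

        SET STATISTICS TIME OFF;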

    You say that a 3-4x reduction in size isn't important, but it becomes important when you get to lots of rows. Let's hypothesise a reduction in row size of 20 bytes, due to data types, on a 100-million-row table (a small-to-medium table these days). That's 2 GB, potentially per index, which could be the difference between the working set fitting in memory and the system having to churn off disk during normal operation, and there's a large speed difference between memory and disks, even if they are SSDs.

    Size overhead, compilation overhead: neither matters at small scale; both become critical when you get to high volumes and very large sizes.
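    To make the arithmetic concrete, here is a sketch with hypothetical tables and illustrative column choices; the per-row saving works out to 19 bytes, or roughly 1.9 GB over 100 million rows, before indexes:

        -- Same logical data, two choices of types (names are illustrative).
        -- Wide:   BIGINT + DATETIME + NCHAR(10) = 8 + 8 + 20 = 36 bytes per row
        -- Narrow: INT    + DATE     + CHAR(10)  = 4 + 3 + 10 = 17 bytes per row
        -- Saving: 19 bytes/row; over 100,000,000 rows that's ~1.9 GB, before indexes.

        CREATE TABLE dbo.OrdersWide
        (
            OrderID    BIGINT    NOT NULL,  -- 8 bytes, yet the key never exceeds 2^31
            OrderDate  DATETIME  NOT NULL,  -- 8 bytes, yet the time portion is never used
            StatusCode NCHAR(10) NOT NULL   -- 20 bytes, yet the values are plain ASCII
        );

        CREATE TABLE dbo.OrdersNarrow
        (
            OrderID    INT      NOT NULL,   -- 4 bytes
            OrderDate  DATE     NOT NULL,   -- 3 bytes
            StatusCode CHAR(10) NOT NULL    -- 10 bytes
        );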

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • pcd_au (8/2/2016)


    Eirikur Eiriksson (8/2/2016)

    You have to understand how SQL Server works: which workloads are expensive and which factors carry more weight than others. Data sizes and IO have a very sizeable impact on performance. Narrower data sets, narrower indices, etc. improve performance out of proportion to the reduction in raw data size, especially when working on large analytical sets like DWs. I guess your lack of interest in the data size subject portrays an ignorance of the overall performance subject.

    😎

    Interesting observation about my 'lack of interest in the data size subject', given you know nothing whatsoever about my background. I'll leave it at 'I started working with SQL Server in 1989 and have used it just about every working day since.' It's precisely my lack of ignorance about the subject that has prompted this question. Applications I develop tend not to have performance issues, because I do understand the internal mechanisms of SQL Server, data modelling, indexing and performance tuning. It's the client that is making these requests, and I'm looking for evidence that these things are important. Hence I'm reaching out to people such as yourself, who have experience with sub-optimal systems, for metrics I can take back to the client on why we should NOT restrict ourselves to a small subset of data types.

    Your input is much appreciated.

    pcd

    Not an observation, but a direct response to your previous reply. ;-)

    I'm not interested in '3-4 times reduction in size', but I am interested in 'large performance improvements by using the correct data types' - do you have any more info on this bit?

    I have done this kind of optimization work many times; recently, reducing a large data set to roughly 1/4 of its initial size yielded a 700-800x improvement in average response time.
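    For a concrete before/after measurement of that kind of reduction, sys.dm_db_partition_stats is one convenient way to read the page counts (a sketch; the table names are hypothetical):

        -- Compare the on-disk footprint of two tables (pages are 8 KB each).
        SELECT
            OBJECT_NAME(ps.object_id)            AS table_name,
            SUM(ps.used_page_count) * 8 / 1024.0 AS used_mb
        FROM sys.dm_db_partition_stats AS ps
        WHERE ps.object_id IN (OBJECT_ID(N'dbo.OrdersWide'),
                               OBJECT_ID(N'dbo.OrdersNarrow'))
        GROUP BY ps.object_id;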

    😎

  • Eirikur Eiriksson (8/2/2016)


    [...] I have done this kind of optimization work many times; recently, reducing a large data set to roughly 1/4 of its initial size yielded a 700-800x improvement in average response time.

    Ditto that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I can't speak for anyone else, but here are my thoughts and a couple of personal observations on the subject.

    On the original subject of "does it really matter", the answer is a very profound "YES!". When I first reported to my current job, there were multiple 10-minute timeouts each day due to "system paralysis", screen response times were greater than 5 seconds (sometimes much greater), and about half of the problems were solved just by paying attention to data types and making them match.

    I agree that, by itself, fixing a 100 ms proc or query so that it runs in less than 10 ms doesn't sound like much, until you realize that tens of thousands of them exist in the query cache and they're all used every day, sometimes millions of times each, in just 8 hours. Then realize that the key to solving many of the blocking issues is to "just" make things run faster.
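    If you want to see how many of those cached plans are paying a type-mismatch tax, a crude but serviceable sweep is to cast each showplan to text and look for CONVERT_IMPLICIT. This is a sketch, and it can be expensive against a large plan cache, so run it off-hours:

        -- Rough sweep: cached plans that contain an implicit conversion.
        SELECT TOP (50)
            st.text      AS query_text,
            cp.usecounts AS times_used,
            qp.query_plan
        FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
        WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%CONVERT_IMPLICIT%'
        ORDER BY cp.usecounts DESC;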

    Yes, today's hardware is a bloody miracle. "Just" by changing to a new server with some fancy SSDs, faster hard drives, and twice the number of CPUs, we were able to double the performance of most code. Heh... as my grandkids might say, "Big wow, what a letdown." That still pales in comparison to what we were able to do by paying attention to data type matching and making code SARGable, which is frequently dependent on data type matching. In many cases, the 700-800x improvement that Eirikur mentioned, which is very significant, was small change compared to some of the improvements we realized: thousands of times faster in many cases, with a couple of rare birds that saw nearly a million times improvement.
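    The classic shape of the SARGability problem, as a hypothetical sketch (the schema is illustrative; the point is the parameter type): compare an NVARCHAR parameter against a VARCHAR column and SQL Server wraps the column in CONVERT_IMPLICIT, which under a SQL collation turns an index seek into a full scan (Windows collations fare somewhat better, but still pay for the conversion):

        -- Assume dbo.Customers(AccountNumber VARCHAR(20)) with an index on AccountNumber.

        DECLARE @acctBad  NVARCHAR(20) = N'AC-1001';  -- type does NOT match the column
        DECLARE @acctGood VARCHAR(20)  = 'AC-1001';   -- type matches the column

        -- Non-SARGable: the VARCHAR column is implicitly converted to NVARCHAR,
        -- so the predicate can no longer seek the index.
        SELECT * FROM dbo.Customers WHERE AccountNumber = @acctBad;

        -- SARGable: matching types, plain index seek.
        SELECT * FROM dbo.Customers WHERE AccountNumber = @acctGood;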

    And no... it wasn't just data type matching that we did. That helped a lot but it's not a panacea. It's only the first necessary step in a much larger picture but, if you leave it out, the much larger picture doesn't have much of a chance to be resolved.

    Ignore read reduction if you want. I won't. We've reduced reads in the normal 8-hour business day to the tune of 150 terabytes of IO, just by paying attention to mostly small stuff like data type matching and SARGability. Blocking is now virtually non-existent in the system, "system paralysis" outages are unheard of, and most screen times are sub-second instead of over 5 seconds. The only ones that aren't are for reporting purposes, and we've drastically reduced those times as well.
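    Measuring that kind of read reduction is straightforward; a sketch of the two usual tools, per-query and cache-wide:

        -- Per query: logical and physical reads for each statement in the batch.
        SET STATISTICS IO ON;
        SELECT COUNT(*) FROM sys.objects;   -- substitute the query under test
        SET STATISTICS IO OFF;

        -- Cache-wide: the biggest readers among currently cached plans.
        SELECT TOP (20)
            qs.total_logical_reads,
            qs.execution_count,
            st.text
        FROM sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
        ORDER BY qs.total_logical_reads DESC;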

    The really bad part of it all is that it took the previous regime of "expert 'developers'" less than a year to mess all of that up with incorrect and mismatched data types. It has taken us years to fix, because they also screwed the front end down onto the same kinds of problems: make a change, fix the several things it affected, discover that some of those fixes affected other things that needed to be changed, and so on, over and over and over...

    Heh... and then people have the nerve to say it's the fault of the database.

    When considering such "small" things as data type matching and "right-sizing", you might think you're working on a grain of sand. The problem is that if all the grains of sand have the same problem, then you have a pretty well messed-up beach to fix, one grain of sand at a time. To borrow and warp a phrase, you really should sweat the small stuff and... it's all small stuff. :w00t: You can't afford to get the small stuff wrong.

    Rant complete... disregard all further alarms.

    --Jeff Moden



  • Jeff

    I guess we have all had our wins of note. I think my best was 4h+ down to under 13 minutes.

    I also don't deny the importance of squeezing out all available performance with transactional systems and high-frequency stored procedures. However, my original question did (try to) restrict answers to data-warehouse-style applications.

    Fortunately that role came to an end, and I'm now working somewhere that has a more balanced view on how to do things. The old place (finally) admitted that their approach was wrong and is now trying a different tack. Data types were not the only bad design decision they made; there were a number of them.

    pcd

  • pcd_au (9/26/2016)


    However my original question did (try to) restrict answers to data warehousing style applications.

    Understood that from the beginning. A data warehouse application isn't much different from anything else when it comes to performance. One of the purposes of a data warehouse is performance. Just because some of the tables are denormalized or pre-aggregated for performance doesn't change the basic rules of performance. Using the correct data types, and matching them between tables, is still one of the important precursors to performance.
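    A DW-flavoured sketch of that last point, with hypothetical star-schema names: store the same key with different types on the fact and dimension sides and every joined row pays a run-time CONVERT_IMPLICIT (and cardinality estimates suffer); match the types and the join is a straight comparison:

        -- Mismatched key types (illustrative): the fact table stores the key as
        -- VARCHAR while the dimension uses INT.
        CREATE TABLE dbo.FactSales   (CustomerKey VARCHAR(10) NOT NULL /* , measures... */);
        CREATE TABLE dbo.DimCustomer (CustomerKey INT NOT NULL PRIMARY KEY /* , attributes... */);

        -- INT has higher type precedence, so f.CustomerKey gets a CONVERT_IMPLICIT
        -- on every row flowing through the join.
        SELECT COUNT(*)
        FROM dbo.FactSales   AS f
        JOIN dbo.DimCustomer AS d
            ON f.CustomerKey = d.CustomerKey;

        -- Fix: declare CustomerKey as INT on both sides.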

    --Jeff Moden


