Balancing Performance and Readability

  • Comments posted to this topic are about the item Balancing Performance and Readability

  • Thanks for the article, and for getting us thinking about code performance and readability.

    I was struck by something you said:

    By doing everything in a single statement, we eliminate the two performance complaints mentioned above.

    Often I encounter the opposite:  It seems that complex queries that I write often see a performance benefit  from being broken down into smaller pieces.  Readability sometimes improves as well, but other times suffers as a result.

    For example, today I was working on a query that joined a number of tables, and then joined 7 derived table.  It was readable, and seemed like it should have performed fairly well.  But performance turned out to be unacceptable.

    I rewrote this query to create a temporary table, insert from the joined tables, and then perform 7 separate updates (doing the work in the derived tables in separate queries instead of in one big joined query).  Performance increased dramatically:  say from 30-40 seconds down to <5 seconds.  This pattern of materializing resultsets early is one that I begrudgingly am forced to do more often than I would like.

    "Premature optimization is the root of all evil"...and is particularly hard on readability.  Yet sometimes optimization is needed, and sacrificing elegance and readability is a necessary cost.

    On the other hand, some might find a bunch of separate simple updates to a temp table to be more readable than a big query with lots of derived tables.  (To. me. it. seems. too. choppy. to. be. optimally. readable.)

    I guess my points are that 1) a single query is not necessarily better than multiple queries, and 2) to a certain extent, readability is in the eye of the beholder.

  • Great article Mr Pollock.

    Well written SQL code is often an over looked art and solving the problem is the primary in the developers mind, as you'd expect. But I feel many developers sign-off once the primary objective has been solved. This is often due to time pressure being applied by stakeholders. 
    Taking another 30mins to refactor, componentize and comment code will pay dividends for the future you if you have to perform an urgent bug fix on the code. Or help a colleague who now has to extend, optimize or fix your code. 

    Previously I wrote an article which approached the same subject and asked developers to think a little deeper regarding code design. Please take a look, I'd be interested to hear any feedback you may have.
    http://www.sqlservercentral.com/articles/Query+Tuning/159654/

  • Thanks for your response, David.  You are absolutely right that a single query is not *always* better performance-wise than other queries, because the execution plan can surprise you.  Without knowing your specifics, it's hard to say why you got the findings you did.  I appreciate the feedback, and definitely agree that the database engine has a mind of its own sometimes.

  • Absolutely, Mr. Strange.  The obstacle of "complete" code.  My best approach is to have a talk about code standards within your organization to develop guidelines before-hand.  My approaches to any code review is to not say things like "your code should do..." or "your code needs...", because we're all people with pride and scope creep is a very real problem.  Developers, and engineers in general are not known for their "feeling" skills--they are given a problem and work toward solving it; seeing additional requirements for style can be seen as you piling on code--even if you're right to do so.

    My practice is to adopt more of a RAD approach to coding, where the developer has a specific task with a clear definition of done.  This definition of done should consider development standards and practices, and developers know what's expected before they begin work.  Then any additional design concerns are a *new* problem instead of tagging on to what the developer originally was asked to do so they're more open to feedback.  Finally, I would make note of developers who are regularly not doing these improvements and recommending them for additional training.

  • Great article. Instead of readability, I recommend maintainability. A system that works but can't be changed is a disaster. When I design a system, I try to focus on maintainability, not performance (of course, try very hard not to do anything rbar-stupid). As the system starts to run, identify performance bottlenecks and fix those. This is usually a pretty good strategy for architecting a functional, maintainable system.

  • Interesting article which brought up a great point about things to consider when writing code.

    But you need to be fair to the "combined" code version.  So let me give an alternative version.  It can be just as easy to write (and to me, easier to maintain and understand, since all data is together) as the step-by-step (i.e. less set-based) version, and it will perform better. 

    I've used CROSS APPLYs, but INNER JOINs could be used instead to get aggregated values.  To choose, naturally you'd review the query plan to see if either one was significantly better than the other.

    Edit: Removed the "INSERT INTO dbo.EmployeeReview" as I don't need a work table for this query.


      SELECT E.ID AS EmployeeID
        ,E.[First] AS FirstName
        ,E.[Last] AS LastName
        ,E.StartDate AS HireDate
        ,E.Pay AS Salary
        ,EPH2.MaxSalary
        ,CASE
         WHEN alias1.YearsWorked >= 10 AND EPH2.MaxSalary > 100000
         THEN 'Executive'
         ELSE 'Basic' END AS Retirement
        ,EPC2.PaidToDate
      FROM dbo.Employees E
      CROSS APPLY (  
          SELECT MAX(EPH.Pay) AS MaxSalary
          FROM dbo.EmployeePayHistory EPH
          WHERE EPH.ID=E.ID
      ) AS EPH2
      CROSS APPLY (
          SELECT SUM(EPC.Amount) AS PaidToDate
          FROM dbo.EmployeePayChecks EPC
          WHERE EPC.ID=E.ID
    ) AS EPC2
      CROSS APPLY (
          /* assign alias names to calc'd values to simplify/"de-clutter" the main SELECT code */
          SELECT DATEDIFF(YEAR, E.HireDate, GETDATE()) AS YearsWorked
      ) AS alias1
      WHERE E.Active=1
          /*AND EPH2.MaxSal=1 ??I have no idea what this does or why it was deemed needed in the code.??*/

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Tuesday, March 20, 2018 8:12 AM

    Interesting article which brought up a great point about things to consider when writing code.

    But you need to be fair to the "combined" code version.  So let me give an alternative version.  It can be just as easy to write (and to me, easier to maintain and understand, since all data is together) as the step-by-step (i.e. less set-based) version, and it will perform better. 

    I've used CROSS APPLYs, but INNER JOINs could be used instead to get aggregated values.  To choose, naturally you'd review the query plan to see if either one was significantly better than the other.


    INSERT INTO dbo.EmployeeReview( EmployeeID, FirstName, LastName, HireDate, Salary, MaxSalary, Retirement )
    SELECT E.ID AS EmployeeID
        ,E.[First] AS FirstName
        ,E.[Last] AS LastName
        ,E.StartDate AS HireDate
        ,E.Pay AS Salary
        ,EPH2.MaxSalary
        ,CASE
         WHEN alias1.YearsWorked >= 10 AND EPH2.MaxSalary > 100000
         THEN 'Executive'
         ELSE 'Basic' END AS Retirement
        ,EPC2.PaidToDate
      FROM dbo.Employees E
      CROSS APPLY (  
          SELECT MAX(EPH.Pay) AS MaxSalary
          FROM dbo.EmployeePayHistory EPH
          WHERE EPH.ID=E.ID
      ) AS EPH2
      CROSS APPLY (
          SELECT SUM(EPC.Amount) AS PaidToDate
          FROM dbo.EmployeePayChecks EPC
          WHERE EPC.ID=E.ID
    ) AS EPC2
      CROSS APPLY (
          /* assign alias names to calc'd values to simplify/"de-clutter" the main SELECT code */
          SELECT DATEDIFF(YEAR, E.HireDate, GETDATE()) AS YearsWorked
      ) AS alias1
      WHERE E.Active=1
          /*AND EPH2.MaxSal=1 ??I have no idea what this does or why it's deemed needed in the code.??*/

    Just to throw this out there, set-based does not always mean done in a single query.  Sometimes divide and conquer is the way to go.  Each should be considered and the best solution selected.

  • Lynn Pettis - Tuesday, March 20, 2018 8:36 AM

    ScottPletcher - Tuesday, March 20, 2018 8:12 AM

    Interesting article which brought up a great point about things to consider when writing code.

    But you need to be fair to the "combined" code version.  So let me give an alternative version.  It can be just as easy to write (and to me, easier to maintain and understand, since all data is together) as the step-by-step (i.e. less set-based) version, and it will perform better. 

    I've used CROSS APPLYs, but INNER JOINs could be used instead to get aggregated values.  To choose, naturally you'd review the query plan to see if either one was significantly better than the other.


    INSERT INTO dbo.EmployeeReview( EmployeeID, FirstName, LastName, HireDate, Salary, MaxSalary, Retirement )
    SELECT E.ID AS EmployeeID
        ,E.[First] AS FirstName
        ,E.[Last] AS LastName
        ,E.StartDate AS HireDate
        ,E.Pay AS Salary
        ,EPH2.MaxSalary
        ,CASE
         WHEN alias1.YearsWorked >= 10 AND EPH2.MaxSalary > 100000
         THEN 'Executive'
         ELSE 'Basic' END AS Retirement
        ,EPC2.PaidToDate
      FROM dbo.Employees E
      CROSS APPLY (  
          SELECT MAX(EPH.Pay) AS MaxSalary
          FROM dbo.EmployeePayHistory EPH
          WHERE EPH.ID=E.ID
      ) AS EPH2
      CROSS APPLY (
          SELECT SUM(EPC.Amount) AS PaidToDate
          FROM dbo.EmployeePayChecks EPC
          WHERE EPC.ID=E.ID
    ) AS EPC2
      CROSS APPLY (
          /* assign alias names to calc'd values to simplify/"de-clutter" the main SELECT code */
          SELECT DATEDIFF(YEAR, E.HireDate, GETDATE()) AS YearsWorked
      ) AS alias1
      WHERE E.Active=1
          /*AND EPH2.MaxSal=1 ??I have no idea what this does or why it's deemed needed in the code.??*/

    Just to throw this out there, set-based does not always mean done in a single query.  Sometimes divide and conquer is the way to go.  Each should be considered and the best solution selected.

    Unquestionably in relatively rare cases, dividing steps out makes more sense for performance.  So that can definitely be the proper approach.  But isn't that still, by definition, less "set-based" (given that the single-query code is properly written)?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I think it is interesting to note how quickly the discussion between two DBAs went off-topic. DBAs by personality are detail oriented, which is a great thing for maintaining databases. However, this strength can become a hindrance when the goal is readability. When readability is the goal, I think that most DBAs could find some benefit in making the effort to stay on that task.

  • GeorgeCopeland - Tuesday, March 20, 2018 9:52 AM

    I think it is interesting to note how quickly the discussion between two DBAs went off-topic. DBAs by personality are detail oriented, which is a great thing for maintaining databases. However, this strength can become a hindrance when the goal is readability. When readability is the goal, I think that most DBAs could find some benefit in making the effort to stay on that task.

    This on a site where we routinely go off on tangents.

  • Thanks for all your feedback.  I intend to use all these comments to better not only my understanding on this topic, but also further articles.  Most of my work revolves around data warehousing, where I feel that making code more readable improves maintainability, and it's good to hear all your struggles with this as well.

    I also like the back-and-forth between the DBAs.  It's only a matter of time before a developer gets a nasty email from the dba for their code, so I consider it a victory if I can convince you two that a balance can be struck.🙂

  • GeorgeCopeland - Tuesday, March 20, 2018 9:52 AM

    I think it is interesting to note how quickly the discussion between two DBAs went off-topic. DBAs by personality are detail oriented, which is a great thing for maintaining databases. However, this strength can become a hindrance when the goal is readability. When readability is the goal, I think that most DBAs could find some benefit in making the effort to stay on that task.

    I honestly find the combined code in this case much more readable and understandable.

    As to performance, I'm currently in the process of rewriting step-by-step code that takes hours and reduced some of it to 1.5 mins.  A big potential issue is when adding or lengthening string columns in subsequent UPDATEs, row splits are required.  That really hurts performance.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply