Writing the Correct Query is Important

  • Comments posted to this topic are about the item Writing the Correct Query is Important

  • Considering recent events, I think it terribly ironic that you published this particular article at this particular time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • When you wrote 'I don't want this to be political' my first thought was 'don't blame the business or management', but that's not what you meant... so I will blame the business and management!

    Some very basic user testing catches these issues out. Either do proper testing or employ a test team. In my experience, finding someone in the business who is a thorough and conscientious tester is a rarity.

    Might I add, it really doesn't help when the latest BI fad is 'self service'. Convincing the business that you don't need ETL or data cleansing anymore, just mash all that mess up in the reporting tool!

  • Way back, much earlier in my career, we had a series of SQL Server Agent jobs that polled the production DBs every 5 minutes to see whether each data source had sent some data and, if not, emailed the system and line-of-business owners.

    As the clustered index was always on the record creation date in the front-facing DBs, the impact on service was negligible and the benefits were tangible.

    Establishing basic rules for what your daily data should look like isn't hard (a minimal sketch of such a check appears at the end of this post). If you want to get more sophisticated, a more complex statistical approach can be taken.

    I can see two problems with this approach, neither of them technical:
    1.  The "No business logic in the DB" zealots
    2.  It isn't shiny shiny.

    The combination of the two is incredibly destructive. The zealots will propose a solution that is architecturally pure, involving at least three technologies. The development and operating costs will be such that a prioritisation session will always push it to the bottom of the list, effectively descoping it.

    There is a Facebook meme doing the rounds that says "Briefly describe your job badly". A delivery manager friend replied "I apologise for software being late and buggy"!
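
    A minimal sketch of that kind of freshness check, run as a SQL Server Agent job step every few minutes. The table name, Database Mail profile, and recipients are hypothetical stand-ins, not details from the original setup:

    -- Freshness check: has the (hypothetical) InboundOrders feed sent anything in the last 5 minutes?
    -- With the clustered index on CreatedDate, this range predicate keeps the cost negligible.
    DECLARE @CutOff datetime2 = DATEADD(MINUTE, -5, SYSDATETIME());

    IF NOT EXISTS (SELECT 1 FROM dbo.InboundOrders WHERE CreatedDate >= @CutOff)
    BEGIN
        EXEC msdb.dbo.sp_send_dbmail
             @profile_name = N'OpsMail',                      -- assumed Database Mail profile
             @recipients   = N'lob-owner@example.com',        -- assumed line-of-business owner
             @subject      = N'No data received from InboundOrders in the last 5 minutes',
             @body         = N'The InboundOrders feed has sent no rows since the last poll. Please investigate.';
    END;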

  • The issue, in my experience, rarely comes from the queries: "It's the data, stupid!", as uncle Bill would say.
    You might have created a report several years ago and checked it at the time. But then some smart software dev decides to "repurpose" the fields that your report captures, and slowly but surely your report starts spitting out rubbish. There is also the more classic case where people change the fields that are being populated, and so over time your report starts to be missing data. For example, imagine you were reporting from a field called "Illegal?" to calculate the number of illegal residents who need to be deported in a particular state. At a later stage someone might add a new field called "Illegal but allowed to stay in the country until legalised" and not inform the report writer. Now if you report from only the first column, you will give misleading figures for the number of people who need to be deported to their country of origin.
    As a BI dev, it is extremely rare to see SQL queries on reports that are logically wrong. On the other hand, reports tend to become obsolete very quickly because some *smart* managers decide to modify the metrics or the structure of a particular part of the organisation every other month. And so the reports tend to accumulate and form "strata" - a bit like in geology - until we review them and decommission them.

  • One problem I see from time to time is a business run on the back of hand-written ad hoc queries, which sometimes go on to "prove" the analysed and verified queries wrong, i.e. the business decides that its processes using incorrect data must be correct.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • In one of my last client engagements before I left consulting, a client had a daily "sales" report that drove much of the business. (The business was a credit card processing company.) Over time the transaction codes and supported cards had changed, but the queries underlying the report had never been updated, and the codes were hardcoded into the views the report used. I dove in and built a new "rules engine" that introduced data tables with the valid codes, so that when things changed they could just change the tables and the report would be correct (a rough sketch of the idea appears at the end of this post). I rebuilt the report and verified that it was correct.

    The business side decided not to use it because the numbers were "different". They chose to continue to use incorrect results.
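
    A rough sketch of that lookup-table approach, using hypothetical names (dbo.ValidSalesCode, dbo.CardTransaction) rather than the client's actual schema:

    -- Valid codes live in data, not in the view definition, so a code change is a row change.
    CREATE TABLE dbo.ValidSalesCode
    (
        TransactionCode varchar(10)  NOT NULL PRIMARY KEY,
        Description     varchar(100) NOT NULL,
        EffectiveFrom   date         NOT NULL,
        EffectiveTo     date         NULL      -- NULL means still in use
    );
    GO

    -- The report view joins to the rules table instead of hardcoding codes in an IN (...) list.
    CREATE VIEW dbo.DailySales
    AS
    SELECT t.TransactionDate, t.TransactionCode, t.Amount
    FROM dbo.CardTransaction AS t
    JOIN dbo.ValidSalesCode  AS c
      ON  c.TransactionCode = t.TransactionCode
      AND t.TransactionDate >= c.EffectiveFrom
      AND (c.EffectiveTo IS NULL OR t.TransactionDate <= c.EffectiveTo);
    GO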

  • So true. The age old craft of writing a query with the correct logic cannot be replaced with a "tool" based approach.

  • nick.mcdermaid - Sunday, January 15, 2017 10:29 PM

    When you wrote 'I don't want this to be political' my first thought was 'don't blame the business or management', but that's not what you meant... so I will blame the business and management!

    Some very basic user testing catches these issues out. Either do proper testing or employ a test team. In my experience, finding someone in the business who is a thorough and conscientious tester is a rarity.

    Might I add, it really doesn't help when the latest BI fad is 'self service'. Convincing the business that you don't need ETL or data cleansing anymore, just mash all that mess up in the reporting tool!

    Certainly this is an issue. Good testers are hard to find, but it's also important that once they start testing something, we capture their tests as much as possible in automation. Humans are poor at repeating tests.

    We also need a good set of test data that evolves. It's easy to use test data that includes happy paths and not the anomalies we might get in the real data set.

  • Gary Varga - Monday, January 16, 2017 5:56 AM

    One problem I see from time to time is a business run on the back of hand-written ad hoc queries, which sometimes go on to "prove" the analysed and verified queries wrong, i.e. the business decides that its processes using incorrect data must be correct.

    Very true. This is a problem in all sorts of businesses. Another reason I think good test data sets need to be assembled and maintained.

  • Keep your data clean, and fix your processes to collect needed data.

    How:
    Create data testing procedures and email the failures to people who can get the data corrected. For example, birth date checks in which the employee would be under 10 years old or over 100 years old (a minimal sketch of such a check appears at the end of this post). Any time an issue is found with the data, include a check for that particular type of data. If necessary, change the process to get correct data entered. For example, require term codes to be entered for terminated employees.

    Think of it in the same way as running Test Cases in software development.
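
    A minimal sketch of one such check, assuming a hypothetical dbo.Employee table with EmployeeID and BirthDate columns; the Database Mail profile and recipients are placeholders:

    -- Data quality check: flag employees whose birth date implies an age under 10 or over 100.
    DECLARE @Failures nvarchar(max);

    SELECT @Failures = STRING_AGG(CONVERT(nvarchar(max), CONCAT(EmployeeID, ': ', CONVERT(char(10), BirthDate, 120))), '; ')
    FROM dbo.Employee
    WHERE BirthDate > DATEADD(YEAR, -10, GETDATE())     -- would be under 10 years old
       OR BirthDate < DATEADD(YEAR, -100, GETDATE());   -- would be over 100 years old

    IF @Failures IS NOT NULL
        EXEC msdb.dbo.sp_send_dbmail
             @profile_name = N'DataQuality',                  -- assumed Database Mail profile
             @recipients   = N'hr-data-owners@example.com',   -- whoever can get the data corrected
             @subject      = N'Employee birth date check failed',
             @body         = @Failures;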

  • SQL suffers from the same predicament as HTML and JavaScript; it's an accessible high-level language that's widely used by a lot of people, but only a few of them actually know how to code it at a professional level. When I'm called in to help resolve a performance issue, it's typical for IT management and application developers to have the prejudicial assumption that the root cause has something to do with the configuration of the database server or the network itself. The developers bristle when I suggest that there is something wrong with how the SQL itself is coded. After all, they rate themselves an 8 out of 10 when it comes to proficiency in SQL. At least initially, they dismiss my advice because they think I'm wasting time by nitpicking their coding style, but when they see how much performance can be gained simply by modifying a join or removing a user-defined function from the WHERE clause, they realize there is so much more to be known.
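
    A small sketch of the kind of change meant here, against a hypothetical dbo.Orders table: wrapping the indexed column in a function (or any expression) forces a scan, while a plain range predicate on the bare column can use an index seek.

    -- Before: the scalar UDF on the column defeats an index on OrderDate and forces a full scan.
    SELECT OrderID, OrderDate, Amount
    FROM dbo.Orders
    WHERE dbo.fn_CalendarYear(OrderDate) = 2017;      -- hypothetical scalar function

    -- After: the same filter as a sargable range on the bare column can seek the index.
    SELECT OrderID, OrderDate, Amount
    FROM dbo.Orders
    WHERE OrderDate >= '20170101'
      AND OrderDate <  '20180101';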

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Sometimes a developer does not fully understand the data and uses the wrong keys for joins.

    Sometimes a non-developer tells a developer what to do, but the instructions contain assumptions that are not valid in a broader context, or lack necessary keys. The developer does not understand and blindly follows the non-developer. For example, using a hard-coded value that "never changes", or an assumed key that is really only a partial key, like WC Code rather than WC Code + State.

    Sometimes the pressure is on to get things done quickly, so quality suffers (and the developer knows it) and it bites them later.

    Sometimes a developer makes a mistake through not really being a good developer. For example, how many days are there from May 10th to May 15th, inclusive? There are 6 days, not the 5 that 15 - 10 would suggest. Another example is not putting parentheses around an OR condition.
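
    Two quick illustrations of those last two mistakes, as hedged sketches rather than anyone's actual code:

    -- Inclusive day counts: May 10th through May 15th is 6 days, not 15 - 10 = 5.
    SELECT DATEDIFF(DAY, '20170510', '20170515') + 1 AS DaysInclusive;   -- returns 6

    -- Missing parentheses around an OR: AND binds tighter than OR, so without them
    -- the Status filter applies only to the first branch.
    -- Wrong:  WHERE Status = 'Open' AND Region = 'East' OR Region = 'West'
    -- Right:  WHERE Status = 'Open' AND (Region = 'East' OR Region = 'West')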

  • Eric M Russell - Monday, January 16, 2017 8:34 AM

    SQL suffers from the same predicament as HTML and JavaScript; it's an accessible high-level language that's widely used by a lot of people, but only a few of them actually know how to code it at a professional level. When I'm called in to help resolve a performance issue, it's typical for IT management and application developers to have the prejudicial assumption that the root cause has something to do with the configuration of the database server or the network itself. The developers bristle when I suggest that there is something wrong with how the SQL itself is coded. After all, they rate themselves an 8 out of 10 when it comes to proficiency in SQL. At least initially, they dismiss my advice because they think I'm wasting time by nitpicking their coding style, but when they see how much performance can be gained simply by modifying a join or removing a user-defined function from the WHERE clause, they realize there is so much more to be known.

    And don't get me started on software developers who put GUIDs everywhere (PKs, FKs) and set all the other columns as nvarchar(max)! Good luck creating indexes with that!

  • Kyrilluk - Monday, January 16, 2017 9:03 AM

    Eric M Russell - Monday, January 16, 2017 8:34 AM

    SQL suffers from the same predicament as HTML and JavaScript; it's an accessible high-level language that's widely used by a lot of people, but only a few of them actually know how to code it at a professional level. When I'm called in to help resolve a performance issue, it's typical for IT management and application developers to have the prejudicial assumption that the root cause has something to do with the configuration of the database server or the network itself. The developers bristle when I suggest that there is something wrong with how the SQL itself is coded. After all, they rate themselves an 8 out of 10 when it comes to proficiency in SQL. At least initially, they dismiss my advice because they think I'm wasting time by nitpicking their coding style, but when they see how much performance can be gained simply by modifying a join or removing a user-defined function from the WHERE clause, they realize there is so much more to be known.

    And don't get me started on software developers who put GUIDs everywhere (PKs, FKs) and set all the other columns as nvarchar(max)! Good luck creating indexes with that!

    Yeah, GUID keys and entities stored as XML or JSON documents. If that's what they want to do, and if storing and retrieving data in that fashion matches the business requirements, then they should be using MongoDB and leaving the SQL Server database to do what it does best.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
