Writing the Correct Query is Important

  • Comments posted to this topic are about the item Writing the Correct Query is Important

  • Considering recent events, I think it terribly ironic that you published this particular article at this particular time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • When you wrote 'I don't want this to be political' my first thought was 'don't blame the business or management', but that's not what you meant... so I will blame the business and management!

    Some very basic user testing catches these issues out. Either do proper testing or employ a test team. In my experience, finding someone in the business who is a thorough and conscientious tester is a rarity.

    Might I add, it really doesn't help when the latest BI fad is 'self service'. Convincing the business that you don't need ETL or data cleansing anymore, just mash all that mess up in the reporting tool!

  • Way back, much earlier in my career, we had a series of SQL Server Agent jobs that polled the production DBs every 5 minutes to see whether each data source had sent some data and, if not, emailed the system and line-of-business owners.

    As the clustered index was always on the record creation date in the front-facing DBs, the impact on service was negligible and the benefits were tangible.

    Establishing basic rules for what your daily data should look like isn't hard (a minimal sketch of such a check appears at the end of this post). If you want to get more sophisticated, a more complex statistical approach can be taken.

    I can see two problems with this approach, neither of them technical:
    1.  The "No business logic in the DB" zealots
    2.  It isn't shiny shiny.

    The combination of the two is incredibly destructive. The zealots will propose a solution that is architecturally pure, involving at least three technologies. The development and operating costs will be such that a prioritisation session will always push it to the bottom of the list, effectively descoping it.

    There is a Facebook meme doing the rounds that says "Briefly describe your job badly". A delivery manager friend replied "I apologise for software being late and buggy"!
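
    A minimal sketch of that kind of freshness check, run as a SQL Server Agent job step every few minutes. The table name, Database Mail profile, and recipients are hypothetical stand-ins, not details from the original setup:

    -- Freshness check: has the (hypothetical) InboundOrders feed sent anything in the last 5 minutes?
    -- With the clustered index on CreatedDate, this range predicate keeps the cost negligible.
    DECLARE @CutOff datetime2 = DATEADD(MINUTE, -5, SYSDATETIME());

    IF NOT EXISTS (SELECT 1 FROM dbo.InboundOrders WHERE CreatedDate >= @CutOff)
    BEGIN
        EXEC msdb.dbo.sp_send_dbmail
             @profile_name = N'OpsMail',                      -- assumed Database Mail profile
             @recipients   = N'lob-owner@example.com',        -- assumed line-of-business owner
             @subject      = N'No data received from InboundOrders in the last 5 minutes',
             @body         = N'The InboundOrders feed has sent no rows since the last poll. Please investigate.';
    END;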

  • The issue, in my experience, rarely comes from the queries: "It's the data, stupid!", as uncle Bill would say.
    You might have created a report several years ago and checked it at the time. But then some smart software dev decides to "repurpose" the fields that your report captures, and slowly but surely your report starts spitting out rubbish. There is also the more classic case where people change the fields that are being populated, and so over time your report starts to be missing data. For example, imagine you were reporting from a field called "Illegal?" to calculate the number of illegal residents who need to be deported in a particular state. At a later stage someone might add a new field called "Illegal but allowed to stay in the country until legalised" and not inform the report writer. Now if you report from only the first column, you will give misleading figures for the number of people who need to be deported to their country of origin.
    As a BI dev, it is extremely rare to see SQL queries on reports that are logically wrong. On the other hand, reports tend to become obsolete very quickly because some *smart* managers decide to modify the metrics or the structure of a particular part of the organisation every other month. And so the reports tend to accumulate and form "strata" - a bit like in geology - until we review them and decommission them.

  • One problem I see from time to time is a business run on the back of hand-written ad hoc queries, which sometimes go on to "prove" the analysed and verified queries wrong, i.e. the business decides that its processes using incorrect data must be correct.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • In one of my last client engagements before I left consulting, a client had a daily "sales" report that drove much of the business. (The business was a credit card processing company.) Over time the transaction codes and supported cards had changed, but the queries underlying the report had never been updated, and the codes were hardcoded into the views the report used. I dove in and built a new "rules engine" that introduced data tables with the valid codes, so that when things changed they could just change the tables and the report would be correct (a rough sketch of the idea appears at the end of this post). I rebuilt the report and verified that it was correct.

    The business side decided not to use it because the numbers were "different". They chose to continue to use incorrect results.
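
    A rough sketch of that lookup-table approach, using hypothetical names (dbo.ValidSalesCode, dbo.CardTransaction) rather than the client's actual schema:

    -- Valid codes live in data, not in the view definition, so a code change is a row change.
    CREATE TABLE dbo.ValidSalesCode
    (
        TransactionCode varchar(10)  NOT NULL PRIMARY KEY,
        Description     varchar(100) NOT NULL,
        EffectiveFrom   date         NOT NULL,
        EffectiveTo     date         NULL      -- NULL means still in use
    );
    GO

    -- The report view joins to the rules table instead of hardcoding codes in an IN (...) list.
    CREATE VIEW dbo.DailySales
    AS
    SELECT t.TransactionDate, t.TransactionCode, t.Amount
    FROM dbo.CardTransaction AS t
    JOIN dbo.ValidSalesCode  AS c
      ON  c.TransactionCode = t.TransactionCode
      AND t.TransactionDate >= c.EffectiveFrom
      AND (c.EffectiveTo IS NULL OR t.TransactionDate <= c.EffectiveTo);
    GO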

  • So true. The age old craft of writing a query with the correct logic cannot be replaced with a "tool" based approach.

  • nick.mcdermaid - Sunday, January 15, 2017 10:29 PM

    When you wrote 'I don't want this to be political' my first thought was 'don't blame the business or management', but that's not what you meant... so I will blame the business and management!

    Some very basic user testing catches these issues out. Either do proper testing or employ a test team. In my experience, finding someone in the business who is a thorough and conscientious tester is a rarity.

    Might I add, it really doesn't help when the latest BI fad is 'self service'. Convincing the business that you don't need ETL or data cleansing anymore, just mash all that mess up in the reporting tool!

    Certainly this is an issue. Good testers are hard to find, but it's also important that once they start testing something, we capture their tests as much as possible in automation. Humans are poor at repeating tests.

    We also need a good set of test data that evolves. It's easy to use test data that includes happy paths and not the anomalies we might get in the real data set.

  • Gary Varga - Monday, January 16, 2017 5:56 AM

    One problem I see from time to time is a business run on the back of hand-written ad hoc queries, which sometimes go on to "prove" the analysed and verified queries wrong, i.e. the business decides that its processes using incorrect data must be correct.

    Very true. This is a problem in all sorts of businesses. Another reason I think good test data sets need to be assembled and maintained.

  • Keep your data clean, and fix your processes to collect needed data.

    How:
    Create data testing procedures and email the failures to people who can get the data corrected. For example, birth date checks in which the employee would be under 10 years old or over 100 years old (a minimal sketch of such a check appears at the end of this post). Any time an issue is found with the data, include a check for that particular type of data. If necessary, change the process to get correct data entered. For example, require term codes to be entered for terminated employees.

    Think of it in the same way as running Test Cases in software development.
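
    A minimal sketch of one such check, assuming a hypothetical dbo.Employee table with EmployeeID and BirthDate columns; the Database Mail profile and recipients are placeholders:

    -- Data quality check: flag employees whose birth date implies an age under 10 or over 100.
    DECLARE @Failures nvarchar(max);

    SELECT @Failures = STRING_AGG(CONVERT(nvarchar(max), CONCAT(EmployeeID, ': ', CONVERT(char(10), BirthDate, 120))), '; ')
    FROM dbo.Employee
    WHERE BirthDate > DATEADD(YEAR, -10, GETDATE())     -- would be under 10 years old
       OR BirthDate < DATEADD(YEAR, -100, GETDATE());   -- would be over 100 years old

    IF @Failures IS NOT NULL
        EXEC msdb.dbo.sp_send_dbmail
             @profile_name = N'DataQuality',                  -- assumed Database Mail profile
             @recipients   = N'hr-data-owners@example.com',   -- whoever can get the data corrected
             @subject      = N'Employee birth date check failed',
             @body         = @Failures;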

  • SQL suffers from the same predicament as HTML and JavaScript; it's an accessible high-level language that's widely used by a lot of people, but only a few of them actually know how to code it at a professional level. When I'm called in to help resolve a performance issue, it's typical for IT management and application developers to have the prejudicial assumption that the root cause has something to do with the configuration of the database server or the network itself. The developers bristle when I suggest that there is something wrong with how the SQL itself is coded. After all, they rate themselves an 8 out of 10 when it comes to proficiency in SQL. At least initially, they dismiss my advice because they think I'm wasting time by nitpicking their coding style, but when they see how much performance can be gained simply by modifying a join or removing a user-defined function from the WHERE clause, they realize there is so much more to be known.
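
    A small sketch of the kind of change meant here, against a hypothetical dbo.Orders table: wrapping the indexed column in a function (or any expression) forces a scan, while a plain range predicate on the bare column can use an index seek.

    -- Before: the scalar UDF on the column defeats an index on OrderDate and forces a full scan.
    SELECT OrderID, OrderDate, Amount
    FROM dbo.Orders
    WHERE dbo.fn_CalendarYear(OrderDate) = 2017;      -- hypothetical scalar function

    -- After: the same filter as a sargable range on the bare column can seek the index.
    SELECT OrderID, OrderDate, Amount
    FROM dbo.Orders
    WHERE OrderDate >= '20170101'
      AND OrderDate <  '20180101';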

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Sometimes a developer does not fully understand the data and uses the wrong keys for joins.

    Sometimes a non-developer tells a developer what to do, but the instructions contain assumptions that are not valid in a broader context, or lack necessary keys. The developer does not understand and blindly follows the non-developer. For example, using a hard-coded value that "never changes", or an assumed key that is really only a partial key, like WC Code rather than WC Code + State.

    Sometimes the pressure is on to get things done quickly, so quality suffers (and the developer knows it) and it bites them later.

    Sometimes a developer makes a mistake through not really being a good developer. For example, how many days are there from May 10th to May 15th, inclusive? There are 6 days, not the 5 that 15 - 10 would suggest. Another example is not putting parentheses around an OR condition.
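
    Two quick illustrations of those last two mistakes, as hedged sketches rather than anyone's actual code:

    -- Inclusive day counts: May 10th through May 15th is 6 days, not 15 - 10 = 5.
    SELECT DATEDIFF(DAY, '20170510', '20170515') + 1 AS DaysInclusive;   -- returns 6

    -- Missing parentheses around an OR: AND binds tighter than OR, so without them
    -- the Status filter applies only to the first branch.
    -- Wrong:  WHERE Status = 'Open' AND Region = 'East' OR Region = 'West'
    -- Right:  WHERE Status = 'Open' AND (Region = 'East' OR Region = 'West')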

  • Eric M Russell - Monday, January 16, 2017 8:34 AM

    SQL suffers from the same predicament as HTML and JavaScript; it's an accessible high-level language that's widely used by a lot of people, but only a few of them actually know how to code it at a professional level. When I'm called in to help resolve a performance issue, it's typical for IT management and application developers to have the prejudicial assumption that the root cause has something to do with the configuration of the database server or the network itself. The developers bristle when I suggest that there is something wrong with how the SQL itself is coded. After all, they rate themselves an 8 out of 10 when it comes to proficiency in SQL. At least initially, they dismiss my advice because they think I'm wasting time by nitpicking their coding style, but when they see how much performance can be gained simply by modifying a join or removing a user-defined function from the WHERE clause, they realize there is so much more to be known.

    And don't get me started on software developers who put GUIDs everywhere (PKs, FKs) and set all the other columns as nvarchar(max)! Good luck creating indexes with that!

  • Kyrilluk - Monday, January 16, 2017 9:03 AM

    Eric M Russell - Monday, January 16, 2017 8:34 AM

    SQL suffers from the same predicament as HTML and JavaScript; it's an accessible high-level language that's widely used by a lot of people, but only a few of them actually know how to code it at a professional level. When I'm called in to help resolve a performance issue, it's typical for IT management and application developers to have the prejudicial assumption that the root cause has something to do with the configuration of the database server or the network itself. The developers bristle when I suggest that there is something wrong with how the SQL itself is coded. After all, they rate themselves an 8 out of 10 when it comes to proficiency in SQL. At least initially, they dismiss my advice because they think I'm wasting time by nitpicking their coding style, but when they see how much performance can be gained simply by modifying a join or removing a user-defined function from the WHERE clause, they realize there is so much more to be known.

    And don't get me started on software developers who put GUIDs everywhere (PKs, FKs) and set all the other columns as nvarchar(max)! Good luck creating indexes with that!

    Yeah, GUID keys and entities stored as XML or JSON documents. If that's what they want to do, and if storing and retrieving data in that fashion matches the business requirements, then they should be using MongoDB and leaving the SQL Server database to do what it does best.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
