Guest Editorial: On writing SQL

  • Comments posted to this topic are about the item Guest Editorial: On writing SQL

    Best wishes,
    Phil Factor

  • I'm actually not entirely sure what you're asking. Do you mean something like: Always join on the key, the whole key. Or are you looking for tricks with XML execution plans?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The question I get asked is much higher-level than that. It is more like 'How do you actually go about writing a stored procedure?'. I always tend to start out on the same tack as you, but I soon find I'm not answering the question. It is more about how you go about debugging code, changing sections of it, refactoring it, or quite simply how you check on the behaviour of the system without breakpoints or sophisticated debugging tools.

    You have developed a stored procedure, maybe, and you want to check on the parameters passed to it, or perhaps the value of a variable, or the contents of a temporary table when the system is actually running. How would you construct a test harness for it? How do you go about profiling the performance of a stored procedure to find where the performance problems are?

    What sort of advice would you give?

    Best wishes,
    Phil Factor

  • If I understand the question, I guess it's something I do every day (all day sometimes). First, I need to have a set of working parameters. Preferably, I have some of the most painful parameters, the ones that cause the most work or return the most data. Assuming I have them (I can get them by using Profiler, but let's not go there for the moment), I always start with an execution plan (shocking, I'm sure). In addition, I get the STATISTICS IO and STATISTICS TIME output. Usually, these three bits of information are enough. But if you get into really tough procedures, you might go back to Profiler and set up a trace that captures statement executions (I've been doing a bunch of that this week on a particularly ugly query I was tuning). This will show which statement within the query is causing the pain. If you're really stuck, you should also capture wait states to see what specifically is causing the slowdown (this week it was I/O latches caused by fragmented indexes; I'm beating, uh, I mean talking to, the admin team about the missing defrag job).
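
    A minimal sketch of that first pass, with an invented procedure name and parameter values:

    -- Capture timing and I/O for a single run with known "painful" parameter values
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- In SSMS, also switch on "Include Actual Execution Plan" (Ctrl+M) before running
    EXEC dbo.GetCustomerOrders @CustomerID = 42, @FromDate = '20090101';

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;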

    Is that the sort of thing you're looking for? Obviously I'm skipping hundreds of details on every step.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thinking about it some more... how do you write a procedure...

    Well, I assume that you know what you want to get, what data set you're looking for, first. Armed with that, you have to look at the database to see if it contains that information and in what form. From there it gets really tricky.

    This is a hard one.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I was getting into an argument with some Agile XP people who were making out that they were the first to see the central importance of unit-testing when developing software. OK, I thought, I've been writing functions, views and procedures in TSQL for years, and I've always had that focus. You sit down at the keyboard and you say to yourself, 'How in the blazes am I going to test this routine I haven't yet written?' Once you've worked out how you can test it, then you write it. Surely the principles of Test Driven Development (TDD) have been around for years?

    It is the same with performance profiling. I'd have thought it is always better to design routines in a way that makes it easy to monitor performance issues than to wade in retrospectively. (I use a debugging log to do this, one that can be switched in or out as required.)
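
    A minimal sketch of what a switchable debugging log might look like (the table and the @Debug convention here are illustrative, not a description of the actual harness):

    -- A log table, plus an optional @Debug switch inside each routine
    CREATE TABLE dbo.DebugLog
        (LogID INT IDENTITY(1, 1) PRIMARY KEY,
         LoggedAt DATETIME NOT NULL DEFAULT GETDATE(),
         Source SYSNAME NOT NULL,
         Message VARCHAR(2000) NULL);
    GO
    CREATE PROCEDURE dbo.uspProcessOrder @OrderID INT, @Debug BIT = 0
    AS
    BEGIN
        IF @Debug = 1
            INSERT dbo.DebugLog (Source, Message)
            VALUES ('uspProcessOrder', 'Called with @OrderID = ' + CAST(@OrderID AS VARCHAR(12)));
        -- ...the real work goes here...
    END;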

    I've worked with many other SQL developers who have done it a similar way, even to the point of using test harnesses in VB or C++/C#, but there are a whole lot of other ways of writing TSQL, especially now that we have the Profiler, can read execution plans (with Grant's book there and open at the right page), and have all sorts of other tools to make life easier.

    I had the idea a while back of writing something about developing SQL code using very simple test harnesses. I was then gripped with panic, thinking that it was possibly a completely unnecessary skill, now that the information we can get about the dynamic behaviour of SQL routines is so sophisticated, and that I'd make a complete idiot of myself describing it. But then the Agile people got me thinking again...

    Best wishes,
    Phil Factor

  • I spend more of my time on the development side than on the DBA side. To help with overall testing, I've taken to creating a set of test data that I can load up into the QA environment for our unit-testing pass. This gives me predictable results from a known starting point, which I also use in development.

    Any time a "new" bug bites me, I add a sampling of the data that caused it to the test set.
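
    For what it's worth, a minimal sketch of that kind of repeatable starting point (the table and values are invented):

    -- Reset the known test data before each unit-testing pass
    TRUNCATE TABLE dbo.Orders;

    INSERT dbo.Orders (OrderID, CustomerID, OrderDate, Amount)
    VALUES (1, 100, '20090105', 250.00);
    INSERT dbo.Orders (OrderID, CustomerID, OrderDate, Amount)
    VALUES (2, 100, '20090217', 0.00);     -- added after a divide-by-zero bug bit us
    INSERT dbo.Orders (OrderID, CustomerID, OrderDate, Amount)
    VALUES (3, 101, '20090301', NULL);     -- added after a NULL-handling bug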

    Regards,
    Michael Lato

  • I'll bet no-one's going to like this answer... somebody may even call me an idiot, LOL. And maybe my answer is situational... may not work for everyone:

    I single-thread my queries: a series of SELECT INTOs, with one JOIN apiece.

    And I did not come to this style to ease debugging: I came to it based on performance observations.

    I suppose SQL is intended to be smart enough to handle multiple joins in one pass with just as much efficiency as single joins... but it's not. (Well, maybe 2008 is -- haven't tasted that one yet.)

    In our environment, where I'm often joining tables with millions of records apiece... it has often given a query a ten-fold performance gain, to start with a SELECT INTO on *ONE* table, with *NO* joins... and then, maybe, do multiple joins on the subsequent SELECT INTOs... but keeping each pass pretty simple.

    NOW, back to the topic: this also happens to make debugging very simple: you can look at the resulting tables, to see what data was picked up on each pass.
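
    Roughly the pattern, with made-up table and column names:

    -- Pass 1: no joins at all, just cut the big table down to the rows of interest
    SELECT OrderID, CustomerID, OrderDate
    INTO #RecentOrders
    FROM dbo.Orders
    WHERE OrderDate >= '20090101';

    -- Pass 2: a single join against the much smaller working table
    SELECT r.OrderID, r.OrderDate, c.CustomerName
    INTO #OrdersWithCustomers
    FROM #RecentOrders AS r
    JOIN dbo.Customers AS c ON c.CustomerID = r.CustomerID;

    -- Between passes you can simply look at each #table to see what was picked up
    SELECT TOP 100 * FROM #OrdersWithCustomers;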

    (This approach doesn't make as much use of a graphical query designer... but 2005's graphical interface is worse than Enterprise Manager's was, anyway.)

    Again, the above may be situational: if you've got better indices than mine, or your tables aren't millions of rows long, or... there may be a variety of reasons why your performance is just fine, with complicated JOIN setups.

    Good luck, and be well,

    -- Doug Ivison

  • On a high level, I consider the stored procedure to be what in electronics is called a black box - you define the inputs and the expected outputs.

    All of the following would be done in the development environment using a recent backup from production.

    Begin a new query, DECLARE the input parameters and SET those parameters to sample values. Write some T-SQL using those parameters to produce the desired results.

    For INSERT, UPDATE or DELETE procedures, I would also insert BEGIN TRAN and ROLLBACK TRAN with a few "SELECT *" statements before and after the T-SQL so I can verify the data changes.

    At this point you have a working batch and you can determine whether results and performance meet the objectives.

    Once the batch is working, comment out the DECLAREs and SETs and add the CREATE PROC statement at the top of the code. Create and then execute the procedure with various combinations of parameters to verify that the outputs are still correct. I would also run it with some inappropriate input values to see if I can break it.

    After all that, you can remove any TRAN and SELECT * statements and the procedure is ready for QA testing.
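
    Put together, the development batch might look something like this sketch (the object names and values are invented):

    -- Stage 1: a plain batch standing in for the procedure
    DECLARE @CustomerID INT;
    SET @CustomerID = 42;            -- sample value

    BEGIN TRAN;

    SELECT * FROM dbo.Customers WHERE CustomerID = @CustomerID;    -- before

    UPDATE dbo.Customers SET IsActive = 0 WHERE CustomerID = @CustomerID;

    SELECT * FROM dbo.Customers WHERE CustomerID = @CustomerID;    -- after

    ROLLBACK TRAN;

    -- Stage 2: comment out the DECLARE and SET, add CREATE PROC ... @CustomerID INT AS at the top,
    -- then strip the TRAN and SELECT * lines once it has been exercised with good and bad inputs.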

  • I think the important thing is to "establish my interface" first. Exactly what am I required to provide? What must the data look like? Do I need one result set or two? A scalar or a result set? How many rows per record set? What does the aggregation or nullability require of me? Etc.

    Once I have that, I try to determine the logical order in which to get that data: "Store this data in a temp table, then ...". What are the major building blocks as I assemble the data, and in what order must it happen?

    From there I start at the "inside" of the first query and, iteratively, work outward. Then the next query, etc.
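
    For example (the names are invented), the innermost piece gets run and checked on its own, then wrapped:

    -- Inside: prove the innermost set is right by itself
    SELECT CustomerID, SUM(Amount) AS TotalAmount
    FROM dbo.Orders
    GROUP BY CustomerID;

    -- Outside: once that is trusted, wrap it as a derived table and keep working outward
    SELECT c.CustomerName, o.TotalAmount
    FROM dbo.Customers AS c
    JOIN (SELECT CustomerID, SUM(Amount) AS TotalAmount
          FROM dbo.Orders
          GROUP BY CustomerID) AS o
        ON o.CustomerID = c.CustomerID;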

    At least that's what I remember through the creative fog. 🙂

    Tom Powell
    http://philergia.wordpress.com/

  • One thing that I think has to be done for any T-SQL that is going to stick around is to run a showplan on it. Ensure that the indexing you expect to be used is actually being used, and also that the indexing you think to be the most efficient actually is. Obviously this will depend on the volume of data, but it is a very important step. Some of our developers are finally seeing the value in this. Of course, it has taken some of our SQL applications two years to grow databases large enough to prove the point of performance testing, but boy do they hate having to be brought back into the loop to re-engineer something.
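
    One way to make that check without executing the statement, as a rough sketch (the index name is invented):

    -- Return the estimated plan only, so index usage can be inspected before the query is run
    SET SHOWPLAN_XML ON;
    GO
    SELECT o.OrderID, o.OrderDate
    FROM dbo.Orders AS o               -- expecting a seek on IX_Orders_OrderDate, not a scan
    WHERE o.OrderDate >= '20090101';
    GO
    SET SHOWPLAN_XML OFF;
    GO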

    No matter how big or small, make it run as fast as possible. My favorite was still the developer who said, 'It only takes half a second or so.' My response was that if you do the math, the estimated number of transactions in a day would take a week to get through. That just isn't going to work. Needless to say, after we tuned her query it ran in 1/100th of the time. All because 'I was just told to make it work', says the lazy dev.
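
    To put rough, invented numbers on it: at half a second per call, 1.2 million calls a day is about 600,000 seconds of work, which is very nearly a full week of elapsed time; at 1/100th of that it fits comfortably inside the day.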

  • I appreciate people willing to expose what they do during development. It's nice to compare notes and get ideas.

    I wouldn't presume to expose my techniques under the rubric of best practices - but I am willing to share if I'm allowed to shoot for "adequate" practices that stay in the safety zone, following, if imperfectly, the spirit of Alistair Cockburn's (of the Agile Manifesto, http://alistair.cockburn.us/) more humble and perhaps more practical objective.

    One technique is to start with a TSQLUnit test that fails in the absence of the stored procedure. Then produce a skeleton stored procedure that receives but doesn't use its inputs, and that outputs hard-coded values and data sets. These hard-coded values are perhaps set up with other pieces in mind - to satisfy basic requirements of other entities so that construction can proceed elsewhere. Once the skeleton is tested and validated, refactoring can begin.
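
    A rough, framework-agnostic sketch of that skeleton stage (TSQLUnit has its own naming and reporting conventions; the object names and values here are invented):

    -- Skeleton: accepts the inputs, ignores them, returns hard-coded values in the agreed shape
    CREATE PROCEDURE dbo.GetOrderSummary @CustomerID INT, @FromDate DATETIME
    AS
    BEGIN
        SELECT CAST(3 AS INT) AS OrderCount,
               CAST(750.00 AS MONEY) AS TotalAmount;
    END;
    GO
    -- A first test: fail loudly if the expected shape is not there yet
    CREATE TABLE #Result (OrderCount INT, TotalAmount MONEY);
    INSERT #Result EXEC dbo.GetOrderSummary @CustomerID = 1, @FromDate = '20090101';
    IF NOT EXISTS (SELECT * FROM #Result)
        RAISERROR('GetOrderSummary returned no rows', 16, 1);
    DROP TABLE #Result;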

    During refactoring, one makes use of the inputs within the stored procedure and produces real outputs. The tests focus on validating expected values, behavior with extreme values, and performance issues. When an adequate set of tests has been completed and passed, development stops.

    Tests continue on other pieces. If other tests expose an issue with the stored procedure, even if only a performance issue, another iteration of refactoring begins.

    The effort expended in refactoring can scale to the requirements and resources of the project. More refactoring is expected for a project where bugs are more dangerous - like a project involving nuclear reactors.

    I hope this description was adequate.

    Bill Nicolich: www.SQLFave.com.
    Daily tweet of what's new and interesting: AppendNow

  • I've always been a visual thinker, so I can appreciate Phil talking about seeing the joins or smelling code trouble. To me, I see the data flow through each part of a query like water flowing through the pipes in my house, but with the ability to control all the valves simultaneously. :hehe:

    In a way I guess it's kind of like when foreign-language teachers say you need to think in Spanish or French: I find myself thinking in SQLesque, but visually, instead of in SELECT... FROM... WHERE...

  • How do you write a stored procedure? That's pretty open ended. How does an artist make a painting? How does a musician write a song? Of course since we have an end in mind (some result) we need to verify that the outputs are correct. But so do other artists.

    Since coding is a bit of an art, I would argue it is one of the reasons why you can't just take a 12-month technical course and then go to work writing applications if that's all the training you have.

    Whistling ASCII would be a neat party trick. I can also store large amounts of information in my head. That was handy when I needed to regurgitate it back in school.

  • Interesting editorial, Phil.

    I tend to write procedures in the following way.

    1. Identify from the requirements what the inputs and outputs should be.

    2. Break the task into logical steps. Not procedural ones, but sets of information: this part is looking for cancelled orders and this part for cancelled shipments, for example.

    3. Test each substep individually. I tend to use a lot of CTEs and derived tables, partly because it is simple to test them, and once I know that these entities are returning correct results, I can build on them without worrying about them (see the sketch after this list).

    4. Combine the subcomponents into the general query. If it doesn't work, I know it's the SQL in the outer query that's the problem, because I've tested the SQL in the inner queries already.

    5. Once the outer query returns correct results, look at the query plan and see if there are efficiencies to be made. If I'm using some set of data in more than one place, does it make sense to break it out into temporary storage? Is there a complex and expensive join that would benefit from being calculated in a temp table? Can arguments to the procedure be pushed down into the subqueries?

    6. Make performance changes one at a time, and test the results each time.

    7. Document it. I deal with some tables that are on the order of 1B rows, and sometimes steps 5 and 6 can end up making a stored procedure that even a reasonably good developer will have trouble unraveling.
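
    To illustrate steps 2 to 4 with invented table and column names: the cancelled-orders and cancelled-shipments pieces are each tested on their own as CTEs, and only the outer query that combines them is new when they are put together.

    WITH CancelledOrders AS
        (SELECT OrderID, CustomerID
         FROM dbo.Orders
         WHERE Status = 'Cancelled'),          -- tested on its own first
    CancelledShipments AS
        (SELECT ShipmentID, OrderID
         FROM dbo.Shipments
         WHERE Status = 'Cancelled')           -- likewise tested on its own
    SELECT co.CustomerID, COUNT(cs.ShipmentID) AS CancelledShipmentCount
    FROM CancelledOrders AS co
    LEFT JOIN CancelledShipments AS cs ON cs.OrderID = co.OrderID
    GROUP BY co.CustomerID;                    -- only this outer query is untested at this point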

