Guest editorial: SQL Code Metrics

  • Comments posted to this topic are about the item Guest editorial: SQL Code Metrics

    Best wishes,
    Phil Factor

  • As always, I do like your personal way to write articles.

    Just wanted to let you know it!

  • Nobody has replied to this yet? Well maybe this might spark some debate.

    The thing I started thinking about while reading this article is how terrible of a programming language SQL is and the lack of tools to help you write code. I know you can write a bad program in any programming language, but I don't know if we should even be considering SQL as a programming language. For example, why doesn't SQL have the concept of an array? It's the most basic data structure there is. Also, up until SQL Server 2008, we've never had Intellisense!

    Back when the .NET Framework and C# were released in 2001 it came with a rich IDE, numerous tools, and the ability to document your code using what's called "Xml Documentation". That documentation then becomes part of the tool tip when using Intellisense. Why doesn't SQL have that capability!?

    You also talked about always learning new, perhaps better ways of doing things. While I don't disagree that we should always be learning, I do see it as a problem. If most of you are like me, we tend to try and find new ways of doing things to help keep us interested, make it challenging, and to learn new things. However, I believe that is a main factor of what constitutes bugs and performance problems. For example, I recently came across an article here entitled "Custom Pagination in SQL Server 2005". I thought it was a great article and exactly what I needed. Then I started reading some of the comments and some of the "other ways to do it". And that got me thinking. Why are there so many ways do it? Which one is the best? Why do we keep reinventing the wheel? In my opinion, paging a result set is a very old and very common problem. Why isn't there a standard/best practice way to do it? Why isn't there a tool to help you do it?

    You begin by stating, "SQL has a wonderful way of teaching us humility." And you're right. But it shouldn't be that way. SQL needs to go the way of the Dodo bird. In my opinion, it has been stale and archaic for years. New language features, framework functions, and tools need to become part of the "standard" of SQL. Forget about backwards compatibility, SQL is in need of a revolution.

  • Thanks for breaking the silence, both of you.

    It is difficult to compare SQL with a conventional language. It is like comparing C# to Postscript. Writing SQL gives me more pleasure than any other language, because one can hammer together an application very quickly. It may not look pretty, but it will do a job of work. I've served by time crafting applications in C, Pascal, C++ and so on, but the lingering after-effect is an impatience to get things up and running quickly.

    I think that today is an exciting time for anyone programming with relational databases because we are continually having to rethink what is possible. I agree that SQL has its flaws. It doesn't even allow you to use the whole relational model. However it will do me fine for the time being.

    I think SQL Server's lack of IDE is a historical problem. It was bought-in by Microsoft to fill a gap in the product line, and, as Sybase, was designed to be portable across different operating systems. Everybody expected Microsoft to develop tools as rich as Sybase's Powerbuilder. It never seemed to happen, for some reason. There were several strategies that never really worked, and we never got anything with the richness of Powerbuilder or Business Objects. SQL Server has always remained culturally different from the rest of the Microsoft stable, but that is more the fault of Microsoft than the SQL Language, I think.

    Best wishes,
    Phil Factor

  • On the point of the article, about having a "code checker" for SQL, I don't think it would really be possible, at this point, to automate such a system. Databases and procs and such are far too business-specific.

    How, for example, would a standard checker know if a proc is taking too long to run, or if it's something that's expected to take a long time, and it's okay/good in this case?

    How would a "lines of code" check help in situations like "you have inline UDFs in your Where clause"?

    Now, something that could find things like that might be a good idea, but I don't see it replacing running a server-side trace and querying the results in various ways.

    On the subject of SQL "needing to go away", I have to ask what you would replace it with. It's a pretty darn effective way to access data. I've used OODBs, and so far as I can tell, they universally stink for anything but building simple web apps on top of, and are a nightmare to build reports on.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I like the idea of a code checker but I agree with GSquared that it probably isn't currently possible. There would have to be consensus on what is best practice, allowances for hardware effects etc etc.

    I like T-SQL even though there are a few short comings. I think that some of the issues are really with the tools that are meant to make our lives easier. Intellisense is a great addition to our toolset but I wonder if that will allow some people to become lazy in their work. Without Intellisense we have to remember things and know the hows and whys. This won't affect the people that always make the effort of course but may allow those who perhaps wouldn't have even tried before to have a stab and make a mess that the diligent people will have to clean up later. Hhmm quite thought provoking, advances making it easier to do things either well or poorly which brings us back around to a code checker!

    Cheers

    Nicole Bowman

    Nothing is forever.

  • Phil,

    A fantastic last sentence and a great editorial. I too am not sure that we can build a great code checker for SQL, but there are definitely some basics that could be added. Something to warn about cross joins, something that has intellisense like SQLPrompt from Red Gate or Management Studio 2008, check for cursors, look for subqueries that don't make sense. Likely it would be more of a "warn-er" than a "checker."

    I do like the simplicity of SQL and I think it's a good language for working with sets of data. Definitely a different paradigm than many other languages.

  • Phil Factor (2/17/2009)


    ... SQL Server has always remained culturally different from the rest of the Microsoft stable, but that is more the fault of Microsoft than the SQL Language, I think.

    On most Microsoft development tools, I can easily see cause and reason and have cheered the changes and advancements. I too have observed the seeming gap in the feature set for the IDE-like space in Microsoft SQL Server.

    They have put great effort into the BI space, both graphically and functionally, to include the .NET framework and all of the power there... but seemingly not much in the SQL editor.

    Seriously, why does it seem to have this "cultural" gap?

  • If you had something that would pop up and tell you, "this query involves a cross-join", or "this query will involve multiple table scans of large tables", or something like that, in the way that it will tell you about "index hints force..." and so on, in the Messages tab when you execute the create/alter command for a proc, that might be helpful. You could ignore it if it's expected/desired, but it might help catch a few of the eggregious errors.

    A DBA at a prior job was having a heck of a time trying to figure out why his query was returning 25,000 rows when he was expecting more like 6. Turned out he was using single-letter table aliases in the From clause, and one of his joins used the same alias on both sides, because both tables began with the same letter. The alias was correct, so it wasn't giving an error message, but the join was "a.column = a.column", which made it a cross join. Hard to spot in the middle of a complex query, so easy for the eye to slide right past it, especially when most people see what they expect instead of what's actually there. (Over 90% of people fail on simple errors when tested for proofreading jobs. Not specific knowledge of the spelling of specific words, or judgement rulings on punctuation, but really simple stuff like "the the" being read as "the". People see what they expect.)

    For that kind of thing, a warning about a cross join and the line number it appeared on would have been quite useful. Would have saved about five minutes of frustrating review. For that, a code checker could be quite nice.

    Maybe RedGate should start the R&D on it. If they haven't already.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Steve Jones - Editor (2/17/2009)


    Phil,

    I too am not sure that we can build a great code checker for SQL, but there are definitely some basics that could be added. Something to warn about cross joins, something that has intellisense like SQLPrompt from Red Gate or Management Studio 2008, check for cursors, look for subqueries that don't make sense. Likely it would be more of a "warn-er" than a "checker."

    I think a "warn-er", as Steve has suggested, is a great idea. There are many "obvious" bad ways of coding T-SQL, and a "warn-er" could be used to bring these to the attention of the code writer, much like how the grammar checker in Word works. When you get a warning, you could choose to heed or ignore it.

    Brad M. McGehee
    DBA

  • .NET has been around for less than 10 years and look at the vast number of tools and features it has available. There is a tool called FXCop, now built into Visual Studio that does static code analysis. It's great! Everyone should use it no matter how big or small the project. Heck, it makes recommendations like, "Consider moving this string constant to a resource file." In my opinion, it really does help you write a better program and learn to do things right.

    Now, SQL has been around for how long? How come we don't have anything like FXCop? Phil mentioned in his reply, "I've served by[sic] time crafting applications in C, Pascal, C++ and so on, but the lingering after-effect is an impatience to get things up and running quickly." I agree, and that is why we now have .NET (and perhaps Java). .NET now allows you to write a program quicker and easier than you could in C++. In my opinion, C#, the .NET Framework, and associated tools were a revolutionary jump forward. Why can't we have the same for SQL and databases?

    GSquared and others mentioned the subtle cross join and other mistake that could kind of fall under syntax checking. Visual Studio .NET has "real-time" compilation, where if you make a syntax error, you get the infamous red squiggly line (like in MS Word on misspelled words). Now since it's technically not wrong to have a cross join, they could instead use a green squiggly (like they do in MS Word for grammar). (Dang, Brad posted just before me.)

  • I read the last paragraph with mixed thoughts on the subject of code metrics. I started playing back in my mind what many Developers do if left to their own devices especially if they're under some good pressure to churn out code. They will not document, they don't care about scalability or performance, and they don't give a damned about readability. Well, maybe they do, but they don't exercise those "cares" in the face of the almighty "schedule" especially when the dunder-head that prescribed the unreasonable schedule is hawking the Developer for an "Easter Egg" right now!

    A good set of metrics generated by an automated checker would be a great way to give a code reviewer an idea of what to look for to meet certain company standards and industry best practices. If it were merely a "warn-er", the key to override would be worn through the print and would serve no purpose other than to annoy the Developer who is hell bent for election to meet the schedule at any cost.

    Make no doubt about it... if you want good code, the code must be reviewed by those both qualified to make the review and given the all powerful right of rejection even in the face of schedule. That also means that those persons must also be as sharp as a tack and able to make alternative suggestions at the drop of the proverbial hat.

    I've been in a company that has done both at one time or another... allow unreviewed code to "go in" and allow only tightly reviewed code to go in. As expected, the perception of the review process is one of a general "road block" to performance because, especially when first instantiated, the review process does take up some time. Since most folks put off submitting code until the last bloody second (at first), it will, of course be seen as the "road block". What people don't consider is the post mortem (appropriately named) cost of code that dies in production especially if it isn't noticed right away. Heh... try double billing a telephone customer and watch the hell that breaks loose. Try double billing a half million of them and expect a call from the PUC and possibly being shut down. If you're not shut down, you still have an incredible mess to clean up where a simple, if not, lengthy code review could have prevented the whole thing.

    Consider this... does a spell checker prevent people from mispelling "blue" as "blew" or prevent the use of "an" instead of the "and" they wanted? Even a grammar checker will frequently miss such a thing. So would go either a "warn-er" or a "checker".

    If you want it done right, buckup, write standards, and follow them. No, they shouldn't be designed to prevent new ideas as many would think. But they should be designed to prevent a new bad idea or a mistake from going into production.

    By the way... when full blown code reviews were implemented in the company I worked for, time to write the code and get it through the review went up by about 20%. We also approached near zero defects and the amount of rework time, which was very close to the original amount of code to production time, dropped by 95%. So where something might take a day to code and a day to rework, it now only took 1.3 days and we almost never had bad code make it to production. Guess what we did with that other .7 days per? It gave us the time to meet some of the damnedest schedules you ever saw. πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/17/2009)


    By the way... when full blown code reviews were implemented in the company I worked for, time to write the code and get it through the review went up by about 20%. We also approached near zero defects and the amount of rework time, which was very close to the original amount of code to production time, dropped by 95%. So where something might take a day to code and a day to rework, it now only took 1.3 days and we almost never had bad code make it to production. Guess what we did with that other .7 days per? It gave us the time to meet some of the damnedest schedules you ever saw. πŸ˜‰

    Experienced the same thing in the past and you know what, the developers were actually happier. Strange huh? :w00t:

    Side note, we actually use some metrics as a baseline for review, i.e. we have established duration time and read levels for a query based on utilization and if it exceeds the predefined values then we review the procedure / query. Doesn't eliminate all bad code but certainly brings forth the major ones that are going to kill our system while still letting us get some work done. Also allows for some great training opportunities (for me and for those that I work with).

    David

    @SQLTentmaker

    β€œHe is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • p.s. ... and we did it without {gasp} tool tips in SQL. Yes, there are some grand improvements that could be made to SQL, but we've seen the kind of performance inhibited, non-scalable code that comes out of such things as "hibernate", Business Objects Data Integrator, Query Designer in EM and SMS, DTS, and a whole world of other GUI driven database programming methods. Oddly enough, I'll agree that many of those are really great tools... the problem is that they'll allow just about anyone to "write" database code and not everyone is qualified to do so because they just don't understand.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think one of the reasons that databases and their tools have lagged behind other software development is that it still requires human expertise to make a database perform very well, so automatic tools become more of a problem than a solution at this time.

    Yes, there are ORM tools that can write a good query, but a decent DBA still needs to review them if you want the code to consistently perform well. It takes as long or longer to review automatically generated code as it does to write it yourself in the first place, if you know what you're doing.

    Sure, when I want to build a view, I can drag icons for tables onto a screen, drag arrows between column names in them to define joins, check boxes to put the columns I want into the select clause, and enter formulae into a set of fields to build a where clause. Then I have to review what this creates, which means reading horribly formatted text, making sure it's right, and then clicking the save button.

    Or I can drag the table names into the text editor, drag the Columns folders from the object explorer into the same, and then type up the rest of it myself. It's faster, it ends up formatted better (equals "the way I like it" and/or "the way it's required to be because of company standards"), and it gets the same end result.

    Of course, it helps that I type very, very fast, but that's just a slight advantage, the rest of it is an experience advantage.

    But the underlying reason this is needed is because database performance just plain sucks when you leave it on its own.

    What will happen to database performance when hard drives are replaced with something a thousand times faster? How about a million times faster?

    What will happen when multi-table indexes become less of an overhead and more readily available for use? How about when the database detects the need for these indexes and builds them on the fly on its own? And when it can do so correctly?

    How about a better implementation of OR in Where clauses?

    As these kind of things happen, it will be less and less critical for database code to perform well, which will make it more and more likely that more automatic tools will be more useful.

    To compare that to application development in .NET or whatever, take a look at how efficient, fast and powerful code written at a very low level, like C, can be. Procedural code will generally outperform object-oriented code by a significant margin. But OO code is more common in application development because hardware, compilers, libraries, etc., have become powerful enough to make up the difference as far as human users are concerned, and OO code is generally easier to learn, easier to write, much easier to collaberate on, etc.

    The database remains the biggest performance issue in most application development. It's where the heavy lifting takes place. It's where most of the IO takes place. It can't be offloaded to the GPU. Etc.

    Till that's no longer true, I think "more sophisticated" database dev tools aren't practical in too many places.

    But I also think it's catching up fast. And the better tools are evolving right at this moment.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply