Guest editorial: SQL Code Metrics

  • While static analysis of SQL is still in its infancy compared with other languages like C++ and C#, I'm surprised that DataDude (aka DBPro) hasn't come up in this thread yet.

    http://msdn.microsoft.com/en-us/library/dd172133.aspx

    And it's extensible (if you feel compelled to add your own custom rules):

    http://blogs.msdn.com/gertd/archive/2009/01/01/creating-t-sql-static-code-analysis-rules.aspx

    Especially now that (I believe) Visual Studio for Database Professionals has been folded into the other SKUs... which makes it "free" (for acceptable values of free). Gert's blog has all kinds of cool stuff in it for database developers (his new project will be interesting to watch, too):

    http://blogs.msdn.com/gertd/archive/2008/10/02/more-team-developer-data-edition-merge-information.aspx

    I have no illusions that's most development shops are mature enough to add static analysis of their data layer code to the mix, but for a number of data developers the addition of SQLCLR was nirvana because they could add unit testing and static analysis of some of their data layer code to the toolbox. DBPro (aka DataDude) brought a similar toolset to the mix for pure T-SQL.

    Of course, like a large number of static analysis "bugs" that I've seen produced from C++ and C# code, these "bugs" found in SQL have the potential for being equally ignored.

    :Whistling:

  • David Reed (2/18/2009)


    While static analysis of SQL is still in its infancy compared with other languages like C++ and C#, I'm surprised that DataDude (aka DBPro) hasn't come up in this thread yet.

    (snip to end)

    Don't know why I haven't bumped into this before, but thanks for the info! It actually makes sense too... yes the back end management studio has a SQL editor, but it is not really meant as a dev environ... so roll a SQL dev environ for SQL into VS... I am now rethinking some dev projects I have coming up.

    Others have produced tools for VS for languages outside of the .NET framework (such as PHP and PERL), so there is no reason it could not be built into that platform (including the suggestive hints and warnings) for any language, including all versions of SQL.

    I have just not developed a database from ground zero in the VS/.NET IDE before, but have used it for other code development since VB 5 (VS) to current VB, ASP and C# (VS.NET).

  • This might be a bit off topic, but this discussion reminds me a lot of reading Larry Wall's comments on PERL. PERL has a lot in common with SQL, but I suppose you would say it's closer to a 'real programming language' on the spectrum.

    Larry's comment was that languages like C are designed to be 'orthogonal', ie. mathematically complete and deterministically oriented. He compared PERL to the strategy builders used to build walkways at one of the Universities in the US - they waited three months after the new buildings were opened, and then laid paths wherever there was a track through the grass.

    This describes SQL perfectly; it is an eminently practical language, designed to achieve real-life outcomes. SQL surfs the waves of complex data interdependency. 'Orthogonal' languages attempt to impose a rigid structure which can quickly become a victim of exactly the complexity it sets out to subdue.

    SQL offers tools for the traveller to find various paths through the hills from points A B or C, to points D E or F. 'Orthogonal' languages attempt to bulldoze the hills flat so they can roll out a path wherever they damn well please. All that work might be worth it if it wasn't for all the ongoing tectonic activity.

    The motto of PERL is 'there's more than one way to do it'. Not a bad motto for SQL.

    SQLPoet

  • Speaking of CLR's and Perl...

    There's only two classes of CLR's that I haven't been able to beat for performance (so far), but I've come damned close... RegEx and certain types of file handling. The only reason why I see CLR's becoming a "Nirvana" is for Developers who need to get a job done, but don't actually know how to do it in T-SQL.

    As for Perl... some folks no longer with the company, wrote some rather elegant and well constructed Perl to do a split on some files that just aren't importable by the normal means. The files have a "header" section and double headers that must be combined and used as entries into some rather nasty NVP tables. Not my design, just a fact.

    To do this nasty import and split on a 39k row, 215 column input file (CSV version of a bloody spreadsheet, really), it takes the Perl code about 40 minutes. After all, it has to gen roughly 8 million rows to describe each "cell" in the file. I've got it down to less than 2 minutes in T-SQL using, you guessed it, a Tally table split and some running total "smears" for the column headers.

    My point is that we seem to be going backwards. Rather than making smarter tools that help the Developer make better decisions and better code, they've made the marketing decision (and it's a very effective one, dispite my great misgivings about it) to simply make it easier for people to do what they know (heh... or not) even at the cost of DB performance. And, notice I didn't say that the tool should make the decisions. It should present options and ramifications rather than rigidly enforcing some policy dreamed up by some bloody ISO committee. Shoot, that's why most folks think code reviews will be such a burden and that's also why I hate Oracle for many things... I know what I want to do to improve the performance and, yes, that sometimes requires some rule breaking... but that should be my choice or the architect's choice. I don't want some RDBMS to squawk at me when I'm trying to do something different using set based techniques. Sure, I can write a cursor to over come that problem, but that's a performance problem in itself, even in Oracle.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • David,

    At $5,499.00, a Visual Studio Team System Developer license is going to be minority interest, surely? C# or VB.net developers can use .NET Reflector and Peli's code Metrics add-in for free. I was thinking more of a community project to bring static code analysis within reach of everyone. My editorial was really trying to express my mixed feelings about making this sort of tool available for anyone who wants it, as I was worried that it might entrench programming techniques. Actually, I really like Data Dude's static code analysis as a way of encouraging teams to conform to a corporate style. It certainly has a place in the market.

    Jeff,

    The trickery that we're all experimenting with at the moment in using TSQL to slice and dice our Data Feeds is really all made possible by the (MAX) datatypes. Suddenly, it has made it all achievable, but the techniques we are using were all around when we were wrestling with 255 character CHARs. They were just slightly less use back then! I've been experimenting with getting data from feeds like JSON, YAML, and even doing some limited parsing of SQL (for the Prettifier, at the moment!)

    SQL's intrinsic power is apparent in doing these unfamiliar tasks. Perl is great, and I've used it a fair bit, but it is a Smörgåsbord. You want something? It gets bolted on somehow. What you don't get is the intellectual consistency that allows you to do things that the original creators of the language couldn't imagine, and do them simply with classic techniques.

    Best wishes,
    Phil Factor

  • Phil Factor (2/19/2009)


    Perl is great, and I've used it a fair bit, but it is a Smörgåsbord. You want something? It gets bolted on somehow. What you don't get is the intellectual consistency that allows you to do things that the original creators of the language couldn't imagine, and do them simply with classic techniques.

    I agree... I've seen folks do some pretty amazing stuff with SQL Server. Ran into one just the other day where a couple of folks figured out how to take the difference between two ROW_NUMBER () OVER's to create temporal sequences on repeating groups without using the "quirky" update. Think of it as the "impossible ranking" that's no longer impossible. It was a fascinating study and quite the revelation for me when I saw how they did it. And, it's so simple... I just wish I had thought of it first.

    Thanks for the thought provoking article... you always seem to bring the best out in people.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Microsoft has added "Static Code Analysis" to Visual Studio 2013 - SQL Server Data Tools (SSDT).

    See attachment.

    For example you have rules to avoid SELECT * in stored procs or deprecated syntax.

    There are rules for data type compatibility and avoiding non-SARG-able WHERE clauses.

    For some reason they have rule #7 that wants you to use WHERE ISNULL(column, default value) which is just plain wrong and would cause a table scan.

    It is a pity that you cannot dismiss individual warnings as you reviewed them. That would make it really useful for quality management, as you could either fix the issue or review and dismiss it. Without this, the tool always produces a lot of noise which may hide important issues.

    Also, it's a pity you cannot export the results into a report.

    But overall, it is a step in the right direction.

  • If you work for an organization that counts lines code as a metric for developer productivity, then RedGate's "SQL Prompt" extension for SSMS is a great investment. A poorly formatted SELECT statement of moderate complexity can be stretched out to 200+ lines. It can also collapse SQL statements for ease of readability.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell wrote:

    If you work for an organization that counts lines code as a metric for developer productivity, then RedGate's "SQL Prompt" extension for SSMS is a great investment. A poorly formatted SELECT statement of moderate complexity can be stretched out to 200+ lines. It can also collapse SQL statements for ease of readability.

     

    The problem is that it doesn't actually identify crap code. 😀

     

    I think judging productivity by lines of code is a terrible metric and is frequently contrary to what Set-Based and effective code is all about.  Rather, companies should be measuring developer productivity by how many or how few times their code needs to be reworked or fails to operate correctly or does/doesn't cause performance issues.

     

    Yeah... sorry.  I know I'm preaching to the choir but it's easier to rant here than it is to go chop down a tree with my bare hands (no axe, just hands) to take out my frustration on that subject.  It also annoys the neighbors to no end because I've run out of trees in my own yard and I'm not real picky about whose tree might be next. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Eric M Russell wrote:

    If you work for an organization that counts lines code as a metric for developer productivity, then RedGate's "SQL Prompt" extension for SSMS is a great investment. A poorly formatted SELECT statement of moderate complexity can be stretched out to 200+ lines. It can also collapse SQL statements for ease of readability.

      The problem is that it doesn't actually identify crap code. 😀   I think judging productivity by lines of code is a terrible metric and is frequently contrary to what Set-Based and effective code is all about.  Rather, companies should be measuring developer productivity by how many or how few times their code needs to be reworked or fails to operate correctly or does/doesn't cause performance issues.   Yeah... sorry.  I know I'm preaching to the choir but it's easier to rant here than it is to go chop down a tree with my bare hands (no axe, just hands) to take out my frustration on that subject.  It also annoys the neighbors to no end because I've run out of trees in my own yard and I'm not real picky about whose tree might be next. 😀

    https://youtu.be/TQQ32lEqCS0

     

  • Yes, I'm just being sarcastic about lines of code. 😉

     

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Heh... wrong color... it needs to be a Red Herring to chop down trees with. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply