Stored Procedures Reconsidered

  • braindea-754908 (10/23/2009)


    Glad I came upon this post. I have long suspected there was someone else who looked at this the way I do. I've been programmatically constructing database queries (client/server, mainframe) for many years through whatever "dynamic" method was available, as I find a higher-level language (currently C#) offers much better inheritance (for objects that play a role in assembling the SQL to be executed) than having to write a stored procedure for every little thing.

    You'll think it odd, but as a DBA, I agree with you... except that a lot of folks simply don't know how to write SQL even though they've "been doing it for years", and because it's part of the GUI code, the DBA never gets a chance to review it. The kinds of performance problems I find in GUI code are usually due to a trivial understanding of SQL. For example (my favorite example, actually), what's wrong with the following embedded SQL, and how could it be rewritten so it doesn't cause a performance problem?

    SELECT *
    FROM sometable
    WHERE COALESCE(somenumericcolumn,0) > 0
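
    (One common answer, offered as a sketch rather than as the poster's intended solution: wrapping the column in COALESCE buries it inside a function call, so the predicate is no longer SARGable and any index on somenumericcolumn can't be used for a seek - the engine scans every row. Since COALESCE(somenumericcolumn, 0) > 0 already excludes NULLs, and NULL > 0 evaluates to UNKNOWN and is excluded as well, the bare comparison below returns the same rows and is index-friendly. SELECT * is a separate problem of its own.)

    -- Non-SARGable: the function call wrapped around the column forces a scan.
    SELECT *
    FROM sometable
    WHERE COALESCE(somenumericcolumn, 0) > 0;

    -- Logically equivalent and SARGable: NULL > 0 is UNKNOWN, so NULL rows are
    -- excluded either way, but an index on somenumericcolumn can now be seeked.
    SELECT *
    FROM sometable
    WHERE somenumericcolumn > 0;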

    DBAs and a lot of SQL developers are frequently chastised for being so "pro-SQL". Putting the shoe on the other foot, GUI programmers will frequently rely on things like NHibernate to do their SQL thing for them. Putting it less gently, they sometimes don't know when to quit.

    I just troubleshot some NHibernate-generated code that a GUI programmer allowed to happen instead of working with the resident DBA (i.e., me)... he called it an "aggressive GET" where everything he needed to populate a rather large screen was done in a single query with a 60-table join and a pot wad of aggregation... just one instance of the screen caused two CPUs to slam into the wall for several minutes, and the system was supposed to be scalable to 1,000 concurrent users. The really funny part was, it was perfectly acceptable to him for the code to take those several minutes because "it was getting a lot of data". He was totally oblivious to the fact that the code had caused two CPUs to go into meltdown, never mind what it was doing to the storage system.

    So, yes, I agree that you don't have to write a stored procedure for every little thing... but, like the code example in the code window above, it's sometimes the little things that can cause big damage. If someone is going to write embedded SQL, they need to either know how to write effective SQL or be willing to review their embedded SQL with someone who does. The problem arises when you don't know what you don't know.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/26/2009)


    ... he called it an "aggressive GET" where everything he needed to populate a rather large screen was done in a single query with a 60-table join and a pot wad of aggregation... just one instance of the screen caused two CPUs to slam into the wall for several minutes, and the system was supposed to be scalable to 1,000 concurrent users. The really funny part was, it was perfectly acceptable to him for the code to take those several minutes because "it was getting a lot of data". He was totally oblivious to the fact that the code had caused two CPUs to go into meltdown, never mind what it was doing to the storage system.

    ...

    Heh, never mind what it would be doing to the Users. 🙂

    -- RBarryYoung, (302)375-0451
    blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Great example, Jeff. You've put your finger on what I think is the essential element of the great 'programmer vs DBA' contention (aka the 'database-object impedance mismatch').

    Programmers LOVE to build 'smarts' into their objects and hence like to think that the objects are far more efficient than an inherently 'dumb' database could ever be.

    Problem is, while the objects are individually smart, on a collective level they usually use very simplistic modelling of relationships.

    A set of objects (corresponding to a database table) is generally loaded with a single SELECT statement, but if related objects are wanted, child objects are generally loaded with one SELECT for each object in the parent collection (some ORMs have support for 'collective loading' but this is often regarded as a hack).

    As you would well know, this is RBAR in its most essential form. Consider (shudder) three or more levels of hierarchy, or structures more complex than parent-child.

    Many programmers don't seem to understand that while objects in memory are orders of magnitude faster than a database, if getting them there and saving them back uses operations that are orders of magnitude slower than properly constructed SQL, and the in-memory manipulation is trivial by comparison, the net effect is simply a slowdown by orders of magnitude.
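
    To make the shape concrete (a hypothetical Orders/OrderLine pair, not anyone's actual schema): per-object child loading issues the first query below once per parent row, while the set-based version fetches the whole graph in one round trip.

    -- RBAR / "SELECT N+1" shape: executed once per parent object,
    -- so 1,000 parents cost 1,001 round trips to the server.
    SELECT OrderLineID, ProductID, Qty
    FROM dbo.OrderLine
    WHERE OrderID = @CurrentOrderID;

    -- Set-based equivalent: all parents and children in one round trip;
    -- the client splits the flattened rows back into objects.
    SELECT o.OrderID, o.OrderDate, ol.OrderLineID, ol.ProductID, ol.Qty
    FROM dbo.Orders AS o
    JOIN dbo.OrderLine AS ol
      ON ol.OrderID = o.OrderID;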

    But programmers rightly hate doing aggregation in SQL, because it breaks object encapsulation. It's a difficult chasm to bridge.

    The thing that SQL can provide, which no object structure I've seen can, is the query planner and optimiser, which is expert at traversing these complex structures and planning aggregate data retrieval in the most efficient way.

    As RAM becomes standard storage (I don't think it's a contentious proposition that hard disks will no longer exist in 20 years), it will be interesting to see how RAM-based database servers affect this problem.

    In particular, I think the next growth area in programming will be the building of object planners/optimisers which do exactly what the SQL one does, but on structured object networks in memory. Presumably there will be some kind of object-inheritable DDL API to define the relationships between objects; then the optimiser can go away and do its thing - on objects rather than rows and tables.

    It would be interesting, even now, to look at the source for the optimiser of an open-source project (MySQL, Postgres - I don't know how effective the optimisers in these are) and see what it would take to 'open it up' like that.

    Maybe in another life where I don't have kids 🙂

    I apologise if this is a bit of a rave, but I really want someone to build this so I can use it! C'mon, open-sourcerors!

    Edit: I know there were object-oriented DBMSs built years ago which didn't really take off. Has anyone used them who can point out the flaws? Am I totally off base here?

    Ben

  • Thanks for the reply, Ben. Good stuff.

    The problem with true "RAM disks" in the future is that they do nothing to help the "pipe". You still have to move data from the server to the client.

    So far as OODBs go, I don't believe they're a real big help in the batch processing of millions of rows... they may help OOzers write code, but I think it would take much more time to write code, and I'm not sure it would do anything for performance, scalability, or getting people to write more intelligent database code in general.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Re the pipe, have you read Rocky Lhotka's C# Business Objects, 'Chapter 1 - Distributed Architecture'?

    (sorry about this:)

    http://books.google.com.au/books?id=6OkTWMLg71IC&pg=PA1&lpg=PA1&dq=%22logical+and+physical+architecture%22&source=bl&ots=9U2PFQz0gq&sig=iNcwGbgXKw55F9O2npFvKLD1bSI&hl=en&ei=nQXnStucJNaCkAXZ5oTBBg&sa=X&oi=book_result&ct=result&resnum=2&ved=0CA4Q6AEwAQ#v=onepage&q=%22logical%20and%20physical%20architecture%22&f=false

    I thought I knew about Distributed Architecture, but read this (just Chapter 1) and tell me if you don't feel about 10 IQ points smarter.

    Re OODBs, I'm talking essentially about having the optimiser in the business layer itself - probably not on the client, but at least not divorced from data storage. An optimiser which does pretty much exactly what the current one does for the DBMS, but for memory-persisted object networks. If this were the case, it would not need to involve more code than the current system, although the code would apply to and reside in a different layer. It would be a unification of SQL into the OO framework, in a way that LINQ to SQL is the palaeolithic precursor of.

    Agreed, an analogous system would be vulnerable to analogous bad programming practices. It seems that they are the invariant around which all such debates revolve.

    B

  • A set of objects (corresponding to a database table) is generally loaded with a single SELECT statement, but if related objects are wanted, child objects are generally loaded with one SELECT for each object in the parent collection (some ORMs have support for 'collective loading' but this is often regarded as a hack).

    As you would well know, this is RBAR in its most essential form. Consider (shudder) three or more levels of hierarchy, or structures more complex than parent-child.

    Many programmers don't seem to understand that while objects in memory are orders of magnitude faster than a database, if getting them there and saving them back uses operations that are orders of magnitude slower than properly constructed SQL, and the in-memory manipulation is trivial by comparison, the net effect is simply a slowdown by orders of magnitude.

    But programmers rightly hate doing aggregation in SQL, because it breaks object encapsulation. It's a difficult chasm to bridge.

    The thing that SQL can provide, which no object structure I've seen can, is the query planner and optimiser, which is expert at traversing these complex structures and planning aggregate data retrieval in the most efficient way.

    This problem is specifically what ORMs are for. Eager loading / future queries are not a hack; they are very important features. I know a lot of people think ORMs are simply a way to avoid writing or knowing SQL, but it's simply not true: they are built to address this exact issue. With NHibernate, LLBLGen, etc., you can define your fetch depth per query very easily. They can very easily take a grandparent->parent->child operation that would take thousands of calls to the database in your scenario and turn it into one query. However, it's important to keep in mind that ORMs can be misused just as easily as SQL can be. The 60-table join in the post above is proof of that.
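
    As a sketch of what "one query" can look like on the wire (hypothetical table names; the exact SQL an ORM emits varies by mapping and fetch strategy):

    -- A three-level Customer -> Orders -> OrderLine graph hydrated from one
    -- statement; the ORM materialises the objects from the flattened rows.
    SELECT c.CustomerID, c.Name,
           o.OrderID, o.OrderDate,
           ol.OrderLineID, ol.ProductID, ol.Qty
    FROM dbo.Customer AS c
    LEFT JOIN dbo.Orders AS o
           ON o.CustomerID = c.CustomerID
    LEFT JOIN dbo.OrderLine AS ol
           ON ol.OrderID = o.OrderID
    ORDER BY c.CustomerID, o.OrderID, ol.OrderLineID;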

    Having one proc per entity and then calling the database in a loop to fill out the relationships is a common problem with hand-written data access. Unfortunately, almost every page in an application can require a slightly different fetch strategy. It's not really practical to write a best-case proc for every fetching variation if there are many for each object graph; it would lead to an explosion in the number of procs.

    All queries - procs, hand-written SQL, or dynamic SQL from an ORM - need to be profiled during development. SELECT N+1 problems in hand-written procs are often not caught until they make it into production because people aren't profiling.

  • >>>They can very easily take a grandparent->parent->child operation that would take thousands of calls to the database in your scenario and turn it into one query.

    In a long career as a database programmer, I have never used SQL code generators of any persuasion, and I have never written an app that made thousands of calls - a call for each record? Ugh.

    >>Having one proc per entity and then calling the database in a loop to fill out the relationships is a common problem with hand-written data access.

    Not common among good database programmers.

    >>It would lead to an explosion in the number of procs.

    T-SQL has IF/ELSE and CASE.
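
    As a sketch of what that implies (hypothetical proc and parameter names): one proc can branch on a fetch-depth parameter instead of multiplying into a separate proc per fetching variation.

    CREATE PROCEDURE dbo.GetCustomerGraph
        @CustomerID int,
        @FetchDepth tinyint = 1  -- 1 = customer only, 2 = +orders, 3 = +order lines
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Always return the root entity.
        SELECT CustomerID, Name
        FROM dbo.Customer
        WHERE CustomerID = @CustomerID;

        -- Optionally return the related orders.
        IF @FetchDepth >= 2
            SELECT OrderID, OrderDate
            FROM dbo.Orders
            WHERE CustomerID = @CustomerID;

        -- Optionally return the order lines as well.
        IF @FetchDepth >= 3
            SELECT ol.OrderLineID, ol.OrderID, ol.ProductID, ol.Qty
            FROM dbo.OrderLine AS ol
            JOIN dbo.Orders AS o
              ON o.OrderID = ol.OrderID
            WHERE o.CustomerID = @CustomerID;
    END;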

    _________________
    "Look, those sheep have been shorn."
    The data analyst replies, "On the sides that we can see..."

  • I apologise if I misrepresent the more sophisticated ORMs, Daniel; I haven't used NHibernate.

    I do use several ORMs regularly (I like Subsonic, partly because it doesn't even attempt to solve this kind of problem).

    But what you've said does kind of support my point. In SQL I can feed the engine thirty tables, with all kinds of crazy outer-join structures, even multiple join paths between designated tables, and as long as I have designed my relational structure and indexing and done my SQL right, it quite happily optimises away and spits out a result set, often in sub-second time.

    I used the parent-child example to illustrate an elementary pitfall of the ORM process, but really even parent-child-grandchild is dead simple compared to the above.

    What I'd love to see is the ability to define relationships between object collections in the ORM, in a way equivalent to how DDL is used in SQL, and a parser on the object side that can process a complex thirty-collection object structure and extract a result just as efficiently - perhaps direct from memory, or perhaps with a database server hanging off the end.

    The SQL optimiser can do this in a client-server environment, and database functionality is supposedly a subset of object functionality, so why not move (or at least duplicate) the smarts to the object side?

    And ORMs aren't for avoiding writing SQL - that's LINQ 🙂

  • No offense taken. This is a fun topic to discuss.

    What I'd love to see is the ability to define relationships between object collections in the ORM, in a way equivalent to how DDL is used in SQL, and a parser on the object side that can process a complex thirty-collection object structure and extract a result just as efficiently - perhaps direct from memory, or perhaps with a database server hanging off the end.

    The SQL optimiser can do this in a client-server environment, and database functionality is supposedly a subset of object functionality, so why not move (or at least duplicate) the smarts to the object side?

    And ORMs aren't for avoiding writing SQL - that's LINQ 🙂

    I'm not exactly sure what you mean by defining relationships between object collections. Can you explain? The different types of relationships you can map tend to vary quite a bit depending on the ORM.

    On your second point, it sounds like you may be describing an object cache, which is a feature some ORMs support. For example, NHibernate supports two levels of caching:

    First-level cache: this is a cache within one "session"/set of requests to the database. For example, let's say two widgets on a page need the same object: it will only be requested from the database once, no matter how many distinct queries happen during the page load.

    Second-level cache: this is a cache shared by all sessions. It can also be distributed across servers. This really helps avoid trips to the database for data that is read often but changed infrequently.

    Is that along the lines of what you meant or am I totally missing your point?

  • If the data is stored in a relational database, the only way to read/write it is with SQL. By definition. So training-wheel programmers write LINQ, which writes SQL, and object-oriented programmers think about collections of objects which must be rewritten as table joins at some level of the application in order to read/write the data. Away with the junk. Model the business relationships using entity-relationship diagramming and get right to SQL programming, written with an understanding of the business relationships and of the database engine.

    _________________
    "Look, those sheep have been shorn."
    The data analyst replies, "On the sides that we can see..."

  • katesl (10/29/2009)


    Model the business relationships using entity-relationship diagramming and get right to SQL programming, written with an understanding of the business relationships and of the database engine.

    There is an architectural pattern called Active Record which is just that. It's very useful for many types of applications. In these scenarios, static queries (procs, etc.) are usually just fine. When an ORM is used with these types of apps, it's usually just to save time and doesn't solve any real technical problem.

    Stackoverflow.com (the codebase that hosts http://ask.sqlservercentral.com/) does Active Record via Linq2Sql. They could very easily have done it with procs. I've gathered from interviews with the developers that they have little to nothing in the way of object/relational mismatch. Interestingly enough, they are doing well over 1 million requests a day now and are running off one SQL Server instance. Apparently they aren't even close to needing to cluster the database. They have moved to two application servers, however.
