RE: n-hibernate and dynamic sql - DBA vs Developers

SSCoach

August 15, 2007 at 12:42 pm

(Sorry this is so long, but "the devil is in the details.")

Hello. I am a DBA facing pretty much the same issue and would appreciate some DBA feedback. As has been mentioned already (I guess in another forum on the same topic), the vast majority of information on this topic comes from the developer perspective; there is very little from DBAs who have experience with ORMs (Object-Relational Mappers). The situation I am faced with has a few differences and nuances, so I will try to describe it in detail.

To start with, the company I work for uses Java as the application language and SQL Server as the database. The issue of ORM has reared its ugly little head as we are starting a new project. The developers have already advocated using Hibernate as it is becoming (or is already) an industry standard for Java. In our discussions we talked about performance and security as the main concerns, as well as maintainability of DB code and trouble-shooting.

The issues of maintainability and trouble-shooting come down to the fact that we cannot easily see or test the queries being submitted in order to check their performance. Yes, we can use SQL Server Profiler to capture queries and their exact CPU, Reads, Writes, and Duration, but that is not a good long-term solution to have running against production on a constant basis. Also, if we want to find where a table or field is used, we cannot search the text of all procedures. That might not be a major issue, as it might be possible to search the application's DB access mapping file to see the references.

The main concerns really are security and performance. These issues have been discussed already in this and one or two other forums but I don’t think the level of detail needed was ever given to the topics as certain assumptions were made about how Stored Procedures are used and about how Hibernate submits queries. I will start with just the benefits of the Stored Proc method and then point out some variables that might alter what the typical response regarding them is.

Security can be split into two subcategories: data access and query modification (e.g. SQL Injection). If we are using Stored Procedures, then we can tightly control direct access to the tables. We can turn off all read/write access to the tables and give only Execute permission on the Stored Procedures that access the data. This way, the account that the application logs in as, even if compromised, cannot issue direct ad-hoc queries; it can only Execute Stored Procs and hence only interact with the data via the predefined API of the Stored Procedures. This does not mean that someone cannot do any damage if they do compromise an application account, but the risk is reduced a bit. Regarding query modification, given how Stored Procedures pass in and use parameters, SQL Injection is not a problem, provided the procedure itself does not build and execute dynamic SQL by concatenating those parameters.

In terms of performance it has been widely noted that Stored Procedures have the benefit over dynamic SQL in terms of being able to re-use the Execution Plan. It might also be that the time it takes to parse and evaluate as valid SQL might be less in the case of Stored Procedures but I do not recall seeing this noted as a benefit for some reason.

One other benefit of a Stored Procedure API over dynamic / embedded SQL is that of controlling the data access in terms of protecting the integrity of the data and making sure the data model is properly adhered to.

So what about Hibernate and other ORM packages? To be clear, they do not all do the same thing: some are fully dynamic, others use static mappings from the DB, and some are just code generators that build simple data access Stored Procedures that then get mapped. Hibernate can either generate its SQL dynamically or map to existing Stored Procedures. The developers I am working with advocate the fully dynamic option as it supposedly cuts down on a lot of development time. The question is: is this worth it in the long run given the security and performance concerns, and even the minor concerns?

To be fair, what has not been said so far in this discussion is that when Hibernate (and a few other ORM packages) say "dynamic", they do not mean a concatenated text string that is submitted to the DB. A concatenated piece of text IS subject to the security concerns (e.g. SQL Injection) as well as the performance concerns, since each distinct string generally cannot re-use an existing Execution Plan. But Hibernate (and some others) use prepared statements that are parameterized. This is akin to using sp_executesql and passing in parameters. Using a prepared statement has two benefits over concatenated text: it uses parameters like Stored Procedures, so it is not susceptible to SQL Injection, and it can actually re-use its Execution Plan.
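To make the difference between concatenated text and a parameterized prepared statement concrete, here is a minimal plain-JDBC sketch. It is not Hibernate's actual internals, and the table `Customer`, its columns, and the class and method names are all hypothetical. The `main` method needs no database: it only shows that concatenation yields a different statement text for every input (and lets hostile input change the statement's structure), while the parameterized text stays constant.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SqlTextDemo {

    // Hypothetical table and columns, for illustration only.
    public static final String PARAMETERIZED =
        "SELECT CustomerId, Name FROM Customer WHERE Name = ?";

    // Anti-pattern: the input becomes part of the statement text itself.
    public static String concatenated(String name) {
        return "SELECT CustomerId, Name FROM Customer WHERE Name = '" + name + "'";
    }

    // Counts how many distinct statement texts the server would be sent.
    public static int distinctConcatenatedTexts(List<String> names) {
        Set<String> texts = new HashSet<>();
        for (String n : names) {
            texts.add(concatenated(n));
        }
        return texts.size();
    }

    // Parameterized version: the text never changes and the value is bound
    // separately. Requires a live Connection, so main() does not call it.
    public static int countMatches(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(PARAMETERIZED)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            }
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("Smith", "Jones", "x' OR '1'='1");
        System.out.println(distinctConcatenatedTexts(names) + " distinct concatenated texts");
        // The third input rewrites the WHERE clause instead of being treated as data.
        System.out.println(concatenated("x' OR '1'='1"));
        System.out.println("1 parameterized text: " + PARAMETERIZED);
    }
}
```

Whether the server actually re-uses a plan for the parameterized text is still up to the server; the sketch only shows why the concatenated form gives it little chance to.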

So does this information alone change anyone's mind about using such an approach? A few people and even articles on SQLServerCentral.com have pointed out that sometimes Execution Plans can hurt performance. But how often is this the case? And are Stored Procedures able to cache their Execution Plans for longer than prepared statements? Meaning, does a prepared statement have to run repeatedly in order to use the same Execution Plan or are those Plans cleared from the cache sooner than those of Stored Procedures so you wouldn’t see the benefit on a statement called once an hour, maybe? Also, according to Books Online, using a prepared statement will "probably" re-use a previous Execution Plan and is not guaranteed to do so. Does this tip the scale towards Stored Procedures?

If Hibernate's use of a prepared statement is good enough to be on par with Stored Procedures regarding the efficiency of using Execution Plans and is not susceptible to SQL Injection, it seems that the only real issue left is that of direct read / write access to the tables. Is this a major concern or acceptable to give up? In my opinion it should certainly be avoided but now I am not sure what I gain by restricting that access.

I advocated using a code generation tool (e.g. CodeSmith) that builds the simple INSERT, UPDATE, DELETE, and SELECT Stored Procedures for all tables in the database. This can of course be customized using templates. In my thinking this would give us the benefit of being able to restrict direct read / write access to the database, since the application would only interact via Stored Procedures. This in turn gives us the definite benefit of re-using Execution Plans. Of course, there is still the issue that a cached Execution Plan can sometimes hurt if the first run of the procedure pulls back a vastly different size result set than other executions of that procedure. However, in the case of the generated Stored Procedures, they are all simple SELECTs, so the result sets will rarely be vastly different in size.

On the other hand, if all of the Stored Procedures are simple statements, then there is no true database API that will ensure proper use of the Data Model and ensure data integrity; the Stored Procedures in this case are merely a layer to gain the benefits of direct-access security and guaranteed Execution Plan re-use. And in terms of the direct-access security, what is the gain over allowing the application to submit its own ad-hoc queries? A compromised account can do just as much damage with INSERT, UPDATE, and DELETE procs for every table as it can by issuing its own SQL. Or is this not true? Does allowing read / write access, as opposed to ONLY Execute permission on Stored Procedures, open up areas of risk indirectly, such as possibly getting to system databases and/or system tables?
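For anyone who has not seen this kind of generator, the idea can be sketched in a few lines of Java. This is not CodeSmith's template language, just a rough illustration; the procedure naming convention, the `Customer` table, and the `AppRole` role are assumptions made up for the example. It emits one simple SELECT-by-key procedure and the matching Execute-only permission.

```java
import java.util.List;

class CrudProcGenerator {

    // Emits a T-SQL SELECT-by-key procedure for one table.
    // Names are assumed to be trusted identifiers read from schema metadata,
    // not user input, so they are not quoted or escaped here.
    public static String selectByKeyProc(String table, String keyColumn,
                                         String keyType, List<String> columns) {
        String columnList = String.join(", ", columns);
        return "CREATE PROCEDURE dbo." + table + "_SelectByKey\n"
             + "    @" + keyColumn + " " + keyType + "\n"
             + "AS\n"
             + "BEGIN\n"
             + "    SET NOCOUNT ON;\n"
             + "    SELECT " + columnList + "\n"
             + "    FROM dbo." + table + "\n"
             + "    WHERE " + keyColumn + " = @" + keyColumn + ";\n"
             + "END";
    }

    // Emits the permission statement: the role may execute the procedure
    // without being granted any direct access to the table.
    public static String grantExecute(String table, String role) {
        return "GRANT EXECUTE ON dbo." + table + "_SelectByKey TO " + role + ";";
    }

    public static void main(String[] args) {
        // Hypothetical table description.
        System.out.println(selectByKeyProc(
            "Customer", "CustomerId", "INT", List.of("CustomerId", "Name")));
        System.out.println("GO");
        System.out.println(grantExecute("Customer", "AppRole"));
    }
}
```

A real generator would read the tables and columns from the database catalog and would also emit the INSERT, UPDATE, and DELETE procedures; the point here is only that the generated procs are thin wrappers around single statements.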

So, to me this is a slightly more complicated issue than previously presented, especially in light of the ability of Hibernate (and others) to use prepared statements rather than submitting concatenated text, which everyone was right to object to. Is the approach of using Hibernate to create dynamic, parameterized queries worth the hours that the developers will save in creating the application? Would we be sacrificing too much by giving read / write access to the tables? If so, are the generated simple Stored Procedures a good compromise? I am not a huge fan of the business logic being in the DB, as that is not what the DB does best, as much as I do appreciate the ability to control how the data model is used and to try to ensure data integrity. So a benefit of having only simple Stored Procedures is that people who are not SQL experts are no longer writing Procedures that have inefficient JOINs, etc. The only Procedures left to write and debug for performance are reports, so this would save me some time as well. That holds, of course, only as long as we are not sacrificing security, or query performance once we have millions of rows.

Again, any insight would be greatly appreciated. As per usual, the decision to use Hibernate to build its own SQL dynamically (no Stored Procedures) had already been partially made well before anyone on the database team was involved in the project, so there is not much time left to offer evidence against this if it is indeed a foolish decision.

Thanks in advance.

P.S. Here is the REAL kicker: the developers also wanted to use Hibernate to generate the Data Model directly from their object model. Supposedly Hibernate can generate the entire data structure for you once you build the objects, as opposed to getting a set of objects from the Data Model that was created by a lowly human.

SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR