This is going to be a long reply, mostly because of all the quoting, so bear with me :-).

    Evil Kraig F (5/8/2013)

    peter-757102 (5/3/2013)


    I would argue that most applications, notably the smaller to medium custom-written ones, are better implemented using tight binding between database and application. For ease of development as well as performance and flexibility reasons, and then some possibly beyond the scope of your own particular needs and experience!

    Development is eased with direct binding but performance is rarely assisted. Also, you expose yourself to a number of dangers that dynamic SQL poses. I have worked in very small shops where they made practically six calls to 4 tables in a single database... and that's all they ever really needed. The tables could have been denormalized, but no reason, really. In this case... the proc is STILL the better choice. You've encapsulated the call for troubleshooting, removed injection concerns, and have better security separation between front and back end.

    Additionally, tight binding and flexibility rarely go hand in hand. If you mean the flexibility for the devs to work in only their programming language to send code to SQL, then I'd agree. If you mean future programming flexibility, I'd disagree. Any change to schema would require you to re-write multiple code locations instead of a single proc acting as a function call for the front end.

    Performance is assisted because you can create better tuned (dynamic) queries without procedural overhead in the database. Procedural logic to make SQL code adaptive is more efficiently handled in a language suited to procedural programming.

    Another advantage of tight binding is that the SQL code can use the same declared values for status (and other) codes as the rest of the application uses. There still needs to be a match between these declarations and the actual status table contents of course. But at least you do not end up with unmaintainable constants in the SQL code, as you would be forced to when writing stored procs!
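
    To illustrate, a minimal sketch (the enum, table and column names here are invented for the example):

        // The status codes are declared once and shared by the whole application;
        // the SQL text is composed from the same enum, so no magic numbers appear
        // in the query itself (they still have to match the status table contents).
        public enum OrderStatus { Pending = 1, Shipped = 2, Cancelled = 3 }

        public static class OrderQueries
        {
            public static readonly string OpenOrdersSql =
                "SELECT OrderId, CustomerId FROM dbo.Orders " +
                "WHERE Status = " + (int)OrderStatus.Pending + ";";
        }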

    Firing unneeded queries happens when people write code without knowing what that code really does behind the abstractions they use. This is the danger of abstraction, and misuse is promoted by constructs like LINQ and ORMs (and yes, this is true for every hidden abstraction, thus for stored procedures as well). Developers use these abstractions most often because they never learned proper SQL. Which is a shame, as any half decent developer should know quite a bit of advanced SQL (I intend to write a piece on that someday, also for internal use). That there exists a lot of bad database related code in applications "out there" just has another root cause than tight binding.

    And just to be clear, I want developers to write SQL and become good at it! It promotes awareness of the data and structure they operate on and leads to better understanding, better choices and, in the end, better developers too. For this alone, schema awareness is essential and shielding the developer from it is a very bad path to follow!

    Your argument that changing the schema is easier with a stored proc, is comparing apples to oranges. That same stored proc can be implemented effectively in a single method in the application for application specific databases. And when a schema changes significantly, the application needs to change just the same! Not to mention the advantages of having that code in the same version control as the application!

    Evil Kraig F (5/8/2013)

    If some other application needs access to a specific part of the database you can fall back on stored procedures and provide customized access rights just for them to functionally provide what is needed securely and without being schema bound there.

    And yet if the procs are already written you don't even need to do that.

    You are thinking in one-size-fits-all terms. What I wrote is that only in the rare exception (for most apps) do you need a stored proc for security reasons, and only then do you have to write one. A response of "it would already be there" then doesn't make much sense! How would always writing procedures be better?

    Evil Kraig F (5/8/2013)

    In a public discussion of this matter it helps to acknowledge not every service is like Facebook or a non-client specific application that is deployed over and over and just customized by third parties. Most applications, I bet, are custom written against non-public database schema designs, where the database is provided as part of the service an application provides and is serviced by the application provider and no-one else.

    You're correct, and that's the majority of my work. Now, whether the front ends are public or private depends on what we're doing. Most companies these days have some form of portal for their clientele, and thus you must allow for eventual exposure of some kind of data to them. Now, is the majority of the work being done in the back? Sure.

    Out of curiosity, have you ever worked with DB2? DB2 required the kind of tight binding you're speaking of. There's a reason it's basically out of the market except in legacy systems.

    I have not worked with DB2. At the same time I do not get the feeling you understand where I am coming from. An application consists of several layers; one is the data access layer (DAL for short), and even that can be subdivided into more layers.

    What I am talking about is that in at least the lowest layer of the DAL, there is 100% awareness of the database schema. Everything acting directly on one entity is found there, including constants related to the schema. Things spanning functional groups of entities can be found in higher DAL layers.

    The decoupling between schema and interface in this approach does not happen at the application/database boundary, but instead above the DAL (which resides in the application). The interface is where you generally find things out of direct alignment with the schema. Additional models supporting those interfaces are the solution to this disconnect and keep both model groups clean. The DAL (database or otherwise) should not be hit with changes every time some part of the user interface is re-arranged.
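
    A rough sketch of such a lowest DAL layer (all names invented for illustration):

        // Lowest DAL layer: the one place that is 100% schema aware. Table and
        // column names are spelled out here and nowhere else; higher DAL layers
        // and the interface models build on this without touching the schema.
        public static class CustomerSchema
        {
            public const string Table = "dbo.Customers";
            public const string Id = "CustomerId";
            public const string Email = "Email";
        }

        public static class CustomerAccess
        {
            // Everything acting directly on the Customers entity is found here.
            public static string SelectByIdSql()
            {
                return "SELECT " + CustomerSchema.Id + ", " + CustomerSchema.Email +
                       " FROM " + CustomerSchema.Table +
                       " WHERE " + CustomerSchema.Id + " = @id;";
            }
        }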

    Evil Kraig F (5/8/2013)

    It just means business logic constraints that do not fit this declarative pattern can also be implemented in the application layer without any real drawback. But with the benefits this other approach brings, such as much better and more efficient dynamic SQL and richer state management.

    You'll have to define 'richer state management' for me. However, I agree with there being rules that are more efficient in the app layer than in the database layer. The reverse is also true. I'm more than happy to discuss with devs on the teams I assist which I feel would be best where and why. Particularly formatting/string manipulation, display, and row to row computations are certainly best done in an iterative environment instead of a set based one. Aggregations, data organization and associations, filtering, and usually mathematics are normally best done at the DB level.

    We are on the same page here I think. What I mean by "richer" is handling anything beyond the capabilities of the relational model and constraints. Think of keeping columns from different entities that are somehow related in sync. You can write triggers for this too, but as usual it depends on the specifics whether that is a good solution or not.
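
    A sketch of the kind of thing I mean (invented names; System.Data.SqlClient assumed):

        using System.Data.SqlClient;

        public static class CustomerSync
        {
            // Keep a denormalized name column on Orders in sync with Customers
            // from the application, inside one transaction, instead of a trigger.
            public static void Rename(SqlConnection conn, int customerId, string name)
            {
                using (SqlTransaction tx = conn.BeginTransaction())
                using (SqlCommand cmd = new SqlCommand(
                    "UPDATE dbo.Customers SET Name = @name WHERE CustomerId = @id; " +
                    "UPDATE dbo.Orders SET CustomerName = @name WHERE CustomerId = @id;",
                    conn, tx))
                {
                    cmd.Parameters.AddWithValue("@name", name);
                    cmd.Parameters.AddWithValue("@id", customerId);
                    cmd.ExecuteNonQuery();
                    tx.Commit(); // both columns change together, or not at all
                }
            }
        }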

    Evil Kraig F (5/8/2013)


    However, there is a significant drawback if you just want to look at time to process a call to the DB. A proper string scraper in the front end to remove all possibilities of injection to your particular system can take anywhere from 15 to 20 milliseconds. No, that's not long. Every parameter for every call, however, has to be run through that process. It's just one place where you're easing the time for development and placing it on the time to process.

    15-20 milliseconds sounds like an awful lot to me; what I have in mind will be nowhere near even one millisecond! And the cost of parameter passing is identical for stored procedure calls! As for dynamic SQL: processing stored procedures isn't free either, and procedural logic is generally not efficient in SQL.

    Evil Kraig F (5/8/2013)


    Another thing is every dynamic call to the DB has to be uniquely compiled. It can't re-use cached plans. Now, cached plans have some of their own quirks but under 99.9% of the situations they're just fine.

    You seem to assume that dynamic SQL sent from the application DAL is somehow not parameterized. To be clear, all parameters not explicitly configured otherwise are parameterized in what I am describing. It is just that all the chores of coding this can be taken off the programmer's hands without alienating him/her from the actual schema and the SQL.
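
    Dynamic and parameterized at the same time, in other words (a simplified sketch with invented names):

        using System;
        using System.Data.SqlClient;
        using System.Text;

        public static class OrderSearch
        {
            // The WHERE clause is assembled dynamically, but every user-supplied
            // value still travels as a parameter: no injection surface, and the
            // plan is reusable for each shape the query can take.
            public static SqlCommand Build(SqlConnection conn, int? customerId, DateTime? fromDate)
            {
                StringBuilder sql = new StringBuilder(
                    "SELECT OrderId, Total FROM dbo.Orders WHERE 1 = 1");
                SqlCommand cmd = new SqlCommand { Connection = conn };

                if (customerId.HasValue)
                {
                    sql.Append(" AND CustomerId = @customerId");
                    cmd.Parameters.AddWithValue("@customerId", customerId.Value);
                }
                if (fromDate.HasValue)
                {
                    sql.Append(" AND OrderDate >= @fromDate");
                    cmd.Parameters.AddWithValue("@fromDate", fromDate.Value);
                }
                cmd.CommandText = sql.ToString();
                return cmd;
            }
        }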

    Evil Kraig F (5/8/2013)


    Thirdly if the app is 'overhandling' data, you're dealing with a lot of unnecessary network traffic and waits.

    Anything you put in a stored procedure has a counterpart in the application DAL. Thus only in situations where one stored procedure would call another stored procedure would there be extra traffic. If that is significant, then it is time to lift the combined logic to a higher level in the DAL and write specialized code for handling this situation in one step. Same as you would do with stored procedures really.
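
    A sketch of what lifting it to a higher level means (invented names): the combined logic becomes one batch and one round trip:

        using System.Data.SqlClient;

        public static class OrderArchiving
        {
            // Higher DAL layer: what would otherwise be two separate calls
            // (copy, then delete) is handled in one batch and one round trip.
            public static void Archive(SqlConnection conn, int orderId)
            {
                using (SqlCommand cmd = new SqlCommand(
                    "SET XACT_ABORT ON; BEGIN TRAN; " +
                    "INSERT INTO dbo.OrdersArchive SELECT * FROM dbo.Orders WHERE OrderId = @id; " +
                    "DELETE FROM dbo.Orders WHERE OrderId = @id; COMMIT;", conn))
                {
                    cmd.Parameters.AddWithValue("@id", orderId);
                    cmd.ExecuteNonQuery();
                }
            }
        }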

    Evil Kraig F (5/8/2013)


    Finally, and my most vehement argument, is eventually one of those little apps grows up. Your 'prototype' becomes the best thing since sliced bread and someone in accounting just dumped every spreadsheet for the last 30 years into it over a weekend. Now, since you've cut so many corners, you're looking at re-designing the entire thing into optimal calls. Wouldn't it have made sense just to have done it 'right' in the first place, and then tweak for volume?

    You are misrepresenting the situation; there were no corners cut! What you will find in your stored procedure solution, you now find in the application DAL, neatly isolated. Even more properly isolated than with procedures actually, since the DAL consists of several layers. The application still does the same, thus the models for the interface remain intact too.

    Evil Kraig F (5/8/2013)


    As for the requirement of dynamic SQL, nearly all application back-offices need this in spades for search and/or real-time reporting purposes.

    Catch All Searches, I agree. And they can be done under the hood without significant problems. Real Time reporting you'll have to be a bit more explicit with me about. If you mean real time from the data, I hook SSRS reports to procs all day long off our OLTP system and they're used quite efficiently. Prior to SSRS, ASP.NET served similar purposes for us, and so did MS Access with passthrough queries. Now, if you're talking dynamically built user reports, we're in a whole different conversation.

    We don't use catch-all queries for general list views; filter conditions and sorting are added dynamically based on user input. Often we have reporting functionality built on aggregate tables and even user-definable reports. Both types of system work exclusively by generating SQL code dynamically on the fly.
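
    Sorting is the one part that cannot be a parameter, so generated column names go through a whitelist (sketch, invented names):

        using System.Collections.Generic;

        public static class OrderSorting
        {
            // ORDER BY columns can't be parameterized, so the user's choice is
            // mapped through a whitelist; unknown input falls back to a default
            // instead of ever being concatenated into the statement.
            private static readonly Dictionary<string, string> Columns =
                new Dictionary<string, string>
                {
                    { "date",  "OrderDate" },
                    { "total", "Total" }
                };

            public static string OrderByClause(string userChoice)
            {
                string column;
                return Columns.TryGetValue(userChoice, out column)
                    ? " ORDER BY " + column
                    : " ORDER BY OrderDate";
            }
        }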

    Evil Kraig F (5/8/2013)


    A) That's not common. Most firms expect Tech to build them for them. Part of why SSRS 2k8 went more dev driven than user driven, like SSRS 2k5 was.

    B) If you've got power users at that level invest some time in SSAS and PowerPivot. It's a lot easier.

    SSRS is awful from what I have seen of it; I wouldn't dare to sell such a solution if my life depended on it (ok, maybe I would in that extreme, but you get the idea)! It's so disconnected from what a user really needs and totally not part of the application regular users use. It also requires its own database, backup procedures and users, has a bad interface, you name it. Nothing I would like to use as a developer either; it is just really bad all-around. In short, application specific reporting functionality never needs this, let alone the dependencies that come with it.

    Evil Kraig F (5/8/2013)


    If I compare the issues we have in our applications developed this way against the number we find others have, the difference is staggering! We never have the issues related parties have (being unable to get anything data related to be consistent and/or correct). And it is not for lack of having to do complex stuff; it is just a matter of knowing what you do and having the basics in place, plus the proper knowledge level of SQL (this last part is an area that is pretty thin out there, I am afraid).

    If you have the proper level of SQL, putting it in a proc is no more difficult than writing the query and wrapping it in CREATE PROC @param1 INT, @param2 VARCHAR(50) AS ... GO

    That is just overhead and does not allow for maintainable SQL code. It does however allow you to apply procedure-level security, which, as I made clear, most single-purpose application databases really do not require. The body of the stored procedure as you describe it here can be executed as a batch just the same, without wrapping a redundant interface around it.
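
    For example (sketch, invented names), the same body sent as a parameterized batch; ADO.NET transmits a parameterized SqlCommand through sp_executesql, so it gets a cached, parameterized plan much like a proc would:

        using System.Data.SqlClient;

        public static class SomeQuery
        {
            // The body that would otherwise be wrapped in CREATE PROC, sent
            // as-is. With parameters attached it goes through sp_executesql,
            // so the plan is cached; only the redundant wrapper is gone.
            public static int Run(SqlConnection conn, int param1, string param2)
            {
                using (SqlCommand cmd = new SqlCommand(
                    "SELECT COUNT(*) FROM dbo.SomeTable " +
                    "WHERE Key1 = @param1 AND Key2 = @param2;", conn))
                {
                    cmd.Parameters.AddWithValue("@param1", param1);
                    cmd.Parameters.AddWithValue("@param2", param2);
                    return (int)cmd.ExecuteScalar();
                }
            }
        }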

    In the end it is not about what is more difficult; it is about what develops easier and is cleaner. Having access to constants in an application DAL is one benefit, and source control is another. From a developer perspective, having a stored procedure to call would still mean calling a proxy method for that stored procedure, which in turn is really a proxy for the SQL batch inside the procedure. That chain is overhead and complexity. Then consider aspects like error reporting when a query fails. The application being aware of the query allows much more detailed error reporting, allowing things to be fixed faster!
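
    For instance (sketch; the console output stands in for whatever logging you use):

        using System;
        using System.Data.SqlClient;
        using System.Linq;

        public static class DbExec
        {
            // Because the application owns the SQL text, a failure can be
            // reported with the exact statement and parameter values.
            public static void Run(SqlCommand cmd)
            {
                try
                {
                    cmd.ExecuteNonQuery();
                }
                catch (SqlException ex)
                {
                    string parameters = string.Join(", ",
                        cmd.Parameters.Cast<SqlParameter>()
                           .Select(p => p.ParameterName + "=" + p.Value));
                    Console.Error.WriteLine("Query failed: " + ex.Message);
                    Console.Error.WriteLine("SQL: " + cmd.CommandText);
                    Console.Error.WriteLine("Params: " + parameters);
                    throw;
                }
            }
        }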

    I have to say the C# environment I recently started developing in is quite underdeveloped in the area of database access, as are the ORMs I have seen for it so far (I think an ORM is for the most part a bad idea anyway). Microsoft, and maybe the industry as a whole, chose the path of least resistance and addressed symptoms instead of curing causes of bad code (that cause being lack of basic knowledge).

    I think they have been on this path since the late 90s, when garbage collection based languages gained traction. Since then, I feel most code has become less clean, less efficient, harder to maintain and more buggy (be it bugs of another kind; I never had memory leaks back in those days either, though). Everyone and their donkey can now write a few lines of code and get away with it for some time, until what is made simply implodes.

    Evil Kraig F (5/8/2013)


    Part of the reason is that most developers do not think in data but in code, and at the same time DBAs generally have the reverse mindset.

    I couldn't agree with this specific statement more. Thus why I get 'enforced' rules of using things like CI for database scripts in storage. Ever try to CI an index column order modification, and not have it rebuild the index EVERY time the CI is run?

    Can't say I have, as I do not know of CI. I did look it up and maybe you mean CodeIgniter for PHP? I simply haven't worked with that, and I wouldn't, as I don't like PHP at all for various reasons. I haven't checked the latest versions to see if things have improved there.

    Evil Kraig F (5/8/2013)


    Both parties like to have total control and freedom in their own domain and comfort zone! That does not mix and it would be elevating if more people would step a bit further outside their own domain and comfort zone and gain relevant experience and insights that way.

    Does it help you if you knew that I started as a programmer? That it helps me understand what my devs need? That I still give them bloody havoc when they start sending dynamic trash down the pipe that ignores the indexes in place and they're scanning a 400 gig audit table?

    That last is more because they didn't bother to wrap it in sp_executesql with parameters properly and didn't have me do a code review than because they wrote crap SQL. Sometimes you need to scan. Sometimes there's ways to do it without shared locking the entire table and thus blocking the triggers that are trying to write to it from the main table.

    I know what you mean. Where I work there is no separate DBA role, but there is a relatively high level of SQL knowledge, and I expect new members to skill up too. Some problems simply require awareness of the schema and the insight to update the schema if that is beneficial. Other problems, like SQL injection, query parameterization and, where explicitly not wanted, the absence of parameterization (for, say, a status value, to take advantage of a filtered index), require a proper method of writing and executing SQL statements. This is where I found LINQ to SQL, LINQ in general and ORMs to be a major obstacle in nearly every aspect.
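
    To make that filtered-index point concrete (sketch, invented names; the literal comes from a shared enum, never from user input):

        using System.Data.SqlClient;

        public static class OpenOrderQueries
        {
            public enum OrderStatus { Pending = 1, Shipped = 2 }

            // The status predicate is deliberately a literal so the optimizer
            // can match a filtered index such as:
            //   CREATE INDEX IX_Orders_Pending ON dbo.Orders (CustomerId)
            //   WHERE Status = 1;
            // The genuinely variable value stays a parameter.
            public static SqlCommand Build(SqlConnection conn, int customerId)
            {
                SqlCommand cmd = new SqlCommand(
                    "SELECT OrderId FROM dbo.Orders " +
                    "WHERE Status = " + (int)OrderStatus.Pending +
                    " AND CustomerId = @customerId;", conn);
                cmd.Parameters.AddWithValue("@customerId", customerId);
                return cmd;
            }
        }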

    Evil Kraig F (5/8/2013)


    Sorry for the long text, but after reading the latest discussion I don't want people to feel there is only one right way in what seems like a clash between DBA titans.

    I agree, there is not only a single right way... if your machines can handle an unoptimized load on your database(s). Those also aren't the times when you need a SQL Dev professional like the majority of us who hang around here.

    Not sure I agree with that; scaling issues can arise even with moderate growth in data. That is why even as a developer you need to know SQL and understand relational databases, and not use them as dumb storage. Often you hear the mantra "premature optimization is evil", while in practice it is mostly applied by people who create horribly inefficient solutions that, with the same amount of effort but more knowledge, would run much better! Therefore my mantra is "performance by design" :-), much more positive!

    Evil Kraig F (5/8/2013)


    There are cases where I allow my front ends to send down dynamic SQL. Particular cases where there is an exact scenario that makes for difficult or non-optimal T-SQL coding and it won't break security or expose the company awkwardly if the front end login information is compromised. However, data storage and retrieval logic will never fit directly into object/instance logic; you'll always require widgets of some form. Procs and their ilk are generally the most secure method and allow for both logic reuse and portability as well as separation of components during troubleshooting issues at 3 in the morning, when some client 8 hours east of you on the planet is screaming holy rebellion because they can't check their cat's color for today.

    If you want to ease development by avoiding them, that's your shop's call. Just hope the powers that be will give you time to rebuild from scratch once it gets out of scope.

    I work on several projects that have activity in all time zones and that are "mission critical" for their particular customers. Granted, the load we have is not that of, say, Facebook or Twitter, but we do not face the issues you describe, because we do not have that model mismatch to begin with! If something significant changes, we refactor the relevant code. We know exactly what needs to be changed and where.