EDIT: This is a textwall, I know; sorry for that, but I got on a roll. Peter, a response to you starts about a third of the way down. /EDIT

    L' Eomot Inversé (5/3/2013)


    I guess it depends what you call horrible. In my experience most developers write SQL that I would be ashamed to put my name to, but then so do most DBAs; the good developers and the good DBAs both write excellent SQL, nothing like that.

    I fear there's a bit of semanticism in there, but I get your drift. What I wrote five years ago I would consider horrible today. However, there may be a locality difference here. 'DBA' is just another title for a SQL Dev here in the States, and they're run of the mill for me. App devs tend to write multiple nested loops and the like because of their familiarity with that form of processing and data manipulation.

    Data protection law is pretty fierce in Europe (including the UK), and many organisations are paranoid about it. I'm inclined to believe that paranoid caution is the right approach. ... Maybe the only solution is to altogether prevent people getting large chunks of data. I've not had to deal with extremely sensitive stuff, as opposed to not-really-sensitive personally identifiable data (I've tried to instill paranoia in the non-technical management even over that).

    Some of this was definitely from a misunderstanding of how you intend to use views. I'll get more into that below, but most of my contracts lately have involved incredibly sensitive data sets... it's part of why I avoid entirely the discussions on 'how easy it is to find people' from a few bits of data. I don't need the temptation. 😉

    Dynamic SQL??? What on earth would that be for? I think you've misunderstood badly. I certainly don't want one-size-fits-all, every system I've been involved with since about 2000 has ended up (in some cases after I've overcome a lot of resistance from the backwoodsmen who want to embed DML statements directly in C++) as a system where the only thing the database presents to the outside world is a bunch of stored procedures.

    Definitely misunderstood, but I think we're coming to the point where it's clear our approaches to the same problem were radically different.

    I had misunderstood your earlier posts to say that you thought views were a bad idea and it would be better to give the app access to the base schema; so my posts were arguing that using views was better than that, which seems to have given you the impression that I thought views were the right way to go. Talk about cross-misunderstandings!

    *facepalm* Oh, lords yes. That was definitely NOT what I meant! 😛

    Tightly bound app-db integration is pure poison. If you have that you can't make upgrades of DB and app independent, and for me that was absolutely essential (a company goes bust if you can't do it [without] issue).

    ...

    That I don't understand - you seem to be contradicting yourself: you will do all in your power to avoid the app and the db being wound tightly together, but you don't mind not having narrow modular boundaries? You can completely decouple the app and database tiers without narrow modular boundaries?

    Okay, I apparently had no idea what the definition of a 'narrow modular boundary' was... and possibly still don't.

    Now, how did I deal with multiple app front ends being 'updated' in pieces for a 24/7 five-nines system? We used schemas bound to logins and made sure the front-end code called procs unqualified (EXEC MyProc), never schema-qualified (EXEC dbo.MyProc). Any proc that could work across two versions stayed in dbo. Any proc whose inputs or outputs changed between versions (parameters or output columns/datatypes) stayed in dbo for the old version and got a copy in a named schema for the new version. A login would resolve a proc name against its own default schema first, then fall back to dbo if no schema-local version existed; name resolution works the same way for tables. The next upgrade shuffled that schema's procs down to the dbo level.
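    To make that concrete, here's a minimal sketch of the fallback, assuming hypothetical names (AppV2, GetCustomer, AppV2User) and a dbo.Customer table:

        CREATE SCHEMA AppV2 AUTHORIZATION dbo;
        GO
        -- Old signature stays in dbo for front ends still on the prior version.
        CREATE PROCEDURE dbo.GetCustomer @CustomerID INT
        AS
            SELECT CustomerID, Name
            FROM dbo.Customer
            WHERE CustomerID = @CustomerID;
        GO
        -- Changed signature lives in the version schema.
        CREATE PROCEDURE AppV2.GetCustomer @CustomerID INT, @IncludeHistory BIT
        AS
            SELECT CustomerID, Name,
                   CASE WHEN @IncludeHistory = 1 THEN CreatedDate END AS CreatedDate
            FROM dbo.Customer
            WHERE CustomerID = @CustomerID;
        GO
        -- The upgraded front end's user defaults to AppV2, so an unqualified
        -- EXEC GetCustomer finds AppV2.GetCustomer first, and falls back to
        -- dbo.GetCustomer for any proc that didn't change.
        ALTER USER AppV2User WITH DEFAULT_SCHEMA = AppV2;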

    What I was reacting to was supporting more than a single version outside of switchovers, not the logistics behind upgrades to a five-nines system, which is a different scenario. I've never had to support four-plus versions off the same schema, and it's not something I've ever really thought deeply about.

    @peter-2:

    peter-757102 (5/3/2013)


    I would argue that most applications, notably the smaller to medium custom-written ones, are better implemented using tight binding between database and application. For ease of development as well as performance and flexibility reasons, and then some possibly beyond the scope of your own particular needs and experience!

    Development is eased with direct binding, but performance is rarely assisted. Also, you expose yourself to a number of dangers that dynamic SQL poses. I have worked in very small shops where the app made all of six calls to four tables in a single database... and that's all they ever really needed. The tables could have been denormalized, but there was no real reason to. Even in this case... the proc is STILL the better choice. You've encapsulated the call for troubleshooting, removed injection concerns, and have better security separation between front and back end.

    Additionally, tight binding and flexibility rarely go hand in hand. If you mean the flexibility for the devs to work only in their own programming language and send code to SQL from there, I'd agree. If you mean future programming flexibility, I'd disagree. Any schema change would require you to rewrite multiple code locations instead of a single proc acting as a function call for the front end.
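    As a hypothetical sketch of that single-point change: the front end only ever calls the proc below by its signature, so when a Customer table's single name column gets split into FirstName/LastName, only the proc body changes and every caller keeps working:

        -- Hypothetical names; the signature is the contract the app sees.
        CREATE PROCEDURE dbo.GetCustomerName
            @CustomerID INT
        AS
            -- Body rewritten after the schema change; callers never noticed.
            SELECT FirstName + ' ' + LastName AS CustomerName
            FROM dbo.Customer
            WHERE CustomerID = @CustomerID;
        GO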

    If some other application needs access to a specific part of the database you can fall back on stored procedures and provide customized access rights just for them to functionally provide what is needed securely and without being schema bound there.

    And yet if the procs are already written you don't even need to do that.

    In a public discussion of this matter it helps to acknowledge not every service is like Facebook or a non-client-specific application that is deployed over and over and just customized by third parties. Most applications, I bet, are custom written against non-public database schema designs, where the database is provided as part of the service the application provides and is serviced by the application provider and no one else.

    You're correct, and that's the majority of my work. Now, whether the front ends are public or private depends on what we're doing. Most companies these days have some form of portal for their clientele, and thus you must allow for eventual exposure of some kind of data to them. Is the majority of the work being done in the back? Sure.

    Out of curiosity, have you ever worked with DB2? DB2 required the kind of tight binding you're speaking of. There's a reason it's basically out of the market except in legacy systems.

    It just means business logic constraints that do not fit this declarative pattern can also be implemented in the application layer without any real drawback. But with the benefits this other approach brings, such as much better and more efficient dynamic SQL and richer state management.

    You'll have to define 'richer state management' for me. However, I agree that some rules are more efficient in the app layer than in the database layer. The reverse is also true. I'm more than happy to discuss with the devs on the teams I assist which I feel would be best where, and why. Formatting/string manipulation, display, and row-to-row computations in particular are certainly best done in an iterative environment instead of a set-based one. Aggregations, data organization and associations, filtering, and usually mathematics are normally best done at the DB level.
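    A trivial illustration of the DB-side half, against a hypothetical dbo.Orders table: one set-based pass at the server instead of streaming every row to the app and looping over it there.

        -- Aggregation belongs in the set-based tier: one scan, small result.
        SELECT CustomerID,
               COUNT(*)        AS OrderCount,
               SUM(OrderTotal) AS TotalSpend
        FROM dbo.Orders
        GROUP BY CustomerID;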

    However, there is a significant drawback if you just look at the time to process a call to the DB. A proper string scraper in the front end, one that removes all possibilities of injection into your particular system, can take anywhere from 15 to 20 milliseconds. No, that's not long, but every parameter of every call has to be run through that process. It's one of the places where you're easing development time by paying for it in processing time.

    Another thing: every unparameterized dynamic call to the DB has to be uniquely compiled; it can't re-use cached plans. Now, cached plans have some quirks of their own, but in 99.9% of situations they're just fine.
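    A minimal sketch of the difference, with a hypothetical dbo.Orders table:

        DECLARE @id INT = 42, @sql NVARCHAR(MAX);

        -- Concatenating the literal: a new ad hoc plan per distinct value.
        SET @sql = N'SELECT * FROM dbo.Orders WHERE CustomerID = '
                 + CAST(@id AS NVARCHAR(10));
        EXEC (@sql);

        -- Parameterizing through sp_executesql: every call shares one
        -- cached plan, and the value can never be injected as code.
        EXEC sys.sp_executesql
            N'SELECT * FROM dbo.Orders WHERE CustomerID = @id',
            N'@id INT',
            @id;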

    Thirdly, if the app is 'overhandling' data, you're dealing with a lot of unnecessary network traffic and waits.

    Finally, and my most vehement argument: eventually one of those little apps grows up. Your 'prototype' becomes the best thing since sliced bread, and someone in accounting just dumped every spreadsheet from the last 30 years into it over a weekend. Now, since you've cut so many corners, you're looking at redesigning the entire thing into optimal calls. Wouldn't it have made sense to do it 'right' in the first place, and then tweak for volume?

    As for the requirement of dynamic SQL, nearly all application back-offices need this in spades for search and/or real-time reporting purposes.

    Catch-all searches, I agree. And they can be done under the hood without significant problems; there's a sketch of one after the two points below. Real-time reporting you'll have to be a bit more explicit about. If you mean real time from the data, I hook SSRS reports to procs all day long off our OLTP system and they're used quite efficiently. Prior to SSRS, ASP.NET served similar purposes for us, and so did MS Access with passthrough queries. Now, if you're talking dynamically built user reports, that's a whole different conversation.

    A) That's not common. Most firms expect Tech to build the reports for them. It's part of why SSRS 2k8 went more dev-driven than user-driven, like SSRS 2k5 was.

    B) If you've got power users at that level, invest some time in SSAS and PowerPivot. It's a lot easier.
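    Here's the catch-all search sketch promised above, with hypothetical table and columns. It appends only the predicates the caller supplied and runs the result through sp_executesql, so each search shape gets its own reusable cached plan:

        CREATE PROCEDURE dbo.SearchOrders
            @CustomerID INT  = NULL,
            @OrderDate  DATE = NULL
        AS
        BEGIN
            DECLARE @sql NVARCHAR(MAX) = N'
                SELECT OrderID, CustomerID, OrderDate
                FROM dbo.Orders
                WHERE 1 = 1';

            -- Add only the filters the caller actually passed.
            IF @CustomerID IS NOT NULL
                SET @sql += N' AND CustomerID = @CustomerID';
            IF @OrderDate IS NOT NULL
                SET @sql += N' AND OrderDate = @OrderDate';

            -- Unused parameters are simply ignored by the plan.
            EXEC sys.sp_executesql @sql,
                N'@CustomerID INT, @OrderDate DATE',
                @CustomerID, @OrderDate;
        END
        GO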

    If I compare the issues we have in our applications developed this way with the number we find others have, it is staggering! We never have the issues related parties have (being unable to get anything data-related to be consistent and/or correct). And it is not for lack of having to do complex stuff; it is just a matter of knowing what you do and having the basics in place, plus the proper knowledge level of SQL (this last part is an area that is pretty thin out there, I am afraid).

    If you have the proper level of SQL, putting it in a proc is no more difficult than writing the query and wrapping it in CREATE PROC <name> @param1 INT, @param2 VARCHAR(50) AS ... GO.
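    Something like this, with hypothetical object names standing in for the query you already wrote:

        CREATE PROCEDURE dbo.GetOrdersByCustomer
            @param1 INT,           -- e.g. a customer ID
            @param2 VARCHAR(50)    -- e.g. a status filter
        AS
            -- The query itself is unchanged; only the wrapper is new.
            SELECT OrderID, OrderDate, Status
            FROM dbo.Orders
            WHERE CustomerID = @param1
              AND Status     = @param2;
        GO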

    Part of the reason is that most developers do not think in data but in code, and at the same time DBAs generally have the reverse mindset.

    I couldn't agree with this specific statement more. It's why I get rules 'enforced' on me like running all database scripts through CI from source control. Ever try to CI an index column-order modification and not have it rebuild the index EVERY time the CI runs?
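    One way around that, sketched against a hypothetical dbo.AuditLog: guard the script so it checks whether the leading key column is already the one you want, and only drops and recreates when it isn't, so repeated CI runs become no-ops:

        IF NOT EXISTS
        (
            SELECT 1
            FROM sys.indexes i
            JOIN sys.index_columns ic
                ON ic.object_id = i.object_id AND ic.index_id = i.index_id
            JOIN sys.columns c
                ON c.object_id = ic.object_id AND c.column_id = ic.column_id
            WHERE i.object_id    = OBJECT_ID(N'dbo.AuditLog')
              AND i.name         = N'IX_AuditLog_Entity_Date'
              AND ic.key_ordinal = 1
              AND c.name         = N'EntityID'
        )
        BEGIN
            -- Only reached when the desired column order isn't in place.
            IF EXISTS (SELECT 1 FROM sys.indexes
                       WHERE object_id = OBJECT_ID(N'dbo.AuditLog')
                         AND name = N'IX_AuditLog_Entity_Date')
                DROP INDEX IX_AuditLog_Entity_Date ON dbo.AuditLog;

            CREATE INDEX IX_AuditLog_Entity_Date
                ON dbo.AuditLog (EntityID, AuditDate);
        END
        GO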

    Both parties like to have total control and freedom in their own domain and comfort zone! That does not mix and it would be elevating if more people would step a bit further outside their own domain and comfort zone and gain relevant experience and insights that way.

    Does it help if you know that I started as a programmer? That it helps me understand what my devs need? That I still give them bloody havoc when they start sending dynamic trash down the pipe that ignores the indexes in place and ends up scanning a 400-gig audit table?

    That last is mostly because they didn't bother to wrap it in sp_executesql with parameters properly, and didn't have me do a code review, rather than because they wrote crap SQL. Sometimes you need to scan. Sometimes there are ways to do it without taking shared locks on the entire table and thus blocking the triggers that are trying to write to it from the main table.

    Sorry for the long text, but after reading the latest discussion I don’t want people to feel there is only one right way in what seems a clash between DBA titans.

    I agree that there isn't only a single right way... provided your machines can handle an unoptimized load on your database(s). Those also aren't the times when you need a SQL Dev professional like the majority of us who hang around here.

    There are cases where I allow my front ends to send me dynamic SQL: particular cases where the exact scenario makes for difficult or non-optimal T-SQL coding, and where it won't break security or expose the company awkwardly if the front-end login information is compromised. However, data storage and retrieval logic will never fit directly into object/instance logic; you'll always require widgets of some form. Procs and their ilk are generally the most secure method, and they allow for logic reuse and portability as well as separation of components when you're troubleshooting at 3 in the morning because some client 8 hours east of you on the planet is screaming holy rebellion over not being able to check their cat's color for today.

    If you want to ease development by avoiding them, that's your shop's call. Just hope the powers that be will give you time to rebuild from scratch once it outgrows that scope.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA