Blog Post

Separation of interface from implementation

,

One of the supposed advantages of the EF is that it abstracts you from

your database implementation. This is considered to be a good thing. It

lets me move between different database products (SQL Server, Oracle,

MySQL, etc) without having to rewrite anything the business tier of my

application. I just need to hook up a different provider to the entity

framework and I'm done. All of my business logic and query construction

stays at the application layer.

Ironically, this paradigm, which

comes from the object oriented world, fails to recognize one of the

most basic tenents of object oriented design. The separation of

interface from implementation!

First of all, I'd better point

out the reason for this post. Why is it a bad thing to put the query

construction into automated tools like the entity framework or

nHibernate, or into the hands of the application layer more generally?

The

short answer is performance. Databases aren't given much respect in

many organizations. It's just a place to persist information, right?

No, of course not, that's completely wrong. A database is an extremely

complex system of physical constructs like indexes, partitions, columns

composed of data types, keys, etc. SQL databases also have a

standardized (though vendor extended) language for data retrieval and

modification. This, of course, is SQL.

SQL is not a procedural

language. SQL is a declarative language. Procedural programming fu do

not translate to SQL-fu, and vice versa.

A person focused on

database development generally knows at least a little something about

database design, indexes, partitions, keys, and of course SQL. There's

a lot to know. I've been doing it for 7 years and still learn new and

complex technical details about SQL Server every week, if not every

day. A person focused on database development also, as a rule, knows a

little something about SQL.

A procedural programmer, as a rule

does not. Or at least, obviously someone who has spent 10 years mostly

focused on, say, .NET won't know as much about it as someone who has

spent the same amount of time focused on, say, SQL Server. The people

who wrote an object-relational mapping tool might know a little, but

they don't necessarily. And more importantly, the tool certainly

doesn't know anything about your specific indexes, partitions,

performance requirements, storage system, etc.

Long story short,

this means that the ability to write sensible and efficient SQL for

your specific database will only ever be possible by someone with

knowledge of the logical and physical structure of that database, and

never by an automated tool, or a tool which translates one language

(like LINQ) to another (like SQL).

Now here's the key to this

post. An ORM layer which dictates certain database requirements and

builds your queries for you is COMPLETELY UNNECESSARY in order to

achieve the goal of database vendor independence. The belief that it

accomplishes this disregards the separation of interface from

implementation.

Your database developers can, and should, do

this for you by providing the application code with an API. From an

OOAD point of view, this is simply the separation of public methods

from private data, and has been around FOREVER. How do we accomplish

this? With stored procedures, or routines. If we switch vendors the

only thing we have to do is ensure that the database provides the same

set of routines to the application layer. The code inside those methods

is irrelevant to the API itself. I can build a GetCustomer routine

(stored proc) in SQL. If a decision is made to move to Oracle, all that

matters is that the GetCustomer routine is provided, taking the same

parameters and returning the same results. The fact that internally it

is built using PL/SQL instead of T-SQL is completely hidden from the

application layer.

This does mean that the code INSIDE the

routines may have to be rewritten. I say may because if standards

compliant SQL is used, it doesn't. You can literally do a copy and

paste. But the thing is, you probably want to rewrite that code anyway.

Why? For the reasons I discussed above. The performance of your queries

depends extremely heavily on your logical data model, your physical

database structures, and requires a deep knowledge of same. That's

precisely the problem with automated query generation tools... they

don't. Your performance can be orders of magnitude worse if you don't

tune your queries based on your structures, and vice versa.

Need

a concrete example? What about nHibernate. GUID's are usually needed

for object creation, so each object gets it's own unique GUID which is

then used for lookup. Using GUIDs in a database has the potential for

MASSIVE performance loss. I'm talking orders of magnitude on query

times, as well as vastly (VASTLY) increasing the physical size of the

database on disk. Why? The reasons are complex, which is exactly the

point. Care in how they are used is required to avoid this. But do the

application programmers know the pitfalls? No, they're application

programmers! They don't have to know anything about databases, that's

what nHibernate is for, right?

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating