Joe Celko tackles the subject of the Stored Procedure and its place in database design. What he writes is food for thought, even for experienced database developers.
In level one, we named data elements for what they
are and classified them. In level two, we modeled the data elements with data
types and simple row or column constraints in SQL to give us valid rows for each
occurrence of the entity. In level three, we put these rows into tables for
entities, relationships and auxiliary data. In level four we had the base tables,
connected them together, added secondary indexes and then build VIEWs.
This should give you a wonderful, but false,
feeling of being done. In a very simple database, this is true. But in a real
database there are more structures to consider. Among these other schema level
things are: cursors, triggers and stored procedures. There are more things like
collations, translations, privileges and suchlike. I am going to deal with only
those three things - cursors, triggers and stored procedures - I named and only
in general ways. T-SOL and other products can get highly proprietary in spite of
the ANSI/ISO Standards. The reason is simple; these were things that were built
on top of the preexisting file systems that early SQL products used. These
procedural constructs were ways of making up for the lack of declarative code in
the old products. The vendors have users who are locked into this "code museum"
today and cannot abandon their client base.
SQL allows procedural code modules to be kept in
the schema. While there is a SQL/PSM (Persistent Stored Module) language in
Standard SQL, you will probably use a proprietary vendor language like T-SQL.
These proprietary languages are usually members of the Algol family; that mean
they have an IF-THEN-ELSE, a WHILE loop and BEGIN-END blocks with scoping rules.
Most of these proprietary languages were never
meant to be used for applications development. The rules of thumb for T-SQL are
not to write any procedure over 50 lines (one page) and not to use PRINT. But in
fact, you could avoid all set oriented programming,treat each table like a file
and code as if the database is a procedural file system. You can also drive
nails into your body if you like pain.
The purpose of stored procedures is much like
VIEWs. They guarantee that all the users will do the same task the same way all
the time. VIEWs encapsulate a query and give it a name, thus creating a virtual
table. A procedure encapsulates UPDATEs, INSERTs, DELETEs and SELECTs in a
similar way, but it adds parameters.
A parameter is the "place holder" in the procedure
header and an argument is the actual value passed to the procedure. T-SQL
parameters used to be limited to simple scalar values. Today, they can be table
valued and XML strings. Let's start with the simple scalar parameters.
The compiler reads an argument and checks its data
type, range and other things to be sure it is valid. The compiler is doing a lot
more work than you might think. Consider this bit of T-SQL:
SELECT ISDATE ('2010-01-01'); -- TRUE
SELECT CAST ('2010-01-01' AS DATE); -- no problem
SELECT ISDATE ('010-01-01'; --TRUE
SELECT CAST ('010-01-01' AS DATE); -- error
Likewise, floating point notation works funny. If you cast the exponential
notation as DECIMAL or FLOAT, you have no surprises.
SELECT CAST (62.3E8 AS DECIMAL(18,5)); -- returns 6230000000.00000
SELECT CAST (62.3E8 AS FLOAT); -- returns 6230000000
But now put the exponential notation in a string and try to cast it to DECIMAL
or FLOAT, you do have a surprise, thus:
SELECT CAST ('62.3E8' AS DECIMAL(18,5)); -- error
SELECT CAST ('62.3E8' AS FLOAT); -- returns 6230000000
If you were parsing a string, you have to allow for the exponential notation
with code like this:
SELECT CAST(CAST ('62.3E8' AS FLOAT) AS DECIMAL(18,5)); -- 6230000000.00000
Now try passing these test values as parameters and
see how they behave.
CREATE PROCEDURE Test
SELECT @in_test_date, @in_test_decimal;
EXEC Test '2010-01-01', 62.3E8;
Returns (2010-01-01, 6230000000.00000) as desired.
You cannot pass an expression as an argument, but you can pass a local variable,
which is previously set with the value of an expression. That means this fails:
EXEC Test '2010-01-01', 62.3 * POWER(10, 8) ;
But this will succeed:
DECLARE @local_decimal DECIMAL (18,5);
SET @local_decimal = 62.3 * POWER(10, 8);
EXEC Test '2010-01-01', @local_decimal ;
You will see new SQL programmers try to pass arguments as XML or as CSV (Comma
Separated Values) list strings. They have added the overhead of a parser (the
XML parser or whatever code they wrote for splitting up the CSV string) and
endangered their data integrity. SQL Server can handle over 2,000 arguments,
which is more than enough power for any realistic situation.
Another underused feature in the parameter list is the default value. The syntax
is quite simple. The parameter declaration is followed by an equal sign and an
appropriate constant value.
If an argument is not provided, the default value is used instead.
CREATE PROCEDURE Test
(@in_test_date DATE = '2010-01-01',
@in_test_decimal DECIMAL(18,5) = 0.00000
SELECT @in_test_date, @in_test_decimal;
Unless you specifically assign the argument to the parameter, they are assigned
in left to right order,
EXEC Test; -- returns (2010-01-01, 0.00000)
EXEC Test '2010-12-25' -- returns (2010-12-25, 0.00000)
EXEC Test @in_test_date = 789; -- returns (2010-01-01, 789.00000)
Finally, a parameter can be used for output. This means it has to be a local
variable in the calling module, so the return value(s) have some place to go.
Here is an example that shows the syntax:
CREATE PROCEDURE Test
(@in_test_date DATE OUTPUT,
@in_test_decimal DECIMAL(18,5) OUTPUT)
SET @in_test_date = '2010-12-25';
SET @in_test_decimal = 789;
DECLARE @local_date DATE;
DECLARE @local_decimal DECIMAL(18,5);
SELECT @local_date, @local_decimal; --returns (NULL, NULL)
EXEC Test @local_date OUTPUT, @local_decimal OUTPUT
SELECT @local_date, @local_decimal; -- returns (2010-12-25, 789.00000)
Basics of Software Engineering
So much for the header; what about the body of a
procedure? Well, let's not jump into that right away. We want to worry about
principles first, then worry about the particular tools. We need a high level
view of how to approach coding -- Software Engineering.
The basics of Software Engineering do not change in
SQL. But they are realized differently. Most of us (with the exceptions of those
rare programmer who started in LISP, APL, FP, Haskell or other exotic languages)
learned a block structured programming language that evolved from Algol-60. The
same principles that apply to procedural code still apply to SQL procedures.
In the late 1970's, we found that we could write
better code (i.e. faster, provably correct, easier to maintain, etc) in
languages that had local scoping rules and code modules with one entry and one
exit point. We eliminated the GO TO statement and used a simple set of control
structures. This was the Structured Programming revolution.
Cohesion is how well a module does one and only one
thing; that it is logically coherent. The modules should have strong cohesion.
You ought to name the module in the format "<verb><object>", where the
"<object>" is a specific logical unit in the data model and “<verb>” is a single
clear action. There are several types of cohesion. We rank them going from the
worst form of cohesion to the best.
Coincidental cohesion is when parts of a module are
grouped arbitrarily. A Coincidental module is a train wreck of unrelated
actions. It is a “Lady GaGa, Squids and Automobiles” module whose description
would be a compound or complex sentence. The best example of this in SQL is the
OTLT (One True Lookup Table) design flaw. This is a loook-up table with ALL the
codes for the entire schema in it.
- Logical cohesion is when parts of a module are
grouped because they logically do the same thing, even if they are different by
nature. In SQL the most common example is a general module that does an update,
insert or delete on any table – it works on Lady GaGa, Squids or
Automobiles. Look for this to be implemented with dynamic SQL, XML and CLRs.
Cohesionis when parts of a module are
grouped by when they are processed. For example, a module which does all
the initialization for the whole system.
Cohesion is when parts of a module are
grouped because they always follow a certain sequence of execution. For example,
when a user logs on the database, we check user privileges and log the sign-in.
Instead of putting these things in separate modules under the control of a
superior, we write a monolithic block that does it all.
- Communicational cohesion is when parts of a module
are grouped because they operate on the same data element. Imagine a series of
UPDATE statements that affect the same column in one procedure, based on what
the previous module did.
- Sequential cohesion is when parts of a module are
grouped because the output from one part is the input to another part like an
assembly line. In SQL, look for the use of temporary tables as a replacement for
scratch tapes in a magnetic tape file system.
- Functional cohesion is when a module always
does a single well-defined task, like a mathematical function. This is
what we want in a module and it is the basis for functional programing
languages. There are no side effects or external data.
You can look up the detailed definitions, if you
missed them in your intro Software Engineering classes.
Coupling is how dependent modules are on each
other. If modules have to be used in a certain order, then they are strongly
coupled. If they can be executed independently of each other and put together
like Lego blocks, then they are loosely or weakly coupled. There are several
kinds of coupling, which are ranked from worse to best:
- Content coupling is when
one module modifies or relies on the internal workings of another module.
Control jumps in and out of module. In SQL, the use of UDFs and CLRs are the
prime example. You don't know what is going to happen until run time.
- Common coupling is when two
modules share the same global data. That sounds like tables in SQL, but it
refers to having two or ways to do the same task on common data. Doing the same
task soon becomes doing almost the same task.
- External coupling occurs
when two modules share an externally imposed data format, communication
protocol, or device interface. Ideally, we do not want to worry about the
external environment. This is why a tiered architecture does not do any
formatting or packaging in the database.
- Control coupling is one
module controlling the execution by passing flags. You have no idea what the
module will do until its master tells it. This is one of the reasons that BIT
flags are not a good SQL programming.
- Stamp coupling is when
modules share a composite data structure and use only a part of it. This can be
done with VIEWs in SQL.
- Data coupling is when
modules share simple data elements. Think about passing parameters; these are
the only data elements which are shared.
Coupling is the loosest type of
coupling. Modules are not dependent on each other, instead they use a public
interface to exchange parameter-less messages. This is more of an OO approach
but you see it in triggers, exception handlers and other SQL features.
This is covered briefly in a chapter on writing
stored procedures in my book on SQL PROGRAMMING STYLE. In the meantime, you can
read DeMarco, Yourdon, Constantine, Myers or several other of the pioneers of
Software Engineering. This is far more basic than SQL programming. This
is what you are supposed to know before you write any code in any
Good SQL Procedures
As a generalization, a good stored procedure has
high cohesion and loose coupling which it gets by not using the flow of
control structures unless absolutely needed. This is a surprise to procedural
programmers. The reason is to put as much "programming stuff" into the pure SQL
so that the optimizer can do its thing with the code.
How does that work? That is the next level on the stairway.