Business Logic in the Database

Chris Travers recently responded to Tony Marston’s critique of an earlier post where Chris advocated “intelligent databases”1. Chris’ response is well reasoned, particularly his point that once a database is accessed by more than a single application or via third-party tools, it’s almost a given that one should attempt to push “intelligence” and business logic into the database if possible.

However, there is a paragraph in Tony’s post that merits further scrutiny:

The database has always been a dumb data store, with all the business logic held separately within the application. This is an old idea which is now reinforced by the single responsibility principle. It is the application code which is responsible for the application logic while the database is responsible for the storing and retrieval of data. Modern databases also have the ability to enforce data integrity, manage concurrency control, backup, recovery and replication while also controlling data access and maintaining database security.

If a database (actually, a shortening of DBMS—a combination of the data and the software that manages it) has always been dumb, then presumably one would never specify UNIQUE indexes. It is a business requirement that invoice or employee numbers be unique, so if all the business logic should reside in the application, then the DBA should only create a regular index and the application —all the applications and tools!— should enforce uniqueness.

Tony’s mention of “data integrity” is somewhat ambiguous because different people have varied understandings of what that covers. As C. J. Date points out, “integrity … is the part [of the relational model] that has changed [or evolved] the most over the years.”2 Perhaps Tony believes that primary keys, and unique and referential constraints should be enforced by the DBMS, but apparently an integrity constraint such as “No supplier with status less than 20 supplies any part in a quantity greater than 500”3 should instead only be code in an application (make that all applications that access that database).
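
To make that concrete, here is one way such a constraint could be expressed in PostgreSQL (a sketch only, using abbreviated suppliers-and-parts tables; since CREATE ASSERTION isn’t implemented, a trigger stands in for the declarative form):

    -- Suppliers-and-parts, abbreviated
    CREATE TABLE s  (sno text PRIMARY KEY, status integer NOT NULL);
    CREATE TABLE sp (sno text REFERENCES s, pno text, qty integer NOT NULL,
                     PRIMARY KEY (sno, pno));

    -- "No supplier with status less than 20 supplies any part
    --  in a quantity greater than 500"
    CREATE FUNCTION sp_qty_check() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        IF NEW.qty > 500 AND EXISTS
               (SELECT 1 FROM s WHERE s.sno = NEW.sno AND s.status < 20) THEN
            RAISE EXCEPTION 'supplier % has status below 20; quantity % not allowed',
                            NEW.sno, NEW.qty;
        END IF;
        RETURN NEW;
    END;
    $$;

    CREATE TRIGGER sp_qty_check BEFORE INSERT OR UPDATE ON sp
        FOR EACH ROW EXECUTE PROCEDURE sp_qty_check();

A complete implementation would also need a companion trigger on s, since lowering a supplier’s status could violate the rule from the other direction.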

As for me, as I pointed out earlier, “constraints should be implemented, preferably declaratively, in the database schema” while “type constraints … should also be implemented in the user interface” (emphasis added). Ideally, the user interface should derive its code directly from the schema.

Update: Many thanks to everyone who participated in the discussion. I think we’ve covered just about every angle pro or con incorporating business logic in the database, so I’ve closed comments now.

Update 2: Chris has an update post that may be of interest.


1 Interestingly, the first time I heard the term “intelligent database” it was from Dave Kellogg giving a marketing presentation for Ingres 6.3, which had incorporated some of the features in the UC Berkeley POSTGRES project.
2 Date, C. J. An Introduction to Database Systems, 8th Edition. 2004, p. 253.
3 Ibid., p. 254.

57 thoughts on “Business Logic in the Database”

  1. I agree.

    Single-responsibility is defined at the functional level, not at the server or framework level. There is some ambiguity when it comes to client-specific requirements (versus data model integrity requirements) but the don’t-repeat-yourself principle should be a larger motivator than trying to make everything fit nicely into (arbitrary) layers.

    For all Tony’s talk about avoiding unnecessary complexity I would have thought the idea of an intelligent, centralized database would be appealing. Since his coding seems to be heavy in meta-data discovery, pulling both the structure and the constraints from the database should be trivial.

    Maybe the language features and infrastructure frameworks are currently less than optimal, but the benefits of a data-centric, centralized foundation that many different programming languages can readily bridge to make this an issue worth overcoming.

  2. Pingback: Joe Abbate: Business Logic in the Database | PostgreSQL | Syngu

  3. I’m also a firm believer in pushing business logic into the database where possible (and cost-effective). Toon Koppelaars wrote an excellent paper many years ago called “A database centric approach to j2ee application development” (unfortunately I can’t find a current link), but his blog at http://thehelsinkideclaration.blogspot.com/2009_03_01_archive.html has a similar message.

    It is worth bearing in mind that the many applications accessing the database may be developed over time. For example, I have worked on Oracle databases where all logic was in the database (using the Oracle Designer Table API and CDM Ruleframe) and which had a succession of user interfaces based on Forms 4.5 (client-server), Forms 9 (web) and PHP (web). iOS or Android is only a matter of time! The cost of developing these successor applications was tiny compared to the probable cost if all the business logic had to be rebuilt each time. Toon’s observation that front end technology evolves much faster than database technology still holds true – at least for the use case of line-of-business / window-on-data type applications. I am not disputing that there is a set of problems for which NoSQL solutions are more appropriate.

    The problem that I see with pushing business logic into the database is that database development is much less efficient than java/python/ruby/php development – modern OO languages and their associated IDEs, unit test frameworks, continuous integration servers, etc. make developing and testing business logic fast (and therefore cheap). Developing the same functionality inside the database generally takes longer (and therefore costs more). You only get that money back if the database lasts for a long time.

    This is one area where the Andromeda Project seemed to have promise – the ability to define common “automations” (their terminology) declaratively and have the required trigger code generated automatically. Have you considered implementing that feature in Pyrseas?

      • My reading of that post was that dbappgen would use the same .yaml data dictionary as the database management function (i.e. yamltodb / dbtoyaml) and would use it to create an app based on it. Based on your other posts I had assumed that the app would be web-based and written in Python using a light-weight framework if you can find one you think is suitable.

        This is where the Andromeda Project was going before deciding that the very useful database functionality was being lost in the attempt to create a PHP app framework that would then be ignored by anyone using another language or a different PHP framework – hence Triangulum DB.

        For example, we have developed an application using the Zend Framework (a popular PHP framework) which uses a data dictionary (currently stored in a database, rather than a configuration file) and which can produce CRUD screens for all tables defined in the data dictionary, including “jump” buttons to parent records, and automatic master-detail forms for tables with child tables (as defined by a foreign key from the child to the parent). The child records are displayed in a tabbed component, with a separate tab for each child table with a foreign key to this parent table.

        This is not as seamless as your proposed dbappgen because currently we maintain the database and the metadata separately – i.e. we maintain migration scripts to put the database into the correct state, and the metadata is only used to control how the web application works.

        I would like to have a single data dictionary serving as the basis of the web application and the database, hence my interest in Pyrseas. However, from my point of view this doesn’t mean I want Pyrseas to produce a python app for me – it means that I want Pyrseas to generate business logic in the database to do things that I am currently doing in PHP: e.g. automatic insert and update timestamps (with associated user stamp), denormalizations (e.g. mainly copying grandparent primary key into a child table to simplify joins – this was called a FETCH in Andromeda), automatically maintaining a history table (FWIW my preference is for an Oracle Designer-style journal table), etc.

        In order for me to have a single data dictionary, the .yaml file would also need to support additional metadata (even if that just means ignoring it) not required to create the database objects. For example, our dictionary stores the prompt for each column, the display type (checkbox, select list, etc. – this can normally be derived – boolean = checkbox, foreign key = select list, etc. – but does need to be set manually sometimes), the display length – which is shorter than the maximum length for some columns, etc. Similarly, for foreign keys we store whether the child table should be displayed on the master-detail form and if so what position it should appear in.

        If you take Pyrseas in this direction I would love to help, although I’m afraid my coding skills are all on the PHP / Postgresql side of the fence.

      • The single data dictionary (.yaml file) is not appealing. Instead, I’m thinking of multiple, application-level .yaml files, that supplement the database dictionary (the single .yaml file that parallels the PostgreSQL catalogs). The application-level files would support additional metadata because they would drive the application. The initial app-level files could build the additional metadata by default, e.g., given that your app will insert into the emp table, it could default the prompt for emp_name to “Emp name”, the help text to the COMMENT text from the database if available, etc.

        The question as to whether to generate business logic in the database is one that I’m still cogitating. From what I saw/recall, Andromeda created a single procedure per table to implement all its integrity constraints, instead of relying on schema-defined FOREIGN KEYs, CHECK CONSTRAINTs, etc. Ironically, IIRC the “intelligence”/business logic was in the database but not automatically activated. I’d rather use the PostgreSQL capabilities, but in general that means the DBA has to define them before the app developer can make use of them, and as you mentioned database development is less efficient than app development, so it seems like we need a happy/happier medium.

      • Having separate .yaml files for the database and the application makes a lot of sense. I suppose that having multiple database .yaml files might also make sense if each file represents a discrete set of database objects, akin to a module in Andromeda terms.

        With regard to putting the logic into the database, I think we (and most other people in this discussion) agree that we should use the tools that the database makes available to us in order to ensure the integrity of the data.

        I think we should use the native functionality where it exists, i.e. use check constraints in preference to triggers. Many old databases use triggers (or just application code) for these sorts of checks, for example because there is an occasional requirement to disable them, or because they are sometimes invalid during (but not at the end of) the transaction. I would guess that is the reason for Andromeda’s approach. Obviously, now that we can defer constraints, one of those reasons no longer applies!

        However, there is some functionality (e.g. denormalisations, audit fields and history tables) that seems to require triggers, and probably requires lots of boilerplate code. Putting this code in the database would be cost-effective if I could declare in a .yaml file “fetch the value of this column from this parent table via this foreign key” and rely on the tool (i.e. Pyrseas) to create triggers on the table in question to fetch that value using the foreign key AND to create a trigger on the parent table to cascade the value to the child table if it is changed in the parent. If I have to write these triggers manually then they will be prone to bugs (even though the code is mostly copy and paste) and probably not implemented – the high estimate for doing them properly will mean they never get high enough up the priority list in any given increment.
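
        As a rough illustration (not Pyrseas output, just a hand-written sketch with invented orders / order_line tables and a fetched customer_id column), the pair of triggers such a one-line declaration might expand to could look like this:

            -- FETCH: copy orders.customer_id into order_line.customer_id
            CREATE FUNCTION order_line_fetch() RETURNS trigger LANGUAGE plpgsql AS $$
            BEGIN
                SELECT o.customer_id INTO NEW.customer_id
                  FROM orders o WHERE o.order_id = NEW.order_id;
                RETURN NEW;
            END;
            $$;

            CREATE TRIGGER order_line_fetch BEFORE INSERT OR UPDATE OF order_id
                ON order_line FOR EACH ROW EXECUTE PROCEDURE order_line_fetch();

            -- Cascade a changed customer_id from the parent to existing child rows
            CREATE FUNCTION orders_cascade_fetch() RETURNS trigger LANGUAGE plpgsql AS $$
            BEGIN
                IF NEW.customer_id IS DISTINCT FROM OLD.customer_id THEN
                    UPDATE order_line SET customer_id = NEW.customer_id
                     WHERE order_id = NEW.order_id;
                END IF;
                RETURN NULL;
            END;
            $$;

            CREATE TRIGGER orders_cascade_fetch AFTER UPDATE OF customer_id ON orders
                FOR EACH ROW EXECUTE PROCEDURE orders_cascade_fetch();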

      • Agreed that the functionality in the last paragraph would be best handled (somehow) by the “framework.” However, aside from audit fields (which can be done with triggers) I haven’t given much thought to that “somehow.”

      • Would it be helpful if I created some issues in the Pyrseas tracker to act as a starting point for discussion?

  4. You go too far by implying that by a “dumb” database I meant that it should not deal with unique keys, indices and data types. Of course the DBMS should ensure that each record that is written conforms to its schema, but when you start loading the database with business logic which normally resides within the application then your database becomes the bottleneck.

    I recently received an email from a newbie programmer who had read my article http://www.tonymarston.net/php-mysql/databaseobjects.html and wanted to know why I didn’t offload more processing to the database, such as zero-filling numbers and checking for integers that are out of bounds. I shall give you the same answer that I gave him:

    – How a field is displayed, whether a number is zero-filled or blank-when-zero, is part of the presentation logic not the data access logic, therefore is not specified in or controlled by the database.
    – Checking that a value conforms to a column’s data type is primarily handled within the application. Thus if a particular value is found to be invalid the application can send it back to the user with a suitable error message and ask for the value to be re-input. If the SQL query fails then it is *ALWAYS* treated as a fatal error because of some fault within the application, and so the application aborts.

    I have seen some programmers attempt to relegate data validation to the database, then they have the task of examining the error codes to check if it is actually a recoverable or non-recoverable error. If it is recoverable then they may need to edit the error message so that it is more meaningful to the user. This has always resulted in ugly code, so I avoid it like the plague.

    As for your comment “constraints should be implemented, preferably declaratively, in the database schema while type constraints … should also be implemented in the user interface. Ideally, the user interface should derive its code directly from the schema” I couldn’t agree more as that is precisely how my framework works. I import the data structures into my data dictionary, and when I export them to my application I produce two files for each database table – a class file and a structure file. The structure file identifies all the columns and their data types, the primary key, any unique keys, and all relationships (where this table is the parent and also where it is the child). This information is used by standard code within the framework to validate the data before the SQL query is constructed. Relationship details are used in two ways:

    – When a SELECT is constructed for a table then JOINs to its parent tables can be automatically included.
    – Before a DELETE is actioned then a table’s children can be checked to see whether a particular relationship is marked as CASCADE, NULLIFY or RESTRICTED, and the appropriate action is taken or error message is displayed.

    So it appears that our thinking may not be too far apart after all.

    • Hi Tony,

      I think you may have a narrower view of what an integrity constraint may be than I do. To take an extreme, one could define a quantity on hand (qoh) column as a TEXT datatype. Then it would be up to the application to validate that the qoh entered was all numeric, without decimals, positive or zero, and within a range suitable to the particular business. Of course, most sane DBAs and developers would define qoh as INTEGER in the database and int or similar in the application.

      Where do you draw the line on ensuring that qoh isn’t changed to a negative value or an enormous value for a particular item? If your answer is “only in the application,” I contend you’ll be surprised when a user who has been given access via ODBC/JDBC spreadsheet updates qoh to -1 or 99999, because he wants to attribute some “special” meaning to those values. Or when a month-end script run outside of your application changes values incorrectly. That’s why I believe most sane DBAs would prefer to add a CHECK (qoh BETWEEN 0 AND 10000). If an application wants to implement that validation as well, it’s welcome to do so.

      Furthermore, I presume most DBAs will want to disallow most direct updates of qoh, but instead insist on some trigger or similar mechanism, so that even the CFO can safely be granted access to inventory from her smartphone.

      Presentation (displaying qoh in hex or Roman numerals) is up to the application. Integrity (validating qoh) can be done by an application but IMHO ought to reside in the database/DBMS.
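
      A minimal sketch of those two mechanisms in PostgreSQL follows; the item table and the item_id, qoh, adjust_qoh and inventory_user names are invented for illustration, and only the 0-10000 range comes from the example above:

          -- Declarative range check on quantity on hand
          ALTER TABLE item ADD CONSTRAINT item_qoh_range
              CHECK (qoh BETWEEN 0 AND 10000);

          -- No direct UPDATE privilege; changes go through a controlled function
          GRANT SELECT ON item TO inventory_user;

          CREATE FUNCTION adjust_qoh(p_item integer, p_delta integer)
              RETURNS void LANGUAGE plpgsql SECURITY DEFINER AS $$
          BEGIN
              UPDATE item SET qoh = qoh + p_delta WHERE item_id = p_item;
          END;
          $$;

          GRANT EXECUTE ON FUNCTION adjust_qoh(integer, integer) TO inventory_user;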

      • I would never define an integer column as a string in the database and have the application validate it as a number. That’s a ridiculous suggestion. I would always define a column with the most relevant data type, but my application would still validate that the value was within range before sending it to the database.

        My validation code uses data obtained from the INFORMATION_SCHEMA to decide how to validate each field (integers, dates, decimals, etc). This data is imported into my data dictionary then exported to the application in the form of a structure file for each table.

      • Yes, of course it’s a ridiculous suggestion. I did say it was extreme, like my earlier mention of not defining unique keys. The point I’m trying to make is that those are business requirements, those are the semantics of the data from a business perspective, and there are gradations of specifying and checking those requirements.

        For example, a hypothetical (non-SQL) DBMS could ask the DBA to define the type of data (number or text) and its length, and choose to store all columns as Unicode characters, validating them without converting into another type. When someone defines qoh as INTEGER they’re providing an incomplete specification of the semantics. A CHECK constraint enhances that specification, as when they add some other business constraint via a procedure or trigger.

        If your application only validates the type of data that’s a very narrow view of data integrity. If the application extends to checking for referential integrity, without FOREIGN KEY constraints, then the database is vulnerable to any access outside of your application. Similarly, if your application enforces other business requirements, e.g., decreases qoh when a shipment is made, then the database is also vulnerable to those requirements not being observed by any external tool, including simple SQL access by a DBA.

      • I find this discussion interesting, but I’d like to point out that consideration should be given to the likelihood of the business rule changing. It is highly unlikely that your qoh field would need to contain non-numeric characters, so narrowing the definition to Int instead of String makes sense. However, what happens if the check constraint now needs to restrict the amount to 1-5000? Then changing that check constraint, while older records still exist (and should) that are greater than 5000, would be trickier to do. So I’m with Tony in that this business logic should live at the application layer.

      • I don’t find that example very convincing. If the business rules change in that manner, I’d say the semantics of the data have changed as well and we’d need to introduce additional columns/tables in the database in order to adapt to it.

      • I totally disagree, and I think a lot of other people would too. If a column’s data type does not change, but the business rules regarding the range of values that it may contain do, then most people would simply update the business rules in their application and leave the database alone. Adding a new column with a different name but exactly the same data type is only something a masochist would contemplate.

      • I didn’t say anything about “adding a new column with a different name but exactly the same data type.” I said “introduce additional columns/tables in the database in order to adapt to it [the change in the business rules].” For example, if the rule for qoh is going to fluctuate based on some date, then it may be necessary to add a date column, or if the rule for qoh now depends on the type of product category, then maybe the category has to be added there–or more likely, a separate table containing the rule parameters can be introduced. For more on this, I recommend Ken Downs’ Minimize Code, Maximize Data.

      • I agree that if the business rules change we should change the code that enforces them in preference to changing the data model. That does not really speak to “where is the best place to enforce the business rules”. For example, if we know from speaking to the users that the maximum quantity on hand is likely to change, then we probably store it as a system parameter and the check constraint can call a function that retrieves the current value of the parameter to perform the check. In our case, we have a system parameter values table with start and end dates, so we can call the function with the transaction date, and the check constraint is still valid for old rows that have a different allowed maximum (see the sketch after this comment). Obviously, this example is wholly artificial – it is more likely the maximum allowed qoh varies by product or product type, etc. and would be stored as an attribute against the relevant entity.

        I think the argument still stands – unless you can be 100% certain that your database will never (now or in the future) be updated by anything other than your application code, there is value in enforcing data integrity in the database. To me the only question is how much value, for what cost? I.e. we need to consider the cost-benefit analysis of putting data integrity rules inside the database.

        Writing trigger code to enforce business rules is generally less efficient than writing application code, so if the risk is small we might choose to enforce complicated business rules in the application only. On the other hand, foreign key constraints, check constraints, domains and enums allow us to enforce some rules declaratively, and therefore cheaply. I find it cost effective to use these tools, and that they cover the majority of the data integrity rules.

        If I was writing Tony’s framework I would have application-side checks to enforce referential integrity, but I would also have database-side referential integrity constraints for all relations within a database, given how easy and cheap they are to create and enforce. I would only be relying on the application framework to enforce the cross-database ones. Any updates to either database from outside the application are only a risk if they are trying to update data that is part of the relation – and I would be tempted to try and write code inside the databases to prevent that. Incidentally, does anyone know if Postgresql foreign data wrappers can be used to create a foreign key from a postgresql child table to an externally managed parent table?

        Furthermore, I would also prefer to enforce complicated business rules inside the database if they can be implemented cost-effectively. Oracle Consulting had a great product called CDM Ruleframe that could be used to enforce complex business rules almost declaratively (and certainly cost-effectively) including the ability to turn off individual rules on a per-session basis (with the correct privileges) and to report all errors together at the end of a transaction. If such a tool existed for Postgresql (or I could find the time to write it) I would certainly use it; hence my desire to encourage Pyrseas to support declarative implementation of some business logic that is not supported natively by Postgresql.
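
        To sketch the date-parameterised check described above in PostgreSQL (the table, column and parameter names are invented; note that the manual cautions against CHECK expressions that read other tables, although a function call like this is accepted):

            -- System parameters, each valid over a date range
            CREATE TABLE system_parameter (
                name       text    NOT NULL,
                int_value  integer NOT NULL,
                start_date date    NOT NULL,
                end_date   date    NOT NULL DEFAULT 'infinity',
                PRIMARY KEY (name, start_date)
            );

            CREATE FUNCTION param_int(p_name text, p_date date) RETURNS integer
                LANGUAGE sql STABLE AS $$
                SELECT int_value FROM system_parameter
                 WHERE name = p_name AND p_date BETWEEN start_date AND end_date;
            $$;

            -- The allowed maximum is looked up as of the transaction date,
            -- so old rows stay valid when the parameter changes
            ALTER TABLE stock_txn ADD CONSTRAINT stock_txn_qty_max
                CHECK (qty BETWEEN 0 AND param_int('max_qty', txn_date));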

      • Re: Postgres FDWs: Out of the box, the capability for cross-database REFERENCES is not there yet (but of course, it being extensible, one could add it).

        Re: CDM Ruleframe: I’ve been meaning to take a look at Toon Koppelaars’ RuleGen which is a replacement for that, albeit also for Oracle. I’ve also been meaning to check out Toon’s latest at Triggers Considered Harmful, Considered Harmful.

      • I missed jma’s reply before I posted my response. With the added clarification that we are adding the necessary data to describe the business requirements – I agree with him completely.

  5. Fully agree regarding presentation decisions but the database designer should do everything possible to prevent bad data from being saved; whether intentional or due to application oversight/laziness/ignorance. Yes, it makes for more work in the applicationS but ugly code or not the end result is necessary if you care about having good data and cannot force all transactions to go through THE master API.

    If you indeed can deal with meta-data then do so for the constraints as well and you will never get any constraint violation errors, from the database, in the first place. But if you forget one or hit the tables directly for some reason, you still have a layer of defense in place.

    By putting the constraints on, or directly above, the tables you minimize the risk that an application or user can enter bad data into the database. Yes, this mitigation has costs and so if your exposure is low (or tolerance for bad data is high) then leaving the condom off of the database may be acceptable.

  6. Tony Marston states in his article “Stored Procedures are EVIL” that he doesn’t use any foreign keys, so I’m not sure what integrity he is talking about, as his tables will never rely on anything else but themselves.
    His articles appear to be adverts for his PHP framework which doesn’t appear to accommodate complex table designs or where data is pulled from multiple tables by a single object (as far as I can tell), which is odd given his push for normalisation (which is good). It looks very limiting.

  7. I don’t use foreign key constraints in my database as I can do all that is necessary within my framework, including turning off the constraints should it be necessary. The framework does this using the relationship definitions which are exported from my data dictionary, so no additional programmer coding is required.

    I can assure you that my framework *CAN* deal with complex table designs. If I want to obtain data from multiple tables in a single operation I use an sql JOIN, which is far more efficient than going through a separate object for each table’s data.

    I can even write to several tables within a single operation. Suppose I have some data which must be written to Table A, Table B and Table C – I pass the data to an object for Table A, then in the post_insert() method call the insertRecord() methods for Table B and C. What’s so difficult about that?

  8. So it sounds like the reason you don’t recommend putting stuff like Foreign keys into the database itself is that your framework then becomes obsolete. I presume that you would avoid updatable views for the same reason. You apparently agree that having Foreign keys is important so what does it really matter if they are defined on the tables themselves or in a higher-level framework?

    Instead of re-inventing the wheel maybe you need to modify your framework to play more nicely with others by letting the database take care of definitions and focusing more on bridging the gap between data entry and data storage.

    How does your implementation improve upon what most databases can do natively and/or mitigate their short-comings? I can see, to some degree, adding features to reduce database lock-in but if someone is going to use foreign keys they should pick a database that uses them natively. Introducing external concurrency issues alone makes doing it externally problematic.

  9. One limitation of foreign key constraints is that they can only be defined when both tables exist in the database – I use multiple databases with relationships across different databases, so this can only be handled within my application.

    Another limitation is that constraints, once defined within the database, cannot be turned off and on again at will. They can within my framework.

    Another point is that my ERP package is used by several customers, and each may require the addition of custom logic to either replace or enhance the standard code. Sometimes this customisation may require an additional table which has a relationship with an existing table, and I have found it easier to define this “new” relationship within my application logic than with amending the database schema.

  10. @ jma:
    > For example, a hypothetical (non-SQL) DBMS could ask the DBA to define the type of data
    > (number or text) and its length, and choose to store all columns as Unicode characters,
    > validating them without converting into another type.

    This is irrelevant. I do *NOT* use a non-SQL database, and I *DO* define my database with all the relevant data types for each column. I have standard code which validates that the value for each column is consistent with its data type before any sql INSERT or UPDATE query is executed.

    > When someone defines qoh as INTEGER they’re providing an incomplete specification of the
    > semantics. A CHECK constraint enhances that specification, as when they add some other
    > business constraint via a procedure or trigger.

    If a column is to hold an INTEGER value then I define its data type as INTEGER, but my application will validate that the given value is an integer before constructing any SQL query. Any additional checking, such as the value must be positive, or in the range 1-1000, is also performed within the application.

    > If your application only validates the type of data that’s a very narrow view of data integrity.

    I never said that it *ONLY* validates the type of data. You can have whatever extra validation you want. Some of this extra validation, such as an integer value must be in the range 1-1000, can be added to the data dictionary so that it can be performed automatically by the application. Other validation is added to the table’s class in the form of custom code.

    > If the application extends to checking for referential integrity, without FOREIGN KEY constraints,
    > then the database is vulnerable to any access outside of your application

    I repeat, my database is *NEVER* accessed by an external application. It is a web application, so any external access is via web services which will use my application’s table classes.

    > Similarly, if your application enforces other business requirements, e.g., decreases qoh when a
    > shipment is made, then the database is also vulnerable to those requirements not being
    > observed by any external tool, including simple SQL access by a DBA.

    I ensure that there are sufficient transactions within the application to adjust any values, and some of these transactions are only available to administrators, so I *NEVER* have to go directly into the database to make adjustments.

    • You’re so lucky that your database is NEVER accessed outside of your application! I suspect most other people aren’t so lucky.

      • Luck has nothing to do with it. Nobody is allowed direct access to my databases without my permission, and I never grant that permission. Access is either through my application or through my web services.

    • While I realize you think your systems are confined to only being accessed by your applications, in the 25 years I’ve worked with databases, I’ve never encountered a situation where such a restriction could be enforced indefinitely. Typically, reporting kills this idea. Many companies want to hire less expensive report designers that can simply access the database read-only. Another requirement that kills this idea of exclusive access is integrations, which will push and pull data to the database, and requiring that they go through an object model with large volumes of data is probably more costly. Lastly, just changes in personnel will often kill this idea. You win the lottery and go off to your island paradise and the next person yanks that restriction. The database should be designed to protect itself against direct access queries, which can accidentally (and typically do) screw up the data, and by far the easiest mistake that can be made is orphaning rows.

      • > Typically, reporting kills this idea. Many companies want to hire less expensive report designers
        > that can simply access the database read-only.

        If access to a reporting tool is read only, then it does not need to access any data validation rules whether they are in the database or the application.

  11. “Luck has nothing to do with it.” That is still a lucky situation, or maybe a better wording is a specific situation. Depending on your definition of user, making the user go through a web service or specific application isn’t an option. What if you have hundreds of developers who need different levels of access in different environments (dev, qa, ctt, and prod, to name a few)? In this case, as a DBA, I consider the developers users, and they require special access to do their jobs, separate from the customer user who is using a gui on top of the database. Just my 2 cents for the day.

  12. Perhaps a collection of CSV files is all that is needed for Tony’s framework? This would remove the burden of a relational database system that irritatingly implements foreign keys in the database 🙂

    Only kidding. It seems that half of the useful features of a DB are being implemented higher up in Tony’s framework, which does seem like reinventing the wheel, and I suspect would not be ideal in a massive multi-user environment because how can his PHP framework enforce single access to the background data store? It can’t, unless it is using a single connection to the data store, which is useless for concurrent users.

    I would argue that using the features of a DB system would be better, even if that means learning the idiosyncrasies of different systems. You could write or use a framework that abstracts away the differences of the underlying systems if necessary, which probably isn’t necessary in many instances because you are writing for the database you have, not for a database system that might never be needed in the solution anyway!

    In any case, I suspect that the people who use these database features outnumber the users of Tony’s framework, which I suppose is strong evidence that the former approach is neither wrong nor broken. If it were, why is everyone not using Tony’s framework?

    Where can I buy it?

    • This is a totally ridiculous comment, but one which I have come to expect from my critics who can’t stand the fact that I dare to hold an opinion which is contrary to theirs.

      No, a collection of CSV files would not be sufficient for my needs.
      Yes, I do use a DBMS (actually my users can choose between 4 – MySQL, PostgreSQL, Oracle and SQL Server).
      No, my approach is not “reinventing the wheel”. Data validation and referential integrity were performed in application code long before they were available in database systems. I only use the features that I want to use. I choose not to use foreign key constraints (they don’t work across database boundaries), stored procedures and database triggers.
      No, my framework does not enforce single-user access to the database, it has always been multi-user.
      No, you can’t buy my framework. It’s open source, so you can download it for free.

      Do you have any more stupid comments?

      • Tony, I would appreciate if you refrained from calling other people “stupid” or similar epithets.

        With regard to “referential integrity were performed in application code long before they were available in database systems,” I beg to differ: IMS, IDMS and various other early DBMSs implemented that from the very beginning. It’s only relational DBMSs that didn’t implement that feature early on. As for validations, the UC Berkeley INGRES implemented an INTEGRITY CONSTRAINT statement, e.g.,

        RANGE OF E IS EMP
        INTEGRITY CONSTRAINT IS E.SALARY > 8000
        

        That’s from “The Design and Implementation of INGRES”, Stonebraker et al., in ACM TODS, Sept. 1976.

      • I suspect it’s because in general any syntax element referring to a schema-owned object, like REFERENCES <reftable>, is understood to be schema qualified, unless otherwise stated. See for example “Column References” here

      • Just because “some” databases implemented column constraints is not a valid argument when you consider that “most” programmers (including myself) were limited to database systems which did not. That is why most programmers still implement such constraints within their application code.

      • SQL-92 included CHECK constraints, and I’m pretty sure that by 1999 all major commercial and open source RDBMSs supported that feature (the only potential exception is MySQL, but according to the 3.23/4.0/4.1 manual–3.23 came out in 2000/2001–CHECK was supported). So it’s been available broadly for at least a decade.

      • Just because constraints became generally available by the turn of the century did not suddenly change their usage from being “optional” to “mandatory”. I have been building business logic into my applications for several decades, and the few times I have worked for organisations which preferred the “modern” method of using stored procedures/database triggers/etc it has always been the case that it is more difficult, more time consuming and therefore more costly. When development times and costs are major issues then the “old fashioned” (or “tried and trusted”) method is still a clear winner.

      • Tony said:
        >I have worked for organisations which preferred the “modern” method of using stored
        > procedures/database triggers/etc it has always been the case that it is more difficult, more
        > time consuming and therefore more costly.

        I don’t know that stored procedures is the modern method – I would have thought the Agile guys would have said ORMs or ActiveRecord or something like that was the modern method.

        I agree that coding business logic using stored procedures and/or triggers can be more time-consuming and therefore expensive than writing application-side code. It is up to the client whether the value of the data and/or the anticipated lifespan of the database warrant that approach.

        However, domains/enums, foreign keys, unique keys and check constraints are neither time consuming nor expensive to implement, which is why most people in this discussion recommend using them.

      • > I don’t know that stored procedures is the modern method – I would have thought the Agile guys
        > would have said ORMs or ActiveRecord or something like that was the modern method.

        ORMs? Phooey!! They are evil! (http://www.tonymarston.net/php-mysql/object-relational-mappers-are-evil.html)

        > I agree that coding business logic using stored procedures and/or triggers can be more
        > time-consuming and therefore expensive than writing application-side code.

        So I’m not the only one who thinks that, then.

        > However, domains/enums, foreign keys, unique keys and check constraints are neither
        > time consuming nor expensive to implement, which is why most people in this discussion
        > recommend using them.

        Contrary to wild opinions from some of my critics I actually *DO* use primary keys and candidate keys. Enums are not yet standard across all DBMS systems, and as my application may be deployed on any one of several DBMS systems I find myself sticking to the lowest common denominator. I don’t use check constraints in my database as all data is validated in my application *BEFORE* it is sent to the database, so using check constraints would be duplicate effort. I have always validated foreign keys within my application as I was coding long before FK constraints became standard practice in the organisations where I worked. Note that “became standard practice” is not the same as “became available”. On the few times I have seen an organisation try to use FK constraints in the database it never went as smoothly as expected.

  13. — Do you have any more stupid comments?

    Considering that you’re championing the failed COBOL/VSAM/IMS paradigm that Dr. Codd put a stake in, why aren’t you the one making such comments? The Goths put Europe into the Dark Ages by enforcing the pre-intelligence paradigm. Those, such as yourself, seek to return to the thrilling days of yesteryear, where each application was its own silo, and data sat in dumb files accessible only through bespoke code. Kind of the Dark Ages. Have a look at WebSocket, and see where the rest of us are going. No more disconnected client edit screens; just call the validation from the database from any client application. Complete flexibility, on both the server and client side. Separation of responsibility. And all that nice rhetoric that OO folks espouse, but seldom perform. You’re not “reinventing the wheel”, just rediscovering the square one that your grandfather used. That’s not progress.

    — I choose not to use foreign key constraints (they don’t work across database boundaries)

    That’s not fully true. Depending on the engine, a “database” object can reside with others in the scope of the engine, and FKs are definable across all tables visible to the engine. Typically, the addressing is in the form: database.table.column. DB2, for one, does this. FKs across federated databases, not so much. But then, why would you use more than one engine for an application?

    • If you think that I’m “championing the failed COBOL/VSAM/IMS paradigm” then you are talking out of the wrong end of your alimentary canal. I have said quite clearly that my framework can switch between MySQL, PostgreSQL, Oracle and MS SQL Server simply by changing a single setting in the config file. My framework is written in OO PHP, not non-OO COBOL.

      I still don’t agree with putting data validation inside the database, and I never will. It may surprise you, but I’m not the only one.

      I have yet to work with a DBMS which allows foreign key constraints to work with tables outside the current database/schema. PostgreSQL certainly doesn’t. The manual clearly states that in the REFERENCE clause you can only specify a table name, which means that it cannot exist outside of the current schema.

      • — If you think that I’m “championing the failed COBOL/VSAM/IMS paradigm” then you are talking out of the wrong end of your alimentary canal.

        Lots of folks, take Struts as an example, bifurcate classes between function and data (they give the classes names like FooAction and FooData) which is semantically identical to COBOL/VSAM (or FORTRAN or Flow-Matic); one can stamp one’s foot and claim the mantle of OOD, but it’s just a lie. It just is. This bifurcated mentality then infiltrates the rest of the application code, and it is just a bunch of “Action” classes (the functions) and a bunch of “Data” classes (the data): that’s what your grand daddy wrote back in 1965. Admit it. If you want to see real OOD/OOP in action, read Allen Holub. A separate issue from whether your paradigm/framework is relational.

        — I have said quite clearly that my framework can switch between MySQL, PostgreSQL, Oracle and MS SQL Server simply by changing a single setting in the config file.

        So what? Your data is still flatfile-in-engine. Putting data in an RDBMS engine doesn’t make it relational data. Many wish that one couldn’t define schemas less than 3 NF. Just as many wish one could only use an OO language where classes must contain both the data and method used by the class, but the youngsters who don’t know any better cleave to the bifurcation approach as you may have (I’ve not reviewed the framework, but all that I have behave as Struts does). But, pretty clearly, your “schema” is very little different from a flatfile, and certainly not what Dr. Codd designed. Being able to switch between engines doesn’t mean a thing.

        — My framework is written in OO PHP, not non-OO COBOL.

        And who ever said that PHP was OO??

        And, in the end, I care naught whether folks like you want to live in The Dark Ages and never have. What irritates me is folks like you who want the prestige of doing “database systems”, but actively subvert what that is and champion your assault as “modern”. That pisses me off. Skip the foolery and have the gonads to write your own I/O, language specific of course, and be done with it. You will run faster, and no one will be able to touch your data; you could even write your own encrypt/decrypt. Really silo your application.

      • > This bifurcated mentality then infiltrates the rest of the application code, and it is just a bunch
        > of “Action” classes (the functions) and a bunch of “Data” classes (the data)

        That may be how YOU write code, but I certainly don’t. To me the term encapsulation means “the act of placing data and the operations that perform on that data in the same class”, and that is precisely what I do. My framework is built around the 3 Tier Architecture, and every class in the business layer encapsulates both the data and the operations (which includes the business rules) for each database table.

        > A separate issue from whether your paradigm/framework is relational.

        “Relational” applies to a database, not software. Where do you come up with these stupid ideas?

        > Your data is still flatfile-in-engine.

        The data in my applications exists in a relational database, not flat files, so stop making such stupid claims.

        > But, pretty clearly, your “schema” is very little different from a flatfile, and certainly not what
        > Dr. Codd designed.

        I design my databases according to relational theory, according to the rules of normalisation, and implement those designs in a variety of relational databases. Only an idiot such as you would still claim that they are still no more than flat files.

        > And who ever said that PHP was OO??

        It *IS* by virtue of the fact that it supports (and has done since version 4) the concepts of encapsulation, inheritance and polymorphism, and regardless of what you or anyone else thinks those are the *ONLY* factors which decide if a language is OO or not.

        > you could even write your own encrypt/decrypt.

        I have, actually.

      • With regard to “The manual clearly states that in the REFERENCE clause you can only specify a table name, which means that it cannot exist outside of the current schema,” I’m afraid you’re mistaken:

                Table "s2.t2"
         Column |  Type   | Modifiers 
        --------+---------+-----------
         c12    | integer | not null
         c2     | integer | 
        Indexes:
            "t2_pkey" PRIMARY KEY, btree (c12)
        Foreign-key constraints:
            "t2_c2_fkey" FOREIGN KEY (c2) REFERENCES s1.t1(c1)
        
                Table "s1.t1"
         Column |  Type   | Modifiers 
        --------+---------+-----------
         c1     | integer | not null
         c2     | text    | 
        Indexes:
            "t1_pkey" PRIMARY KEY, btree (c1)
        Referenced by:
            TABLE "s2.t2" CONSTRAINT "t2_c2_fkey" FOREIGN KEY (c2) REFERENCES s1.t1(c1)
        

        That’s PostgreSQL 8.4.10, in case it matters.

  14. @jma

    > I suspect it’s because in general any syntax element referring to a schema-owned object, like
    > REFERENCES <reftable>, is understood to be schema qualified, unless otherwise stated.

    “is understood” is not good enough. A reference manual is supposed to be precise and explicit, not vague and implicit, so if the addition of a schema name is optional it should be written as “[schema.]table” and not just “table”. That is the convention I have come to expect from *ALL* reference manuals.

    • I’m sure PostgreSQL developers will welcome a patch to fix the documentation. Or you could leave a comment on the manual page.

  15. How about “precise and implicit”? Maybe there could be a section regarding “qualified identifiers” somewhere that describes the fact that ALL tables must be schema qualified but that, in the interest of efficiency, the “search_path” GUC variable can have one or more schemas listed that are searched in order of declaration. Same goes for how column names do not have to be table qualified unless there is a collision. In both cases one learns the rules very quickly and then no longer needs to be reminded in the many-many places that “table” appears that the specification of a schema is required unless it is on the search path.
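
    A short illustration of that rule, reusing the schema names from the example further up (s1, s2):

        CREATE SCHEMA s1;
        CREATE SCHEMA s2;
        CREATE TABLE s1.t1 (c1 integer PRIMARY KEY);

        SET search_path TO s2, s1;

        -- t1 resolves to s1.t1 because s1 is on the search path;
        -- writing REFERENCES s1.t1 (c1) works regardless of the path
        CREATE TABLE s2.t2 (c12 integer PRIMARY KEY,
                            c2  integer REFERENCES t1);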

    For the rest of the discussion there really isn’t a correct theory but rather solutions that work given our constraints. Progress does not always result in improvements but often we are able to learn from past experiences and do things better in the future. Failing to learn is bad but so is refusing to re-learn things that may have been forgotten over time.

    Given the requirements and tools available I can see why Tony is confident regarding the architecture of the ERP system he is involved with. Given a different set of requirements I would hope that he’d be able to critically evaluate whether his past experiences still provide the best guidance for the new project and be willing to learn/try something different; or whether he would redefine the requirements to better fit the existing architecture. Either way I really do not care. The two most important pieces of information are the answers to the questions:
    “What is your design; its benefits and limitations?”
    “What specific problems and requirements are you trying to fulfill with the design?”
    I can then make an informed decision regarding whether to use, cherry pick, or ignore the experience/designs. I’ll share any feedback and critiques and then move on. But, I will attempt to be as specific and detailed as possible with my feedback.

    Also, I thought that “functional programming” was the new architecture-du-jour? In functional programming you would, indeed, separate the actions and the data. Pull in the data from the database, transform it using side-effect-less functions, and then store the results back into the database. With fewer side-effects complexity decreases and productivity increases. My point isn’t to start an argument along these lines but to point out that there really is no point to arguing in the abstract. If you are not going to ask questions and deal with specifics – but instead just throw insults at each other – then please consider that even if you think your position and opinions are “correct”, your ability to impact the thinking and behavior of others is impaired.

    We all take shortcuts by making assumptions and generally believing that newer things are better. We also try to avoid meaningful rhetoric and instead resort to yelling louder and getting as many people on our side as possible. But strength in numbers can mean that there are simply more ignorant and lazy people in the world. If I truly believe I am right (as opposed to just agreeing with the majority – or being contrary intentionally) then no amount of people shouting “you are wrong” is going to change my mind – you need to find out why I think I am right and either refute that or change your own mind. If you do not want to do that, and prefer to simply shout and berate, please consider helping increase the community signal-to-noise ratio and shut up. Thanks.

  16. — > Your data is still flatfile-in-engine.

    — The data in my applications exists in a relational database, not flat files, so stop making such stupid claims.

    I worked with a Texan for some years, and one of his bits of wisdom (censored for the intertubes): “You can put cat crap in an ice cream cone, that don’t make it chocolate ice cream.”

  17. I do believe that implementing business constraints in an application is probably the best way to go; but the way it is being done currently – replacing the functionality of the existing CHECK constraint – is flawed. Many business constraints are arbitrary in nature, and so having out-of-bounds values generate fatal exceptions means that many constraints are over-defined. What should happen is that soft bounds are determined, and then an application interface should be provided to allow a data-maintainer to review the entries that fail the constraint and to modify the corresponding constraint if necessary. As no current database implements this kind of soft constraint in a native/declarative fashion, it does fall onto AN application to implement such functionality. External processes can check for valid values and attempt to adhere to them, but in situations where progress out-paces the constraint the external application can simply store its data and let the maintenance routine flag it for review.

    Data type and more generic constraints (like not allowing newlines/form-feeds/etc…) should still be declared in the database to keep malformed data from being entered. It is similar to the XML concept of well-formed versus validated – but the validation should be more flexible than is currently allowed. Hard constraints (like positive-only numbers) are a little more problematic since external code may rightly depend (though it probably shouldn’t) on the value not becoming negative; but whether maximum quantity-on-hand is 100 or 500 is arbitrary, and consuming code should not depend on the magnitude of the allowable values. Hard bounds should be implemented in the database where possible AND should only be modified after great contemplation, whereas soft bounds are better implemented in software only and can readily be altered to meet changing business realities. But then you also need to ensure that if invalid data gets entered, some sort of asynchronous rejection mechanism exists to tell the inserting process/user that they need to redo their transaction.
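
    As a rough sketch (all names invented), here is how such a reviewable soft bound might be approximated with a trigger today; the same idea could equally live in application code, as argued above. The row is accepted either way, but out-of-bounds values are queued for a data-maintainer to review:

        -- Out-of-bounds values are logged for review instead of rejected
        CREATE TABLE constraint_review (
            logged_at  timestamptz NOT NULL DEFAULT now(),
            table_name text        NOT NULL,
            row_key    text        NOT NULL,
            message    text        NOT NULL
        );

        CREATE FUNCTION item_qoh_soft_check() RETURNS trigger LANGUAGE plpgsql AS $$
        BEGIN
            IF NEW.qoh NOT BETWEEN 0 AND 10000 THEN
                INSERT INTO constraint_review (table_name, row_key, message)
                VALUES ('item', NEW.item_id::text,
                        format('qoh %s outside soft bounds 0-10000', NEW.qoh));
            END IF;
            RETURN NULL;  -- AFTER trigger, so the return value is ignored
        END;
        $$;

        CREATE TRIGGER item_qoh_soft_check AFTER INSERT OR UPDATE OF qoh ON item
            FOR EACH ROW EXECUTE PROCEDURE item_qoh_soft_check();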

  18. I agree with the premise that application logic should be pushed down into the database layer.

    Are there any books/articles that discuss the analysis, design and programming of a modern database application? Most books seem to just discuss the data model. Most vendor documentation has the application logic in a different tier with the database just used for persistence. I struggle to find anything that discusses in detail how to structure and design your application using stored procedures, table valued functions etc. Should you try and use an OOP approach for example with the tables representing classes and stored procedures the methods? Or should you attempt a traditional procedural approach with high level business actions being decomposed into smaller procedures.

    Also, should all table access be through explicit CRUD procedures, or is it OK for procedures to access table data directly?

    On the other hand functional programming is becoming popular these days, should I try and do database programming in a functional style? Thanks.

  19. Pingback: Automated Database Augmentation | Taming Serpents and Pachyderms

  20. Pingback: A Pythonic, TTM-inspired interface to PostgreSQL – Requirements | Taming Serpents and Pachyderms

  21. Pingback: The Future of Pyrseas: Part 1 | Taming Serpents and Pachyderms

Comments are closed.