An Introduction to Database Models

,

Consider this article a very basic introduction to an area in which countless multi-hundred

paged books have been written during the last four decades. This article can not and does not intend to compete

with any of these books, but rather tries to explain different database models in a non-academic style. 

I will discuss the following database models:

  • Flat Files
  • Hierarchical Model
  • Network Model
  • Relational Model
  • Object-Relational Model
  • Object-Oriented Model
  • Other Models

Flat Files

Simply put, one can imagine a flat file database as one single large table. A good example to visualize

this is to think of a spreadsheet. A spreadsheet can only have one meaningful table structure at a time.

Data access happens only sequentially; random access is not supported. Queries are usually slow, because the

whole file always has to be scanned to locate the data. Although data access could be sped up by sorting the

data, in doing so the data becomes more vulnerable to errors in sorting. In addition you'll face following

problems.

  1. Data Redundancy.

    Imagine a spreadsheet where you collect information about bonds to manage a fixed income portfolio. One

    column might contain the information about the issuer of that bond. Now when you buy a second bond from this

    issuer you again have to enter that information.

  2. Data Maintenance.

    Given the above example consider what happens when the information of this issuer changes. This change has to

    be applied to every single bond (row) of that issuer.

  3. Data integrity.

    Following 2; what happens when you have a typo when changing the information? Inconsistencies are the

    outcome.

As a result one concludes that Flat File Databases are only suitable for small, self-evident amounts of

data.

Hierarchical Database Model

Hierarchical databases (and network databases) were the predecessors of the relational database model.

Today these models are hardly used in commercial applications. IBM's IMS (Information Management System) is

the most prominent representative of the hierarchical model. It is mostly run on older mainframe systems.

This model can be described as a set of flat files that are linked together in a tree structure. It is

typically diagrammed as an inverted tree.

The original concept for this model represents the data as a hierarchical set of Parent/Child relations.

Data is basically organized in a tree of one or more groups of fields, that are called segments. Segments

make up every single node of the tree. Each child segment can only be related to just one parent segment

and access to the child segment could only happen via its parent segment. This means, that 1:n relations

result in data redundancy.

To solve this problem, the data is stored in only one place and is referenced through links or physical pointers.

When a user accesses data he starts at the root level and works down his way through the tree to the desired

target data. That's the reason why a user must be very familiar with the data structure of the whole database.

But once he knows the structure, data retrieval could become very fast.

Another advantage is built-in referential integrity, which is automatically enforced. 

However, because links between the segments are hard-coded into the database, this model becomes inflexible to

changes in the data structure. Any change requires substantial programming effort, which in most cases comes

along with substantial changes not only in the database, but also in the application.

Network Database Model

The network database model is an improvement to the hierarchical model. In fact it was developed to

address some of the weaknesses of the hierarchical model. It was formally standardized as CODASYL DBTG

(Conference On Data System Languages, Data Base Task Group) model in 1971 and is based on mathematical set

theory.

At its core the very basic modeling construct is the set construct. This set consists of an owner record,

the set name, and the member record. Now, this 'member' can play this role in more than one set at the

same time; therefore this member can have multiple parents. Also an owner type can be owner or member in one or

more other set constructs. This means the network model allows multiple paths between segments. This is a

valuable improvement on relationships, but could make a database structure very complex.

Now how does data access happen? Actually, by working through the relevant set structures. A user need not

work down his way through root tables, but can retrieve data by starting at any node and working through the

related sets. This provides fast data access and allows the creation of more complex queries than could be

created with the hierarchical model. But, once again, the disadvantage is that the user must be familiar with

the physical data structure. Also, it is not easy to change the database structure without affecting the

application, because if you change a set structure you need to change all references to that structure within

the application.

Although an improvement to the hierarchical model, this model was not believed to be the end of the line.

Today the network database model is obsolete for practical purposes.

Relational Database Model

The theory behind the relational database model will not be discussed in this article; only the differences

to the other models will be pointed out. I will discuss the relational model along with some set theory and

relational algebra basics in a different set of articles in the near future, if they let me :).

In the relational model the logical design is separated from the physical. Queries against a Relational

Database Management System (RDBMS) are solely based on these logical relations. Execution of a query doesn't require

the use of predefined paths like pointers. Changes to the database structure are fairly simple and easy to

implement.

The core concept of this model is a two-dimensional table, comprising of rows and columns. Because the data

is organized in tables, the structure can be changed, without changing the accessing application. This is

different to its predecessors, where the application usually had to be changed when the data structure changed.

The relational database model knows no hierarchies within its tables. Each table can be directly accessed and

can potentially be linked to each other table. There are no hard-coded, predefined paths in the data. The

Primary Key - Foreign Key construct of relational databases is based on logical, not on physical links.

Another advantage of the relational model is that it is based on a solid house of theory. The inventor,

E.F.Codd, a mathematician by profession, has defined what a relational database is and what a system needs

to call itself a relational database 1), 2). This model is firmly based on the mathematical theories of

sets and first order predicate logic. Even the name is derived from the term relation which is commonly used

in set theory. The name is not derived from the ability to establish relations among the table of a relational

database.

Object-Relational Model

Also called post-relational model or extended relational model. This model addresses several weaknesses of

the relational model. The most significant of which is the inability to handle BLOB's.

BLOBs, LOBs or Binary Large Objects are complex data types like time series, geospatial data, video

files, audio files, emails, or directory structures.

An object-relational database system encapsulates methods with data structures and can therefore execute

analytical or complex data manipulation operations.

In it's most simple definition data is a chain of 0s and 1s, that are ordered in a certain manner.

Traditional DBMSs have been developed for and are therefore optimized for accessing small data elements like

numbers or short strings. These data are atomic; that is they could be not further cut down into smaller

pieces. In contrast are BLOB's large, non-atomic data. They could have several parts and subparts.

That is why they are difficult to represent in an RDBMS.

Many relational databases systems do offer support to store BLOBs. But in fact, they store these data outside

the database and reference it via pointers. These pointers allow the DBMS to search for BLOBs, but the

manipulation itself happens through conventional IO operations.

Object-Oriented Model

According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of object-oriented

programming language (OOPL) systems and persistent systems. The power of the OODB comes from the seamless

treatment of both persistent data, as found in databases, and transient data, as found in executing programs."

To a certain degree, one might think that the object-oriented model is a step forward into the past, because

their design is like that of hierarchical databases. In general, anything is called an object, which can be

manipulated. Like in OO programming objects inherit characteristics of their class and can have custom properties

and methods. The hierarchical structure of classes and subclasses replaces the relational concept of atomic data

types. As with OO programming the object-oriented approach tries to bring OO characteristics like classes,

inheritance, and encapsulation to database systems, making a database in fact a data store.

The developer of

such a system is responsible for implementing methods and properties to handle the data in the database from

within his object-oriented application.

There is no longer a strict distinction between application and database.

This approach makes Object DBMS an interesting and sometimes superior alternative to RDBMS when complex

relationships between data are essential. An example of such an application might be current portfolio risk

management systems.

However, these systems lack a common, solid base of theory that Codd provided for relational databases.

There is a model proposed by the Object Management Group (OMG), which could be viewed as de facto standard, but

the OMG can only advise and is not a standards body like ANSI.

Other Models

For the sake of "completeness" what now follows is a list of other models without further detailed

explanation:

  • Semistructured Model
  • Associative Model 3)
  • Entity-Attribute-Value (EAV) data model 4)
  • Context Model

Conclusion

Well, no real conclusion.

I hope you are with me so far and have seen that there (still) is a world outside the relational model,

which can be pretty exciting to discover.

If you are interested in more details, you might want to follow the links provided in the reference.

References

1)Codd, E.F. "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the

Rules?" ComputerWorld, October 14 1985 and October 21 1985.

2)Codd, E.F. The Relational Model for Database Management, Version 2;

Addison-Wesley; 1990.

3)

http://www.lazysoft.com/docs/other_docs/amd_whitepaper.pdf

4)

http://ycmi.med.yale.edu/nadkarni/eav_cr_contents.htm

Rate

Share

Share

Rate