Stairway to T-SQL DML Level 4: The Mathematics of SQL: Part 1

Question

Stairway to T-SQL DML Level 4: The Mathematics of SQL: Part 1

Greg Larsen

SSC-Insane

Points: 20961
More actions
November 16, 2011 at 12:00 am

#146086

Comments posted to this topic are about the item Stairway to T-SQL DML Level 4: The Mathematics of SQL: Part 1
Gregory A. Larsen, MVP

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Ron McCullough SSC Guru Points: 63877 More actions · Answer 1

Thanks Greg.

Copied and ran all your sample code, conclusion - makes understanding the various operators (EXCEPT, INTERSECT, UNION and UNION ALL ) exceptionally clear.

And the additional tidbits of information such as:

In addition, something worth noting is that the column headers displayed take the names associated with the columns identified in the first SELECT.

Definitely a "must read" for newbies to SQL.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

sqlvogel SSCrazy Eights Points: 9526 More actions · Answer 2

Not a great start in my opinion:

A relational database contains tables that relate to each other by key values

Relational databases fundamentally have nothing to do with tables being related to each other. The topic is allegedly mathematical so it's a pity the piece couldn't begin with an accurate and informative explanation of what the relational model is about. Someone who doesn't know any better is going to be misled into thinking that "relational" is all about "relating tables".

What bothers me even more are the examples. What is the point of demonstrating tables without keys, with duplicate rows and with every column nullable?? These are not sets at all.

Ron McCullough SSC Guru Points: 63877 More actions · Answer 3

What bothers me even more are the examples. What is the point of demonstrating tables without keys, with duplicate rows and with every column nullable?? These are not sets at all.

In your definition they may not be sets, but from seeing the questions frequently asked in the forums they exist all too often in the real world.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

TomThomson SSC Guru Points: 104767 More actions · Answer 4

Although I don't much like the article (I was a mathematician before I got involved in computers, and a computer scientist before I was an engineer/developer/architext, so I prefer people to avoid misusing/making incorrect statements about well understood mathematical concepts like sets; if the author had used "bag" instead of "set" it would have been fine, because in maths bags do allow duplicates) and agree with David P that it's not really a great start, I think his other remarks are over the top (misuse of one term is not a total disaster) because the diagrams and some of the other stuff probably will help some people.

It always amuses me when I see the claim that relational algebra is nothing to do with tables (do you know a better graphical representation of a relation, David? I suppose there may be such a thing. And Jeff's pork chop launcher might be made obsolete by the discovery that flying pigs with detachable parts are abundant in Minnesota and can be trained to divebomb people who do things row by row).

As for a tables without keys, with duplicates, and allowing nulls not being a set at all that's certainly true. However, a relation (in the sense of the relational model for data) is not a set either (it can't be: it has a key, which is not anything a set has), so why complain that a table is not a set? And I see the usual unreasoned anti-null nonsense rearig its head there: how on earth could allowing null affect whether something could be a set.

Anyway, I don't believe that getting a mathematically correct definition of of a relation (in the relational data model sense) will be of much use to anyone, not even to a mathematician or a theoretical computer scientist (although I've done it several times in different forms, just for fun). How useful is it to the average DBA or DB developer to know that in the relational data mathematical world we can safely go at defining a relation by something like the following:

edit (some time later): edit: now that David declined the error detect challenge, I can say that I intended to have one; but I can't convince myself that it worked - it looks as if there isn't one. careless of me. it just shows how silly a mathematician can get when playing at mathematics this abstract. Reminds me of Martin Gardner's statement that "Pure mathematics is very like the grin without the cat".

[/i]

Let S be a finite map from names (text strings) to datadomains,

O1 be a total order on S,

K be a submap of S,

O2 be a total order on K, and

[D] be the tuple <S, O1, K, O2>.

Then the relation type RT with definition D is the tuple <D, P> where

P is a totally recursive function on positive integers to ordered maps of type S with order O1 such that for all integers n the cardinality of the map set P(n) restricted to K is the same as the cardinality of P(n), and there is no proper submap of K for which this holds true, and no superset of P for which this holds true.

A relation R of type RT is a tuple <RT, V> where V is finite map from time to positive integers.

At time t the value of R at time t is the tuple <D, P(V(t))>.

Of course that only works when each relation has exactly one key, and all the business rules to be enforced by the keys are enforced ny the primary key. It gets more complicated when there are additional candidate keys.

I don't think anything like that is the slightest bit useful, unless one wants to get into machine driven theorem proving in which case this sort of definition (in far more complex form, to cover everything) might conceivably turn out to be useful. (Incidentally, David, can you tell wether I put a deliberate mistake in that definition or not? If not, why do you complain about someone talking tables instead of mathematically correct definitions?)

Tom

sqlvogel SSCrazy Eights Points: 9526 More actions · Answer 5

Tom.Thomson (9/6/2011)It always amuses me when I see the claim that relational algebra is nothing to do with tables

I guess it might amuse me too but I didn't notice that anyone in this thread said such a thing. What I didn't like was the suggestion that a relational database consists of "tables that relate to each other" - a description which misses the point, doesn't say anything useful and is probably only going to mislead people.

a relation (in the sense of the relational model for data) is not a set either (it can't be: it has a key, which is not anything a set has), so why complain that a table is not a set?

A relation variable isn't a set but its value at a point in time is a subset of the cartesian product of some set of domains. A key is a feature of a relation variable, not the set of tuples assigned to it. At least that's pretty much the definition used in many respected textbooks e.g. those by David Maier, Serge Abiteboul and others. To be sure it's a less formal definition than yours and probably one that a mathematician can pick holes in but I believe it's the description most useful to database practictioners. I'm not a mathematician. I'm interested in what's useful for a practical understanding of database management. I'm inclined to agree with you that your definition is not useful.

TomThomson SSC Guru Points: 104767 More actions · Answer 6

David Portas (9/6/2011)
Tom.Thomson (9/6/2011)It always amuses me when I see the claim that relational algebra is nothing to do with tables
I guess it might amuse me too but I didn't notice that anyone in this thread said such a thing. What I didn't like was the suggestion that a relational database consists of "tables that relate to each other" - a description which misses the point, doesn't say anything useful and is probably only going to mislead people.

But a relational database is naturally represented as a set of single column tables that relate to each other to form more complex tables (the relations) which in turn relate to each other again :hehe:

a relation (in the sense of the relational model for data) is not a set either (it can't be: it has a key, which is not anything a set has), so why complain that a table is not a set?
A relation variable isn't a set but its value at a point in time is a subset of the cartesian product of some set of domains. A key is a feature of a relation variable, not the set of tuples assigned to it.

Why would anyone think it was a set of tuples? It is in fact an ordered list of attributes (although some pople drop the order, which has some interesting implications that they mostly choose to ignore). It's defining properties (which make it a key) is that for any value of the relation a projection of the current set of tuples on a key has the same cardinality as has the current set of tuples itself, and that no subsequence of the key has this property.

At least that's pretty much the definition used in many respected textbooks e.g. those by David Maier, Serge Abiteboul and others.

Well actually, no , a relation's value is never a subset of a cartesian product of a set of domains (unless the relation has only one attribute), for two reasons: first, a domain may occur more than once in the list from which the cartesian product is formed so that list is clearly not a set (a list has an order for its elements, and can include duplicates; a set has neither) - the very concept of the cartesian product of a set is meaningless (unless the set has cardinality smaller than 2) and second, cartesian products have order but the ordered elements don't have names. If (as appears unlikely) the "respected textbooks" you mention use the definition you have given they probably don't deserve any more respect than does the article you are commenting on, because they've fallen into the same trap - using the word "set" to describe something that allows duplicates (and even worse, in these books it - according to you - seem to imply an order too).

To be sure it's a less formal definition than yours and probably one that a mathematician can pick holes in but I believe it's the description most useful to database practictioners. I'm not a mathematician. I'm interested in what's useful for a practical understanding of database management. I'm inclined to agree with you that your definition is not useful.

Well, but you've declined the deliberate mistake question. SO I'll admit that I intended to have one, but appear to have screwed up - I put the maximimality property of the permitted values function in the right place instead of on the wrong one (on the function, not on its results).

Tom

SQL Paul SSC Enthusiast Points: 186 More actions · Answer 7

I thought it was a good introductory article for those interested in understanding data structure who do not have a doctorate in applied mathematics.

I especially liked the use of the Venn diagrams for visual clarity and even though I have been a DBA for 14 years, I found it an interesting read.

"A relational database contains tables that relate to each other by key values" I really do think this is a fair statement - If you have a database that contains tables that do not relate to each other it wouldn't be very relational would it and that is the point the article is making.

Good start, looking forward to the next article and be thankful you don't have to work with some of the people that leave comments!

TomThomson SSC Guru Points: 104767 More actions · Answer 8

SQL Paul (11/16/2011)
I thought it was a good introductory article for those interested in understanding data structure who do not have a doctorate in applied mathematics.
I especially liked the use of the Venn diagrams for visual clarity and even though I have been a DBA for 14 years, I found it an interesting read.

I agree about theVenn diagrams, they will be a great aid to people who don't have a gut feel (as opposed to a theoretical idea) for what the operators concerned do, and the gut feel is probably more important than the theory. And the whole article is certainly interesting. My concern with it was that it was a bit sloppy on mathematical terminology for an article with that title.

"A relational database contains tables that relate to each other by key values" I really do think this is a fair statement - If you have a database that contains tables that do not relate to each other it wouldn't be very relational would it and that is the point the article is making.

I think it's a fair statement too - it's a nice way of saying that referential transparency enforced by keys is an essential part of the relational model, which will be useful for people coming to the subject from no background of relational theory or terminology. I think DP was the only person to object to the statement.

Good start, looking forward to the next article and be thankful you don't have to work with some of the people that leave comments!

I hope the next article comes soon.

Tom

TimothyAWiseman SSCrazy Eights Points: 8819 More actions · Answer 9

Great article. I think a solid grounding in set theory would be very helpful for many DBAs.

If I may respectfully offer a couple of minor corrections and clarifications though:

Your definition of subset is mathematically the definition of a Proper Subset. A subset can contain the same number of elements as the original set. This is significant in a number of proofs in set theory, and is also very significant when dealing with infinite sets (admittedly, you will never see an infinite set in SQL). For instance, the set of even integers is a subset of the set of integers, but they both contain an infinity of elements (even more precisely, they both contain Aleph_null elements).

Also, you correctly quote Cantor's definition, but one element of that definition is worth noting in SQL. It requires the elements to be "definite and separate." In other words, unique. That is why the normal except, intersect and union operators all enforce distinctness by default (there are variations like UNION ALL which do not enforce distinctness and also often operate substantially faster). But it also shows that SQL Server is more inspired by rather than really implementing set theory. SQL Server permits tables, and definitely results, to contain duplicated rows.

And finally, if I may point out that "Applied Mathematics for Database Professionals" is a great book that goes through these topics.

---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/

danielk1 SSC-Addicted Points: 462 More actions · Answer 10

Gregory:

Good article and thank you.

I found it amusing the discussion, cough, on terms stated in article and math representations. Sigh... I guess some folks have nothing better to do than to pick away at what others are trying to contribute.

Maybe this is in another article, but would like to see comparison between INTERSECT and INNER JOIN w/wo/DISTINCT. When is it appropriate to use one versus the other with regards to performance and readability.

Thank you again.