Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Stairway to Database Design STEP 3: Building Tables Expand / Collapse
Author
Message
Posted Wednesday, September 18, 2013 8:40 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, September 26, 2013 11:09 AM
Points: 1, Visits: 18
My wife complains that I express my opinions as facts. And so it goes with Mr. Celko. Although he clearly knows more about SQL than I ever will he also has no compunction about expressing his personal dogma as if it were canon.
He decries "tibblers" when they name database objects in a way that identifies the object type i.e. tblEmployee arguing that it is more flexible to name it without the object type because that way if you need to change the object type say from a table to a view then you don't need to find all the code that references the entity to rename it. I disagree. Since SSMS will not allow me to organize entities into areas I find it convenient to name my objects area_objecttype_name. Aside from being able to find objects quicker when you have well over a hundred tables say you are inspecting a bit of code and you see an entity named Employee, or Employees as Mr. Celko would have it, if I don't know that it's a view I am wasting my time writing and testing a table constraint to protect data integrity unless I schemabind the view and then index it. Also I would posit that as far as changing object names in code is concerned I would want to touch every bit of code that uses that object to check that a change to the object type won't have any unintended consequences.

As far as using industry standard identifiers is concerned Mr. Celko does what he does frequently and makes rules without knowing the particulars of how a business operates. For example in the system I work on we deal with tens of thousands of small businesses many of which are identified for us through third party sources. How many do you think know their DUNS. I can't even get all the DOT numbers much less some esoteric number most people have never heard of.

As far as the proscription against surrogate or artificial identity integer based keys is concerned all I can say is poppycock. If an entity does not have an industry standard identifier available then what's the difference whether I assign some arbitrary key or allow the machine to generate one? Much easier for me to standardize an approach with an identity column for each table named in a standardized way even when it is not necessarily needed. A waste of space maybe but space is cheap and I find it to be convenient because it allows me to be more responsive to business requests i.e. I can write queries much faster because I don't have to wonder about what columns to join on.

I'm not saying that I'm sure that I'm right but I cannot be convinced by appeals to authority and frankly the arguments I've read are not particularly persuasive.
Post #1495946
Posted Wednesday, September 18, 2013 1:07 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, November 17, 2014 2:14 PM
Points: 18, Visits: 187
As an exercise, try to write a constraint that will prevent the start and end ranges from overlapping and from having gaps. You can re-design the table if you need to.


So we have a challenge. Very well, lets try

I will slightly rename the column names, to make the typing easier:
start_order_amt_tot -> Start_amt
end_order_amt_tot -> End_amt


The table Joe proposed looks somewhat like this:
[Start_amt]	[End_amt]	[Shipping_cost]
---------------------------------------
0.00 100.00 3.50
101.00 200.00 8.00
201.00 300.00 14.00
301.00 999.00 25.00
;

We can say about this table:
"When the order total amount is between [Start_amt] and [End_amt] then we will apply shipping charge of [Shipping_cost]."

How to prevent overlaps and gaps? Not an easy task, eh Fortunately, Joe allowed us to redesign the table. Let’s do it. Instead of storing the range, we can store the point of change.

Since this is a game, we are free to make up rules. I will make up a rule that for every 100 money units we change the shipping cost. This assumption is not likely in the real life, but it will simplify things and make the demonstration more understandable. The solution is somewhat hard to understand, but without this simplification it would have been much worse.

Also, I will assume that the shipping cost goes up with the amount ordered.

The full solution (non even intervals), without any simplification, would require a bit more space (an article?) so we will stick with the simplified one. The goal is to show one possible answer to Joe's challenge. If it is not totally boring, we can talk more about this.

Here is how the modified table should look:
[Amt]	 [Shipping_Cost]
-----------------------------
0.00 3.50
100.00 8.00
200.00 14.00
300.00 25.00

We can describe the new table layout with this relational predicate:
"When the total amount for given order is greater than or equal to [Amt] and less than next greater [Amt], we will charge the [Shipping_Cost] amount that corresponds to the [Amt]."

Relational predicates are logical functions, where attributes are the parameters. Here, attribute [Amt] appears three times in the same predicate, which is confusing and worrisome. If we cannot even describe clearly what we are doing, how can we solve the problem? No worries. The second appearance refers to "the next greater [Amt]" - the [Amt] in the "next" row. This implies use of self referencing foreign keys somehow. That will be one of the cornerstones for the solution. The third apperance "we will charge [Amt]" we can skip from the sentence, but we will loose clarity. when we turn thr prdicate into a table, we will skip it then.

The desired table layout doe not allow enforcement of given constraints. To find the solution, we must break some design rules and look for answers out of the usual box. Here we go.

Without any introductions, let me say I will build a table like this:
[Amt]	 [Prev_Amt]	 [Shipping_Cost]   [Prev_Shipping_Cost]
----------------------------------------------------------------------------
0.00 0.00 0.00 0.00
100.00 0.00 8.00 0.00
200.00 100.00 14.00 8.00
300.00 200.00 25.00 14.00

Relational predicate for the table: "We will charge for shipping the [Shipping_cost] amount when the total cost for the order is greater than [Prev_Amt] and smaller than [Amt], where the shipping cost for the previous [Amt] is [Prev_Shipping_Cost]".

Again we are referencing rows other than current - more self referencing foreign keys to come. However, we see the ranges back - [Prev_Amt] and [Amt] in a way describe ranges for the shipping charges. The reason for repeating the shipping cost will be explained in due time.

By examining the table, we see that the following constraints need enforcing:
1) [Amt] is UNIQUE
2) The first row all columns have value 0.00
3) In each row, except the first one, [Prev_Amt] is equal to [Amt] from the previous row.
4) [Amt] - [Prev_Amt] = 100 -- (this is a huge assumption, but a convenient one!)
5) In each row, except the first one, [Prev_Shipping_Cost] is equal to [Shipping_Cost] from the previous row.
6) In each row, except the first one, [Shipping_Cost] in each row is greater than [Prev_Shipping_Cost]

Only the first one is easy to enforce - UNIQUE or PRIMARY KEY will do.

Let's start from basic table design and then we will add the rest of the constraints:
IF Object_ID('Shipping_Costs_B') IS NOT NULL DROP TABLE Shipping_Costs_B
;
CREATE TABLE Shipping_Costs_B
(-- obvious constraints only
[Amt] decimal (10, 2) NOT NULL CHECK ([Amt] >= 0)
, [Shipping_Cost] decimal (10, 2) NOT NULL CHECK ([Shipping_Cost] >= 0)
, [Prev_Amt] decimal (10, 2) NOT NULL CHECK ([Prev_Amt] >= 0)
, [Prev_Shipping_Cost] decimal (10, 2) NOT NULL CHECK ([Prev_Shipping_Cost] >= 0)
, CONSTRAINT PK_Shipping_Costs_B PRIMARY KEY ([Amt]) -- 1) [Amt] is UNIQUE
);

Next three constraints, when put together, enforce the following rules:
2) The first row all columns have value 0.00
3) In each row, except the first one, [Prev_Amt] is equal to [Amt] from the previous row.
4) [Amt] - [Prev_Amt] = 100 (this is a huge assumption, but a convenient one!)

-- 2) The first row all columns have value 0.00
We can say
IF ([Amt]=0)
THEN
([Amt]=[Prev_Amt]=[Shipping_Cost]=[Prev_Shipping_Cost])

Now we are in trouble. There is no IF..THEN (P=>Q) logical operator in SQL. However, we can use (AND,OR) to represent IF...THEN construct. P=>Q is the same as (NOT P OR Q). This is easy enough to prove, or find it in many books, including "SQL for Smarties" by Joe Celko This is another cornerstone for the solution - use of P=>Q logical construct.

Hence the constraint for rule 1):
ALTER TABLE Shipping_Costs_B 
ADD CONSTRAINT CK_Shipping_Costs_B_FirstRow
CHECK (
[Amt]>0 -- NOT [Amt]=0 (Amt = 0 for the first row)
OR -- OR (each column have value 0)
([Amt] = 0 AND [Prev_Amt]=0 AND [Shipping_Cost]=0 AND [Prev_Shipping_Cost]=0)
)
;

3) In each row, except the first one, [Prev_Amt] is equal to [Amt] from the previous row.
The following FK almost gets us there. It enforces that [Prev_Amt] is equal to [Amt] from A previous row, not from THE previous row.
ALTER TABLE Shipping_Costs_B 
ADD CONSTRAINT FK_Shipping_Costs_B_Amt
FOREIGN KEY (Prev_Amt) REFERENCES Shipping_Costs_B ([Amt]) -- provides sequence of [Amt]
;

-- 4) [Amt] - [Prev_Amt] = 100
-- The following CHECk enforces rule 3 and makes the rule 3) complete
--- (non first row) => [Amt] - [PrevAmt] = 100 ---> [Amt]>0 => [Amt]-[Prev_Amt] = 100
ALTER TABLE Shipping_Costs_B
ADD CONSTRAINT CK_Amt_Prev_Amt_Diff_NonFirstRow
CHECK (
[Amt]=0 -- NOT Amt>0
OR -- OR
[Amt]-[Prev_Amt] = 100 -- Amt is grater than Amt_Prev by 100
)
;

We have to enforce two more rules:
5) In each row, except the first one, [Prev_Shipping_Cost] is equal to [Shipping_Cost] from the previous row.
6) In each row, except the first one, [Shipping_Cost] in each row is greater than [Prev_Shipping_Cost]

We need a super-key for these rules. Super-key is an unique key with added columns. By adding columns to an unique key, the uniqueness is preserved. Since super-key is unique, we can use it in foreign key constraints.

The super-key:
ALTER TABLE Shipping_Costs_B 
ADD CONSTRAINT SK1_Shipping_Costs_B UNIQUE ([Amt], [Shipping_Cost])
;

--With super-key in place, we can define a FK constraint:
ALTER TABLE Shipping_Costs_B
ADD CONSTRAINT FK_Shipping_Costs_B
FOREIGN KEY ([Prev_Amt],[Prev_Shipping_Cost]) REFERENCES Shipping_Costs_B ([Amt], [Shipping_Cost])
;

The constraint FK_Shipping_Costs_B almost enforces the rule 5. Again, it enforces that
[Prev_Shipping_Cost] is equal to [Shipping_Cost] from A previous row, not from THE previous row. The next constraint will make this one complete.

We are trying to enforce this rule:
6) In each row, except the first one, [Shipping_Cost] in each row is greater than [Prev_Shipping_Cost]
The rule can be written as:
([Amt]<>0 => [Shipping_Cost]>[Prev_Shipping_Cost])
which is equivalent to
<=> ( [Amt]=0 OR [Shipping_Cost]>[Prev_Shipping_Cost])
ALTER TABLE Shipping_Costs_B
ADD CONSTRAINT CK_Shipping_Cost_Prev_Diff_NonFirstRow
CHECK (
[Amt]=0 -- NOT Amt <> 0
OR
[Shipping_Cost]>[Prev_Shipping_Cost] -- cost greater that the cost from the previous row
)
;

Without repeating the shipping cost from the previous row, we would have not been able to enforce the rule "shipping costs go up with the amount on the order".

We are done. Overlaps or gaps are not possible. The sequence cannot be disrupted by INSERT/UPDATE or DELETE. Each next row will have to contain a multiple of 100 in Amt column.
The only peculiar thing is that the first row will have to contain all zeros. However, that will not affect the end user, since all queries against this table will look like:
SELECT 
O.TotalAmt
, S.Shipping_Cost
FROM Orders AS O, Shipping_Costs_B AS S
WHERE O.TotalAmt > S.Prev_Amt AND O.TotalAmt <= S.Amt

There will be some difficulties in writing INSERT INNTO procedure, but we will do it once ,and the programmers or end users will never see anything but this:
SELECT 
start_order_amt_tot = prev_Amt
, end_order_amt_tot = Amt
, [Shipping_Cost]
FROM Shipping_Costs_B
WHERE Amt>0

The solution is inspired greately by articles and books written Joe himself and Alex Kuznetsov.

Post #1496094
Posted Monday, September 23, 2013 10:00 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, September 23, 2013 9:58 AM
Points: 2, Visits: 0
Wonderful.

Share a website with you ,

( http://www.shoes2.us// )

Believe you will love it.

We accept any form of payment.


http://www.shoes2.us/Michael-Kors-Handbags-n2447/

Post #1497455
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse