Database Design Queries

Question

Database Design Queries

mailtonoorul

Valued Member

Points: 71
More actions
May 26, 2014 at 7:37 am

#289159

1.Is it advisable to use IDENTITY [(seed, increment)] to generate the sequence number in a table for concurrent user in Microsoft SQL Server 2012?
2.Is it advisable to use SCOPE_IDENTITY () to get the latest identity number during concurrent user in Microsoft SQL Server 2012?
3.Kindly help us select the best solution from the following for creating reference in Microsoft SQL Server 2012
Primary key column with integer data type with IDENTITY [ (seed , increment) ] to generate a sequence of number or
Code Column / Composite primary key column with varchar data type
4.Kindly help us to select best solution from the following to store hierarchy data
Single table with ID and parent ID as a column or
Multiple table for each entity and a mapping table
5.Until which Normal Form it is advisable? can we go until 2nd Normal form?

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004669 More actions · Answer 1

IDENTITY is a good choice but does have some caveats when it come to "replication". If you intend to use replication, you'll need to study that a bit.

Yes, both IDENTITY and SCOPE_IDENTITY() are nasty fast and are easy methods to produce such sequences without fear of duplication. Of course, the new SEQUENCE methodology will also work just fine but does require a little more coding. The SEQUENCE thing will also solve some of the problems you may have with replication but it's not a panacea. You still have to pay attention. 😉

On the hierarchy thing, I say enjoy the best of all worlds...

The parent/child table (known as an "Adjacency List") is comparatively super easy to maintain and, if something goes haywire with it, it's a lot easier to troubleshoot than all the other methods (including the HierarchyID datatype, which I won't use). You do have to do a little planning to avoid "cycles". I recommend using unique "positions" rather than something like (for example) an EmployeeID if an Employee can have more than one position in the hierarchy.

For performance of searches for hierarchies, I've found that "Nested Sets" are usually the berries but it's a real PITA to maintain. If something goes haywire with that, it's extraordinarily difficult for a human to fix (and it will require a human). It would be better to enjoy the ease of maintenance of an "Adjacency List" and re-generate the "Nested Sets" after each modification to the "Adjacency List". Of course, most people use a RBAR on steroids method for doing that known as the "push stack method" and that method is absolutely impractical (can take a couple of days for a million node hierarchy even on fast machines). There is a much better way to rebuild "Nested Sets" from an "Adjaceny List" that takes about 52 seconds to rebuild a million node hierarchy. We'll get to that in a minute.

I've also found that the purpose of hierarchies is usually to do things like costing or calculating sales of a "downline" group or level and that's a bit of a pain no matter what type of hierarchical structure. The "Hierarchical Path" method (that's what HierarchyID is based on) can really be a performance benefit forr doing such things but, it too, can be impossible for a human to troubleshoot if something goes wrong.

So, here's my recommendation...

1. Base the hierarchy on an "Adjacency List" to make life easy for maintenance and for finding "uplines".

2. Rebuild "Nested Sets" fron the "Adjacency List" for wicked performance in finding "downlines" with "level controls" and to do certain calculations.

3. Rebuild a "Hierarchical Path" to make certain level-sensitive aggregations easy. In fact, precalculate all such aggregates and store them in a different table.

Heh... I know what you're thinking. Sounds daunting and slow. Not to worry. It's neither daunting or slow if you know how. You just need to know "how". With all of that in mind, please see the following two articles.

This article teaches you how to quickly convert an "Adjacency List" to both a "Hierarchical Path" and "Nested Sets" and storing both of those structures in the same table as the "Adjacency List" so that you can easily enjoy the benefits that each hierarchical structure provides. It does it all on a million node "Adjacency List" in about 52 seconds on any decent box.

http://www.sqlservercentral.com/articles/Hierarchy/94040/

This next article makes the realization that a lot of work done in hierarchies is horribly repetative especially where aggregates are concerned. It demonstrates how to make all such calculations (again, in about 52 seconds on a million node hierarchy) and store them in a simple lookup table (think of it as a single table CUBE with easy and incredibly fast lookups). This also allows the ease of maintenance found in an "Adjacency List".

http://www.sqlservercentral.com/articles/T-SQL/94570/

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TomThomson SSC Guru Points: 104768 More actions · Answer 2

It's pretty dificult to make recommendations when there's a dearth of detail as to th erequirement, but as Jeff didn't attack your point 5 (on normalisation) I will. I'm assuming that your aim is not the sort of data warehouse application where data deletes and updates (as opposed to insertions) don't happen so that most of normal best practise on normalisation can be discarded in favour of something which for most databases would be regarded as irrational - if you are aiming at that sort of thing the following may not apply.

In the absence of any stromg contrary indication, it is always best to normalise at least to Elementary Key Normal Form (EKNF). Most schema in Third Normal Form (3NF) are already in elementary Key Normal Form, and if you obtain a 3NF version of your schema by applying Bernsteins algorithm to ensure than all simple dependencies are enforced by the schema's keys you will always get EKNF. Don't even consider stopping at Second Normal Form (2NF) unless you are prepared to write a lot of complicated code to avoid bugs that should have been eliminated by going to a higher normal form.

If a schema in EKNF is not in BCNF, turning it into BCNF will be a nasty step backwards; so wherever you see a definition of a higher normal form which says it requires BCNF as a precondition just replace BCNF by EKNF.

If you want to go to Fourth Normal (4NF) form, use a 4NF definition that doesn't violate EKNF. That's a definition which makes it clear that a multi-valued dependency must be multi-valued - one single value isn't multiple values, all single value cases are dealt with correctly by EKNF (and most of them by 3NF and a lot by 2NF) but some of them may be handled incorrectly if a 4NF definition which treats single value cases as multi-valued is used. Going to 4NF can be quite difficult if you have a very complex schema, but for some schemas it will reduce development costs quite a lot because it eliminates the need to code around some of the anomalies that can arise from the schema not adequaltely representing multi-valued dependencies. Most schemas in EKNF are already in 4NF anyway. whether you stop at EKNF or go on to 4NF (in the rare cases when it is different) needs evaluating carefully, but it's generally best to go to 4NF unless you can see a good reason not to.

Going to Fifth Normal Form(5NF) again has to be evaluated carefully (only in the rare cases where 4NF isn't already 5NF, of course). I don't have any view on whether it is usually helpful or not.

Going to Sixth Normal Form (6NF) depends on what you mean by 6NF: there are two completely unrelated definitions sculling around. The original 6NF is now usually called Domain Key Normal Form, and it isn't clear whether it's always achievable - so it's probably best not to go there. The newer definition (due to Chris Date) is designed for data with certain special properties, and should only be used where relevant. If you are looking at data where it is relevant, I think you should probably use it if you are in an OLTP environment but perhaps not if you are in a data warehouse environment, but that's just a personal view - I suspect that there are as many different views on Date's 6NF as there are people who understand its definition.

Tom

Jeff Moden SSC Guru Points: 1004669 More actions · Answer 3

Heh... I normally avoid conversations about which normal form to use because even after such a discussion, people tend not to follow the best practices anyway. But to briefly render my opinion on the question of...

5.Until which Normal Form it is advisable? can we go until 2nd Normal form?

... I have to say, "It Depends". In almost every case, I suggest meeting at least the 3rd normal form that Tom spoke of. Exceeding that can certainly have benefits as Tom pointed out. For most tables, going less than 3rd normal form will bite you in the butt in a major fashion somewhere down the line.

Please don't take this as a slam of any kind but, based on your questions in general, I'd also like to suggest that you get some addition training on database design before you start the project or hire a good consultant on database architecture. You have the rare opportunity to be on the ground floor of doing a database right. Don't cheat yourself... get the right training to actually do it right.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mailtonoorul Valued Member Points: 71 More actions · Answer 4

whether it is mandatory that table should be in 3rd normal form?whether it can be in 2nd normal form?

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 5

mailtonoorul (5/27/2014)
whether it is mandatory that table should be in 3rd normal form?whether it can be in 2nd normal form?

Tom and Jeff both answered that.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass