Help designing a patent database

  • I need some help with designing my database.

    I'm creating a database that holds patents.

    A patent has this data

    Patent number

    Title

    Classifications

    Inventors

    Filed date

    Issue date

    Referenced patents

    Referenced by patents

    Short description

    Since Inventors and Classification can be more than one, I have created three tables

    Patents

    Inventors

    Classification

    I have also created a two join tables

    PatentsInventors

    PatentsClassifications

    To top it off I have the Referenced patents and Referenced by patents, which are other patents (in the Patents table) that this patent has a reference to/from.

    That's two more tables

    PatentsReferenced

    PatentsReferencedBy

    After that rather long winded intro...

    Have I created a structure that is OK?

    How should I use Foreign Keys on these tables?

    I'm using MS SQL 2008 R2 Express, Windows 7 Ultimate

    // Anders

  • This was removed by the editor as SPAM

  • One thing to add - change your ID fields to int. Unless I'm missing something, there's no reason to make them varchar or any text based data type.

    Mark

  • The ids have characters in them...

    Maybe I should create a new column that holds the value and change patent_id and classification_id to INT and make them an Identity (I'm using MS SQL Server Management Studios label on it)

    Then I will automatically get a new unique id...?

    // Anders

  • Yes, create a PK on each table and make that the auto incrementing ID and set up FK constraints where appropriate. I tend to reserve "ID" for this field only.

    Mark

  • OK, I will do that.

    What are the advantages for using INT as PK (which I assume stands for Primary Key) vs CHAR?

    // Anders

  • The main advantage is the storage 4bytes against 20bytes.

    Another advantage would be the ease of creation for new identities (SQL Server will create them automatically).

    However, there's a constant debate on how you should declare you Primary Keys. The option for the auto-increment ints referred as surrogate keys that end-users shouldn't see and the option for natural keys that are defined by business rules (a good example would be the patent number that uses an alpha-numeric code).

    I would suggest to use both (a surrogate and a natural key) in the tables that apply. Natural keys will help you to identify the row and surrogate will help you with other processes and to ensure the uniqueness. I realize that it would sound redundant to use both, but you just use both in your primary table and use the surrogate for the foreign keys.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Another advantage of INT declared as IDENTITY for surrogate keys is the increased potential of keeping fragmentation down over time. Are patent ids ever-increasing in value over time as they are issued?

    More important than your primary key, which can be a logical choice like your alpha-numeric patent id if it is in fact unique and non-null, is the clustering key. A clustered index is more important to focus on when it comes to performance, space planning and maintainability. The primary key and clustering key do not necessarily need to be one in the same. Some reading for you:

    More considerations for the clustering key - the clustered index debate continues!

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply