Table structure with a VIN

  • Hi there, fairly new to SQL, been an access developer for 20+ years.

    Now using SQL 2008R2, very familiar in parts, very different in others, I'm learning every day.

    I am setting up a database and would like some advice on Primary keys for two tables Vins and Orders

    tblVins

    -------

    Vin -- 17 digits AlphaNumeric -- Primary Key Should it be VarChar or what???

    PoNum -- 15 digit AlphaNumeric -- Should it be VarChar or what???

    Other fields -- many other fields regarding that particular VIN

    tblOrders

    -------

    PoNum -- 15 digit AlphaNumeric -- Primary Key Should it be VarChar or what???

    Other fields -- many other fields regarding that particular PO

    Foreign Key = 1 PoNum to Many VIN

    I have heard always use a numeric generic ID, and then some conflicting advice regarding using real world numbers such as VIN, serial number, SSN and using that as an PK.

    If you use an ID field, is this below the structure?

    tblVins

    -------

    VinID -- numeric -- Primary Key

    Vin -- 17 digits AlphaNumeric -- Unique

    PoNum -- 15 digit AlphaNumeric

    Other fields -- many other fields regarding that particular VIN

    tblOrders

    -------

    OrdID -- numeric -- Primary Key

    PoNum -- 15 digit AlphaNumeric -- Unique

    Other fields -- many other fields regarding that particular PO

    Foreign Key = 1 OdrID to Many VinID

    Foreign Key = 1 PoNum to Many VIN

    What is the best choice??

    1.4M vins currently, expanding by 300-500k per year

    140k Orders, expanding by 20k per year.

    Thanks in advance

  • Ahh the debate between natural and surrogate keys. This has been going on since the dawning of relational data. With things like VIN it is probably easiest to use that as your primary key. If you do use an identity column as your primary key I would make it be nonclustered and add a clustered index to VIN.

    You mentioned several other natural key candidates. I know this isn't in your discussion but SSN is a horrible thing to use as a key. That is because it is personal and confidential data that can be used in identity theft. This type of data should always be encrypted.

    As a side note, I would recommend dropping the "tbl" prefix on every table. It doesn't provide any tangible benefit but it does make everything require more typing. I would think your table would be Vehicles with a column for VIN unless you are storing the actual VIN and some information about how it was generated and that sort of thing.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • So using two non-numeric fields as keys won't affect performance given the number of records that I'm talking about?

    Also what about the field type Char, VarChar ??

    (I just tossed in tbl to be clear what I was talking about, and I don't use SSN, just an example of the things I was reading)

  • Just wondering if I'm reading your various constraints right but you say you want a 1 to M relationship between PoNum and VIN, but that doesn't look possible. Did you mean you want a composite key of VIN, PoNum on your VIN table?

    As for the difference between using a numeric auto increment id or such vs using a primary key that occurs naturally in the data that's more up to you. There's real specific technical advantage to either one just based on the differences between a number and a string so it would be more based on how your data is structured and entered and how you plan to use it.

  • mwilliams4nc (6/4/2015)


    Hi there, fairly new to SQL, been an access developer for 20+ years.

    Now using SQL 2008R2, very familiar in parts, very different in others, I'm learning every day.

    I am setting up a database and would like some advice on Primary keys for two tables Vins and Orders

    tblVins

    -------

    Vin -- 17 digits AlphaNumeric -- Primary Key Should it be VarChar or what???

    PoNum -- 15 digit AlphaNumeric -- Should it be VarChar or what???

    Other fields -- many other fields regarding that particular VIN

    tblOrders

    -------

    PoNum -- 15 digit AlphaNumeric -- Primary Key Should it be VarChar or what???

    Other fields -- many other fields regarding that particular PO

    Foreign Key = 1 PoNum to Many VIN

    I have heard always use a numeric generic ID, and then some conflicting advice regarding using real world numbers such as VIN, serial number, SSN and using that as an PK.

    If you use an ID field, is this below the structure?

    tblVins

    -------

    VinID -- numeric -- Primary Key

    Vin -- 17 digits AlphaNumeric -- Unique

    PoNum -- 15 digit AlphaNumeric

    Other fields -- many other fields regarding that particular VIN

    tblOrders

    -------

    OrdID -- numeric -- Primary Key

    PoNum -- 15 digit AlphaNumeric -- Unique

    Other fields -- many other fields regarding that particular PO

    Foreign Key = 1 OdrID to Many VinID

    Foreign Key = 1 PoNum to Many VIN

    What is the best choice??

    1.4M vins currently, expanding by 300-500k per year

    140k Orders, expanding by 20k per year.

    Thanks in advance

    Have you spent any time studying Books On Line to learn about the various data types available in SQL?

    varchar is a Variable Character

    char is a fixed character datatype.

    If VIN is ALWAYS going to be 17 characters, use a char.

    Likewise with the PoNum.

    If they will contain strings of varying lengths, then use a varchar. Names are a good example of a field where a varchar would be used.

    I have heard always use a numeric generic ID, and then some conflicting advice regarding using real world numbers such as VIN, serial number, SSN and using that as an PK.

    You heard only part of the story. SQL has an IDENTITY property. This works in the same manner as an auto increment in Access.

    You would use this as your CLUSTERED (you need to look that one up too) primary key instead of the natural keys when there is a very high number of inserts into a table and the natural key is inserted in no apparent order.

    Why? Page splits and index fragmentation.

    Data in SQL is stored in rows, rows make up pages, and pages make up extents.

    When a record needs to be inserted that "goes in the middle" of a page, SQL needs to do a page split to put in the correct order. This is also one cause of index fragmentation.

    Page splits are costly operations, so we try to avoid causing them.

    If your CLUSTERED index or primary key is of every increasing values, then new rows are always inserted at the end into new pages.

    Is this a rule? No. But it is one way to do it.

    So yourself a favor and read the Stairway series on this site.

    http://www.sqlservercentral.com/stairway/

    And, database design theory and practices are the same regardless if you are using Access, SQL Oracle, MySql or any other relational database The best practices that you have been following in Access, from a design perspective, are the same.

    Good Luck!

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • it is 1 to Many...

    PO is issued for let's say 20 vehicles.

    20 Vins fill that Order

    Each Vin row has the POnum field

  • Thanks Michael for the advice, I will do some reading up on the site you advised.

  • A vehicle should always have a VIN, and if you look up by VIN, I would go with VIN. Define it as char(17) rather than varchar, though, to save the overhead (I don't believe a true VIN is ever less than 17 anyway).

    As to PoNum, first, go slap whoever thought a company needed fifteen characters for a po number. My guess is they built meaning into the po number value, which you shouldn't. [Yes, VIN has many built-in meanings in its value but it's a universal number that really requires it.] But since po num isn't just a simple number already, as it should be, I guess you'll have to live with it. Define it as char(15), for the same type of reasons, unless it's frequently shorter.

    I'd use an order_id simply because an order can exist without a po. Maybe in your current business situation it can't/won't, but in real life it can. I'd want to be able to add an order without having to know the po num first.

    Finally, given the: "1 PoNum to Many VIN" criteria, you'll need a table for those anyway, so I wouldn't repeat the PoNum in the VIN table. You could denormalize it, but normally you don't unless it's really too much overhead to look up, which in this case it shouldn't be.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Well each client has their own PO number scheme, and they submit to us their order. Those Order Numbers come attached to each VIN from each manufacturer, we have 10+.

    We used 15 as sort of a max -- some are 6 digits some 12 some more. Some are numbers, some alphas, some with spaces...

    I know it's ugly but unfortunately we have no control over it.

    So given the mix of characters should it be char or VarChar?

  • I'd say varchar ... and now you have a lot of people to slap :-D.

    Why on earth do people want characters in a po number??

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I'll toss this out here as a starter and welcome any feedback from peers.

    First, don't prefix your table names with 'tbl'. Personally, I do prefix views with 'v' and functions with 'fn', only because they are derived programmatic things which should not be confused with base tables when reading code.

    I'm assuming your business purchases (and perhaps re-sells) automobiles, so you may want to track purchase orders and sales orders in separate tables.

    Vehicle identification numbers are practically random. Maybe not random to auto manufactorers, because they're making only a specific model and numbering them sequentially as they roll off the line, but to an auto retailer you're receiving many different makes, models, and years in no particular order. Also keep in mind that the person who receives the inventory may mis-enter the VIN and then have to go back and re-enter it, and there are a handful of reasons why it's problematic to update primary keys. So you will probably want to have a sequential surrogate key as the primary.

    If the PO is generated internally by your own purchase order system and is a sequential value, then you could just go ahead and use VPO as a stand-alone primary key. But purchase orders are also things can can require re-entry.

    You'll want primary key constraints, foreign key relationships, and clustered indexes on your sequential identifiers to support efficient joins.

    You'll want unique non-clustered indexs on your natural keys to support bookmark lookups. Also you'll probably want a default page fill of 80% and index padding to reduce rate of accumulating fragmentation, but that's a guess depending on usage.

    create table VehiclePurchases

    (

    VehiclePurchasesID int not null identity(1,1)

    constraint PK_VehiclePurchases

    primary key clustered

    with (fillfactor = 80, pad_index = on),

    VPO varchar(15) not null

    );

    create unique index UK_VehiclePurchases

    on VehiclePurchases( VPO )

    with (fillfactor = 80, pad_index = on);

    create table Vehicles

    (

    VehicleID int not null identity(1,1)

    constraint PK_Vehicles

    primary key clustered

    with (fillfactor = 80, pad_index = on),

    VIN varchar(17) not null,

    VehiclePurchasesID int null

    constraint FK_VehiclePurchases

    foreign key (VehiclePurchasesID)

    references VehiclePurchases (VehiclePurchasesID)

    );

    create unique index UK_Vehicles_VIN

    on Vehicles( VIN )

    with (fillfactor = 80, pad_index = on);

    create index IX_Vehicles_VehiclePurchasesID

    on Vehicles( VehiclePurchasesID )

    with (fillfactor = 80, pad_index = on);

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • mwilliams4nc (6/4/2015)


    it is 1 to Many...

    PO is issued for let's say 20 vehicles.

    20 Vins fill that Order

    Each Vin row has the POnum field

    With just the primary keys specified it is not 1 to M, as specified you would never be able to have more than 1 PoNum assigned to a VIN, but it sounds like a composite key is what is needed. Or an intersection table between your VIN table and your Orders table.

  • ScottPletcher (6/4/2015)


    Why on earth do people want characters in a po number??

    PO Number has become the name of the value to indicate how a company identifies a purchase order. They are NOT always numeric. We have literally hundreds of thousands of PO values that are not numeric. When our CSR team takes an order they ask the customer if they have a PO and many times it is just some words to help that customer match the invoice to their own customer invoices later.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • There is a standard format for VIN's since 1980.

    You can create a function that will validate the VIN

    Start here.

    http://en.wikipedia.org/wiki/Vehicle_identification_number

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • This concerns me:

    You heard only part of the story. SQL has an IDENTITY property. This works in the same manner as an auto increment in Access.

    You would use this as your CLUSTERED (you need to look that one up too) primary key instead of the natural keys when there is a very high number of inserts into a table and the natural key is inserted in no apparent order.

    Why? Page splits and index fragmentation.

    Data in SQL is stored in rows, rows make up pages, and pages make up extents.

    When a record needs to be inserted that "goes in the middle" of a page, SQL needs to do a page split to put in the correct order. This is also one cause of index fragmentation.

    Page splits are costly operations, so we try to avoid causing them.

    If your CLUSTERED index or primary key is of every increasing values, then new rows are always inserted at the end into new pages.

    so we are adding 2500+ vins every day and they are definitely all over the map in sequencing

    Would VinID / OrderID be better?

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply