Identity column and "EntityID" column naming convention

  • This isn't* specific to 2022, but SQL in general.

    I will be joining a full green field development project of a new application.

    One of the things I will be setting as a standard will be to ban the use and visibility of identity primary key columns to end users or (especially) report writers and instead use separate natural identifiers that are agnostic to the primary key.

    I am struggling with naming conventions to use for the identity column and the human-readable identifier column.

    Naming identity columns just 'ID', then foreign key columns table_ID creates ambiguity in joins with multiple tables. (But EF also likes it when identity columns are named ID.)

    If I name them TableID, then it creates ambiguity in attributes like 'Product ID' or 'Employee ID.'

    I experimented with naming them TableIdent then using the same column name for the foreign key column, but this got pretty verbose in wide junction tables and tables with a lot of look ups and maintaining the relationships in EF got extremely confusing in C# naming conventions. (Implementing it database first is still a possibility, EF just seems easier to iteratively add more entities to the database)

    Any recommendations?

  • You can't guarantee that natural keys will always be a single column.

    For artificial integer keys, I recommend using <TableName>ID (where <TableName> is the actual table name)to avoid ambiguity and confusion. And use the same name on the foreign key side, adding further descriptors when necessary because you have multiple foreign keys to the same table. Use naming consistently across the domain -- don't call a thing something different just because it occurs in a different table. If you need to rename them in the output for clients, do so there.

    I hate seeing a soup full of ID's in a query full of aliased tables (especially if aliased with only one or two letters) and trying to determine what each ID represents.

    I've adopted the convention of using <TableName>Identifier for character/guid IDs.

  • Correct - I didn't mean to say I would be using natural keys.

    Every table will have a surrogate key that no user will be allowed to see or use and every entity will have a human-consumable identifier (even if it is a guid) so that we can use the ID everywhere and not have the current system of a bunch of things being named randomly all over the place that somewhat works in the job/report because they use surrogate keys from the source system, but isn't discoverable anywhere else because they are named incorrectly.

    Or even worse, having to go through the ERP system and append strings to some of the identifiers with non-sargable queries because the source system identifier collided with an identifier already in ERP

  • I like the idea of splitting the identity keys, artificial keys, from the natural keys. I also like the idea of hiding the artificial keys from the end users. I'm not sold on the idea of creating a second artificial key and giving that to the users. Seems like extra work for not much return. Instead, I'd just use the natural key (which I'd enforce on every table, regardless), even if it is multi-column.

    As to naming, it's all about clarity. TableID is pretty clear. Whether or not you camel case or put an underscore in or whatever, I don't care. Let's just make things clear through an easily understood pattern. You're 100% correct that having every table have an ID column, that transforms to TableID in FKs is a nightmare. As to your added artificial column, again, go with clarity. Ident is OK, but feels redundant to ID. Maybe switch things. Call your hidden artificial key columns TablePK, then call the exposed ones TableID? Again, just go for something clear and easy to understand.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ok I like the EntityPK name, it seems really obvious now, don't know why I didn't think of it.

    There would only be one surrogate key and the one natural 'key.' Currently everyone knows the primary keys for everything, and I have problems on a regular basis where a record will get messed up then either be deleted or set to an inactive status, breaking stuff or producing plausibly correct but incorrect data indefinitely. It also enables them to make up fake codes and IDs out of thin air that spread all over the place with production data, collide regularly and then must be translated back into whatever it is really supposed to be.

    I did have an application at one time that used two surrogate keys, the primary key that was hidden, and a guid that users could have which was used to track changes to the report. It only was used when a report was approved, then later discovered to have a problem. A new row would get inserted with the same guid, the contents of the existing report inserted into the new row, then changes made to the original row. It kept the references intact allowing a problem in a single entity to be fixed without having to rewrite the whole report even if they entered the wrong Report number. Not quite normalized but it was how that was done in that app.

  • I do not understand why you are keen on separating natural and surrogate keys.

    In my experience any rules that require thought and skills to implement very rarely survive the tenure of the person who imposed them. Consider what your application might look like at few years after you have moved on, and try to design something that can cope with newcomers who might not share your mindset.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Use of natural keys for table relations will not be tolerated in my databases.

  • That got posted before I intended.

    Natural keys for relational references within the DB are always bad. Natural keys can change which destroys referential integrity, they can't survive data corrections, they aren't portable outside of the table and create unnecessary dependencies for the sake of the database engine that users and applications don't care about.

    If natural keys are used, a change in HR system can break everything that used HR automation. If a product ID is changed, a simple update statement becomes a massive data migration. If an accounting transaction needs to be corrected, it will be impossible to maintain an intact accounting ledger using a natural key for the transaction. If the natural key is used outside of the application in a report or something and anything involving a change to the natural key takes place, it will be impossible to correlate what is in the report back to the data it came from.

  • CreateIndexNonclustered wrote:

    That got posted before I intended.

    Natural keys for relational references within the DB are always bad. Natural keys can change which destroys referential integrity, they can't survive data corrections, they aren't portable outside of the table and create unnecessary dependencies for the sake of the database engine that users and applications don't care about.

    If natural keys are used, a change in HR system can break everything that used HR automation. If a product ID is changed, a simple update statement becomes a massive data migration. If an accounting transaction needs to be corrected, it will be impossible to maintain an intact accounting ledger using a natural key for the transaction. If the natural key is used outside of the application in a report or something and anything involving a change to the natural key takes place, it will be impossible to correlate what is in the report back to the data it came from.

    BUT

    Enforcing natural keys is still a necessary part of most databases. It's more than fine to use artificial keys. Faster, easier to maintain, don't muck up the system as you describe, absolutely a fan. However, they frequently can't uniquely identify whatever value or values do, naturally, define a unique row from a business standpoint. As such, I've almost always had an artificial key, and a natural key, enforced on my tables. Sometimes, seldom, but sometimes, that's a single constraint. Usually though, it's two.

     

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I hadn't considered that. I will definitely have duplicates of natural keys that exist in different statuses while some of the status should be able to duplicate and Active statuses should not be possible to duplicate.

    I'm curious if a unique filtered index could enforce that, but I hate to hard code values into a filtered index when a list of active statuses could increase or reduce over time...

  • CreateIndexNonclustered wrote:

    I hadn't considered that. I will definitely have duplicates of natural keys that exist in different statuses while some of the status should be able to duplicate and Active statuses should not be possible to duplicate.

    I'm curious if a unique filtered index could enforce that, but I hate to hard code values into a filtered index when a list of active statuses could increase or reduce over time...

    In that case, you may have to have the ACTIVE value, whatever that is, be a part of the natural key... maybe. I'd have to see the structures & understand the behaviors better.

    You could use filtered index in that way, yeah.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Yeah, an active status is easy when there are only two status, but more difficult when there are multiple active and inactive status each.

    Got it worked out though, little messy and not a big fan of the value of the primary key being meaningful, but that's all my time for thought experiments today:

    CREATE TABLE [dbo].[ReportAdministrativeStatus] (
    [ReportAdministrativeStatusPK][smallint]identity(0,-1) not null,
    [ReportAdministrativeStatusName][varchar](50) not null,
    [ReportAdministrativeStatusIsActive][bit] default 0 not null,
    CONSTRAINT [PKReportAdministrativeStatusPK] PRIMARY KEY CLUSTERED ([ReportAdministrativeStatusPK] ASC)
    );

    CREATE UNIQUE NONCLUSTERED INDEX [ix_ReportAdministrativeStatusPK_ReportAdministrativeStatusIsActive]
    ON [dbo].[ReportAdministrativeStatus]
    (
    [ReportAdministrativeStatusPK] ASC
    ,[ReportAdministrativeStatusIsActive] ASC
    )
    WHERE [ReportAdministrativeStatusPK] < 0
    AND [ReportAdministrativeStatusIsActive] = 0;

    SET IDENTITY_INSERT [dbo].[ReportAdministrativeStatus] ON;

    INSERT INTO [dbo].[ReportAdministrativeStatus] (
    [ReportAdministrativeStatusPK]
    ,[ReportAdministrativeStatusName]
    ,[ReportAdministrativeStatusIsActive]
    )
    VALUES
    (0,'Active',1),
    (1,'Active - Compliance Lock',1),
    (-1,'Inactive - Employee Retired',0),
    (-2,'Inactive - Employee Deceased',0),
    (-3,'Inactive - Employee Terminated',0),
    (-4,'Inactive - Executive Request',0);


    CREATE TABLE [dbo].[Report] (
    [ReportPK][int]identity(0,1) not null,
    [ReportId][varchar](10) not null,
    [ReportStatusPK][tinyint] not null,
    [ReportAdministrativeStatusPK][smallint] not null,
    CONSTRAINT [PKReportPK] PRIMARY KEY CLUSTERED ([ReportPK] ASC)
    );

    CREATE UNIQUE NONCLUSTERED INDEX [ix_ReportId_ReportAdministrativeStatusPK]
    ON [dbo].[Report] ([ReportId] ASC)
    WHERE [ReportAdministrativeStatusPK] >= 0;

    ALTER TABLE [dbo].[Report]
    ADD CONSTRAINT [FKReportAdministrativeStatus]
    FOREIGN KEY ([ReportAdministrativeStatusPK])
    REFERENCES [dbo].[ReportAdministrativeStatus]([ReportAdministrativeStatusPK]);

    CREATE NONCLUSTERED INDEX [ix_ReportAdministrativeStatusPK]
    ON [dbo].[Report] ([ReportAdministrativeStatusPK]);

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply