Multi Tenancy Key Design

  • Hi

    I am trying to design a schema in which multiple customers share one database. I tried a natural-key approach but ended up with some 8-column primary keys.

    So I think I should use surrogates. But now I need to ensure that a tenant can only refer to its own data. My idea is to have a tenant_id field in each table, and to make every primary key and foreign key share this column.

    Example:

    CREATE TABLE MasterOne (
        master_one_id INT IDENTITY,
        tenant_id SMALLINT,
        ...
        CONSTRAINT PK_MasterOne PRIMARY KEY (master_one_id, tenant_id)
    )

    ALTER TABLE MasterOne
    ADD CONSTRAINT FK_MasterOne_Tenants
        FOREIGN KEY (tenant_id) REFERENCES Tenants (tenant_id)

    ----------------------------

    CREATE TABLE MasterTwo (
        master_two_id INT IDENTITY,
        tenant_id SMALLINT,
        ...
        CONSTRAINT PK_MasterTwo PRIMARY KEY (master_two_id, tenant_id)
    )

    ALTER TABLE MasterTwo
    ADD CONSTRAINT FK_MasterTwo_Tenants
        FOREIGN KEY (tenant_id) REFERENCES Tenants (tenant_id)

    ----------------------------

    CREATE TABLE Data (
        data_id INT IDENTITY,
        tenant_id SMALLINT,
        master_one_id INT,
        master_two_id INT,
        ...
        CONSTRAINT PK_Data PRIMARY KEY (data_id, tenant_id)
    )

    ALTER TABLE Data
    ADD CONSTRAINT FK_Data_MasterOne
        FOREIGN KEY (master_one_id, tenant_id)
        REFERENCES MasterOne (master_one_id, tenant_id)

    ALTER TABLE Data
    ADD CONSTRAINT FK_Data_MasterTwo
        FOREIGN KEY (master_two_id, tenant_id)
        REFERENCES MasterTwo (master_two_id, tenant_id)

    Does this pattern make any sense? Or does it have any drawbacks other than the wider clustered index?

    Next question: should I use a custom generated sequence for each customer instead of the IDENTITY column? Are there any usable solutions? I came across the idea of using a LAST_ID column per table in the customer/tenant table and increasing the value when inserting new records (UPDATE table SET @next_id = LAST_ID = LAST_ID + 1). Has anybody used this successfully? I think there would be problems if I use it inside transactions.
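    To make the idea concrete, here is a minimal sketch of that counter approach (a hypothetical LAST_ID column on the Tenants table; all names are illustrative, not from our real model):

        -- Hypothetical per-tenant counter: a LAST_ID column on the Tenants table.
        -- The single UPDATE both increments and reads the counter atomically,
        -- but the Tenants row stays locked until the transaction ends, so all
        -- inserts for one tenant are serialized.
        DECLARE @tenant_id SMALLINT, @next_id INT
        SET @tenant_id = 1

        BEGIN TRANSACTION

        UPDATE Tenants
        SET @next_id = LAST_ID = LAST_ID + 1
        WHERE tenant_id = @tenant_id

        -- assumes master_one_id is a plain INT here, not IDENTITY
        INSERT INTO MasterOne (master_one_id, tenant_id)
        VALUES (@next_id, @tenant_id)

        COMMIT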

    Thanks

    Thomas

  • A few thoughts:

    Performance aside, does the tenant_id need to be part of the primary key?

    "Should I use a custom generated sequence for each customer instead of the IDENTITY column?"

    You might have a scalability issue here if you have lots of inserts, as users may be blocked while trying to access the LAST_ID table, especially if this is done as part of a transaction. The generation of an identity doesn't get blocked and isn't deemed part of a transaction (if you roll back, that identity is not re-used).
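    A quick way to see that behaviour for yourself (table and values are just for illustration):

        CREATE TABLE #Demo (id INT IDENTITY PRIMARY KEY, val CHAR(1))

        INSERT INTO #Demo (val) VALUES ('a')   -- gets id 1

        BEGIN TRANSACTION
        INSERT INTO #Demo (val) VALUES ('b')   -- gets id 2
        ROLLBACK                               -- row 'b' disappears...

        INSERT INTO #Demo (val) VALUES ('c')   -- ...but gets id 3; 2 is never re-used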

    If you are using the Enterprise Edition, then you may want to consider PARTITIONS, using your tenant_id as the partitioning column.
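    Roughly, that would look like this (Enterprise Edition only; the boundary values and filegroup are placeholders):

        -- Partition rows by tenant_id; the boundary values are illustrative.
        CREATE PARTITION FUNCTION pf_Tenant (SMALLINT)
        AS RANGE LEFT FOR VALUES (100, 200, 300)

        CREATE PARTITION SCHEME ps_Tenant
        AS PARTITION pf_Tenant ALL TO ([PRIMARY])

        -- Create the table (and its clustered PK) on the scheme,
        -- with tenant_id as the partitioning column.
        CREATE TABLE Data (
            data_id INT IDENTITY,
            tenant_id SMALLINT,
            CONSTRAINT PK_Data PRIMARY KEY (data_id, tenant_id)
        ) ON ps_Tenant (tenant_id)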

  • Having the tenant_id in each table will not be a problem. You don't have to make it part of the clustered index, but since almost every query you run will probably have it, I would. It will probably fragment your indexes a bit, but if you set up a good fill factor and reindex regularly, small tables will not be an issue.

    For very large tables, you may want to consider partitioning, or just be very careful with your clustered index.

    As far as rolling your own identity goes: don't do this unless you absolutely have to. It is usually a disaster. Just use the identity columns. They work fine, perform great, and generally don't cause too many conflicts. They can be a bit of work with replication, but are usually worth it.

  • It doesn't need to be part of the primary key, but when the tenant_id is included in the primary key (or a unique index) and every foreign key shares the same tenant_id field, there's no need for another constraint. In the example above, the Data table has only one tenant_id, which is used both in Data's PK and as part of the foreign key fields. Maybe there's another solution for this problem, but I think the slightly larger index is better than a trigger or something similar to ensure data integrity.
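    For illustration, with the tables from my first post, a row that mixes tenants is rejected by the composite FK without any trigger (the values are made up):

        -- Suppose (master_one_id = 5, tenant_id = 1) exists in MasterOne.
        -- This insert claims tenant 2, so the pair (5, 2) is not found in
        -- MasterOne and FK_Data_MasterOne rejects the row:
        INSERT INTO Data (tenant_id, master_one_id, master_two_id)
        VALUES (2, 5, 7)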

    English is not my first language. I hope you understand the idea.

    Okay, I'll forget about the custom identity.

  • The solution depends on the details of your business requirements. If the number of tenants is not large, one path is to create a new schema for each new tenant. You would have one "common" schema that contains lookups and other data domains common to all tenants, and you would add a new schema for each new tenant with the entities that are tenant-specific. For enterprise usage you would create another database that eventually consolidates all the data together for overall reporting and analysis.

    If you are going to implement your solution with tenant_id in each table, consider NOT constraining it with a FK. It is a sort of "special case" in my opinion; otherwise your Tenants table will have a FK to every other table. It is similar to having the user_id of the person who last modified the data, something like modified_by: I would not put a FK from the app_user table on every entity in your DB.

  • Thanks

    A few things:

    Our data model has about 100 tables. 10 or so are general lookups which could be shared in an "each tenant has its own schema" design. That means 90 tables per tenant, or thousands of tables if we have more than a dozen customers, and it's not unlikely that there will be more than a hundred customers. Is that the way to go? There will most probably be future changes to the model, but no custom changes for a single tenant.

    What are the drawbacks of having FKs to each modify_user or other user field? Actually, we did it that way; we have many user relations in the model. You don't have to put an index on all FK fields, so what are the costs of "too many" FKs? The additional checks for data integrity? Okay, it does look very confusing in our ERD tool 🙂

    In the case of tenants there's no need for a FK to the main Tenants table in every table, because many tables "derive" the tenant_id through a foreign key anyway.

  • I'll try to be brief.

    Don't think of it as "90 tables per tenant". Think "one schema per tenant". All tenants have the SAME schema; their data sets do not intersect in any way (except for the "shared" lookups). Any change to the DDL applies equally to each schema, and doing it through a re-runnable script will help automate the process and ensure that all schemas stay identical. The benefits are the small size of each table and the absence of the tenant_id everywhere, meaning no need to worry about referential integrity for the tenant_id domain. The downside is the maintenance of multiple schemas instead of just one, although all the schemas are the same.
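    A sketch of such a re-runnable script (it assumes the tenant schemas are listed in a Tenants table with a schema_name column; all names are illustrative):

        -- Apply the same DDL to every tenant schema, skipping schemas where
        -- the table already exists, so the script can be run repeatedly.
        DECLARE @schema SYSNAME, @sql NVARCHAR(MAX)

        DECLARE tenant_cur CURSOR FOR
            SELECT schema_name FROM Tenants

        OPEN tenant_cur
        FETCH NEXT FROM tenant_cur INTO @schema

        WHILE @@FETCH_STATUS = 0
        BEGIN
            IF OBJECT_ID(QUOTENAME(@schema) + N'.MasterOne') IS NULL
            BEGIN
                SET @sql = N'CREATE TABLE ' + QUOTENAME(@schema) + N'.MasterOne (
                    master_one_id INT IDENTITY PRIMARY KEY
                    -- no tenant_id column: the schema itself is the tenant
                )'
                EXEC sp_executesql @sql
            END
            FETCH NEXT FROM tenant_cur INTO @schema
        END

        CLOSE tenant_cur
        DEALLOCATE tenant_cur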

    Yes, your analysis is consistent with mine: "it looks very confusing in our ERD tool". Your user_id or tenant_id is not really data; it is more like metadata, and I would not put a FK on metadata.

    God forbid putting tenant_id in EVERY PK or UK in the 90 tables. The performance hit on each entity might be small, but overall they will add up. Besides, you will need to manage fill factors very carefully, etc.

    By the way, what is your ERD tool?

  • Thanks. We'll think about the schema based separation.

    FKs: But how do you ensure data integrity? We need an existing user in the field, because it is data shown to the user. Not having a FK allows any value of the proper data type to be put into the modify_user field. A trigger?

    ERD: As we don't have such large models, we want a lightweight tool. We use Happy Fish (what a name). Okay, it has its quirks and limitations (e.g. we haven't found a way to set whether an index is clustered or not), but overall it is very useful. At least for us.

    Happy Fish: http://www.polderij.nl/happyfish/index.html

  • FK: Triggers are somewhat evil by nature (sorry for the strong words). Your modify_user is set by your application, not by user input. Hopefully you are using stored procedures; then your modify_user value is managed there, and users themselves never lay hands on those values. This should be good enough to ensure integrity. Again, metadata and referential integrity do not play well with each other. I am a strong defender of referential integrity, as well as attribute and domain integrity; we just need to apply them appropriately. We should understand the difference between data about the business (the data itself) and data about data (metadata).
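    For example, a sketch of an update procedure that stamps modify_user itself (the procedure, table, and column names are made up for illustration):

        -- The caller never supplies modify_user; the procedure stamps it
        -- from the current login, so only real users land in the column.
        CREATE PROCEDURE dbo.UpdateMasterOne
            @master_one_id INT,
            @tenant_id     SMALLINT,
            @some_value    VARCHAR(50)
        AS
        BEGIN
            UPDATE MasterOne
            SET some_value  = @some_value,
                modify_user = SUSER_SNAME(),   -- set here, not by user input
                modify_date = GETDATE()
            WHERE master_one_id = @master_one_id
              AND tenant_id     = @tenant_id
        END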

    I wish you happy fishing!

    Michael Romm
