Database without ANY primary keys

  • Hi all,

    I work with a database that has no primary keys defined on any tables. As a customer's consultant I have no control over the supplier's database. It frustrates me that I have no control over the database and I'm looking for some points of discussion so I can go back to the supplier to ask them to review the design and consider making some changes.

    So I guess I'm just looking to provoke a discussion about this in order to gather some points I can then discuss with the supplier.

    Anyone got any comments...?

  • Questions:

    1. Are there any clustered indexes on the tables? Without them, indexing will be inefficient since each table without a clustered index will be a heap.

    2. Are you talking about natural primary keys or setting up IDENTITY columns that will be used as clustered primary keys? If IDENTITY columns exist then you are one step closer to making efficient indexing.

    Sometimes vendors treat SQL Server as if it were a kind of set-and-forget file system. That's not what it is.

    Perhaps we can help you educate your client.

    Thanks

    John

  • it might be the product of someone who thought to keep their database design as if it were secret and proprietary, and have no PK's so you can't deduce the design by following Foreign Key relationships.

    if that's true you could probably run a trace when the application is running, parse the JOIN criteria of any statements, and infer the PK/FK's to create.

    I've seen that quite a few times here on SSC, where someone thinks their database design, and the relationship between Invoice and InvoiceDetails was unique to their thought pattern and no one else would think of it.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks, perhaps a bit more background is in order.

    The database was originally built years ago in Progress (old OpenEdge) but some Clients run it through the OpenEdge SQL DataServer which basically migrates the schema and data into SQL and then uses a schema placeholder to translate OpenEdge database queries into SQL queries. This means that the database can run on SQL whilst the main GUI and interface doesn't need re-coding and can continue to run in Progress 4GL.

    There genuinely are no (clustered) primary keys and very few clustered keys; most indices are non-clustered indices that yes will just exist on the heap. It's not a case of them trying to keep the db a secret as it is fairly self-explanatory; I used to work for the supplier and know the db inside out.

    I'd like an idea for what kind of impact having no primary, clustered keys at all would have on your typical transactional database...

  • Drammy (11/20/2013)


    Thanks, perhaps a bit more background is in order.

    (snip)

    I'd like an idea for what kind of impact having no primary, clustered keys at all would have on your typical transactional database...

    A lot. The heap would be a big slow-down.

    Is there an identity-type column (int or bigint) on any table?

    You can likely change the underlying indexing to clustered and rebuild all indexes, and gain performance, without impacting the app. Can you set up a test environment to verify this and to regression test before running such a script on production? You may not be able to specify PK in the DDL, since that brings restrictions along with it. But you can still create an unique clustered index on the natural primary key and achieve the identical effect.

    Obviously, with no defined PK, features like replication are not going to be a future option. If there is an unique identity key (a sequential integer field), you can cluster that column and reap maximum indexing benefits, especially since inserts will always go to the end of the cluster.

    Then once you get indexes clustered on all tables, go find Michelle Ufford's dba_indexdefrag freeware and schedule yourself a weekly index rebuild job that runs on a weekend schedule. If you wish, another step to the same job could update statistics, and you would reap best speed and keep your DB nice and clean.

    Thanks

    John.

  • Plus the fact that no unique constraints radically changes the optimizers choices for retrieving data. I'll bet you see a LOT more scans than you should since the optimizer has no way of knowing if your data is unique.

    Honestly, I'd be hard pressed to list the very large number of issues you're seeing here that you shouldn't in terms of performance. But the biggest worry would be data quality. Your business can quite easily insert duplicate data if there are no constraints in place (and probably has). That would be the first, and most important, point I'd raise.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for all the guidance and help.

    I have plenty to be going back to them with and might even change some indexing on the primary tables and prove the performance improvement as part of my argument.

    Thanks again,

    Drammy

  • Are you talking about application performance or reporting performance.

    If it is reporting, then I woudl be tempted to replicate the database and create reporting indexes. In a large system the differing demands on OLTP and OLAP can cause conflicts as lots of indexes to support reporting will slow down the insert and update of records.

  • aaron.reese (12/11/2013)


    Are you talking about application performance or reporting performance.

    If it is reporting, then I would be tempted to replicate the database and create reporting indexes. In a large system the differing demands on OLTP and OLAP can cause conflicts as lots of indexes to support reporting will slow down the insert and update of records.

    That is a good idea, but SQL Server replication requires defined primary keys on all articles. Therefore, if reporting requires data in close to realtime, the PK issue that the original poster raised persists.

    JT

  • Yeah, understood - this is an application database only. No reporting requirements or indices on this database.

  • Where's Celko? I thought he'd be in here administering some kind of beat down based on the title alone. 🙂

  • Happy Christmas Joe 😛

    Which reminds me.

    Q) Why do Assembly programmers get Xmas and Halloween confused

    A) Because OCT31 = DEC25

    Anyway, JC wouldn't touch Progress with a barge pole - come to think of it, neither would I

    JC - maybe we should be nicer to him - it might also be his birthday

  • Don't we celebrate JC's birthday on the 25th regardless of when it actually is?

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply