How to Improve Database Design to Speed Up SQL Queries

  • Comments posted to this topic are about the item How to Improve Database Design to Speed Up SQL Queries

  • The Social Security Number may be theoretically unique per natural person, but there are people without one (e.g. foreign customers). And as in any system there are "holes": some people got two numbers, which will have to be merged later, or two people got the same number (because two John Smiths were born on the same day in a small town, and maybe the parents are even neighbors), which will have to be fixed later.

    Besides this, the SSN falls under the data protection rules. Now imagine that you have used it in 100 sub-tables as a foreign key - a data protection nightmare, particularly when you have to change it.

    And then the SSN is usually saved as VARCHAR / CHAR(12) (very seldom as an 8-byte BIGINT), while a customer_id will usually be a 4-byte INT (if you are not Facebook, Google, or Microsoft, it is unlikely that you will have more than 2.1 billion customers, and even in that case you could restart with negative numbers to double the range, or you would have enough money to switch to a BIGINT). Yes, disk space is cheap now, but SQL Server can join much better on a small INT than on a big VARCHAR. And RAM is still a scarce resource for many SQL Servers; many are limited to e.g. 128 GB because they run on Standard Edition.
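
    Something like this, as a minimal sketch (hypothetical Customer / CustomerOrder tables): the SSN stays a nullable attribute in exactly one table, and every foreign key and join uses the 4-byte surrogate key instead:

    CREATE TABLE dbo.Customer
    (
        customer_id INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
        ssn         VARCHAR(11)   NULL,  -- protected attribute only, never a key
        name        NVARCHAR(100) NOT NULL
    );

    CREATE TABLE dbo.CustomerOrder
    (
        order_id    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_CustomerOrder PRIMARY KEY,
        customer_id INT NOT NULL CONSTRAINT FK_CustomerOrder_Customer
                        REFERENCES dbo.Customer (customer_id),
        order_date  DATE NOT NULL
    );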


    Regarding "Use of parent and child tables" - Normalization is the absolute basic stuff regarding databases and should be done without mentioning it. But since your article is about speeding up queries - in this case normalization slows down your queries, because you have to do the lookup / join. For this reason data warehouses (DWH) will be often denormalized to speed up the queries.


    Indexing: recommending that all names be converted to upper case to allow case-insensitive searches is bad advice. MS SQL Server is case-insensitive by default, so there is no need to do this. Furthermore, it will often lead to something like "JOIN ... ON UPPER(p.name) = UPPER(o.name)", which is non-SARGable and forces a slow index scan instead of a fast index seek.
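
    To illustrate with a minimal sketch (hypothetical Person / Owner tables with an index on name):

    -- Non-SARGable: UPPER() on the indexed column forces an index scan
    SELECT p.name
    FROM dbo.Person AS p
    JOIN dbo.Owner AS o ON UPPER(p.name) = UPPER(o.name);

    -- SARGable: with the default case-insensitive collation the plain
    -- comparison already ignores case and allows an index seek
    SELECT p.name
    FROM dbo.Person AS p
    JOIN dbo.Owner AS o ON p.name = o.name;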


    Partitioning: yes, this can improve performance by a lot, BUT the partitioning column MUST be included in any UNIQUE index (so you can have the ID 1242 twice, with two different creation_dates), AND you must include the partitioning column in any WHERE or JOIN on the table, otherwise the query will be slower than without the partitioning. And creation_date is often a very bad partitioning column (because it is more or less random), contrary to e.g. an order_date. Of course a creation_date may be a good option too, e.g. for news, messages, posts etc., where you usually query only the last x days, but you have to know your data and its common usage very well to plan this.
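
    As a minimal sketch (hypothetical Orders table with yearly ranges): the partitioning column sits in the primary key, and the query filters on it so partitions can be eliminated:

    CREATE PARTITION FUNCTION pf_OrderDate (DATE)
        AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01');

    CREATE PARTITION SCHEME ps_OrderDate
        AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

    CREATE TABLE dbo.Orders
    (
        order_id   INT NOT NULL,
        order_date DATE NOT NULL,
        -- any UNIQUE/PRIMARY KEY must include the partitioning column
        CONSTRAINT PK_Orders PRIMARY KEY (order_id, order_date)
    ) ON ps_OrderDate (order_date);

    -- partition elimination only happens with a filter on order_date
    SELECT COUNT(*) FROM dbo.Orders WHERE order_date >= '2024-01-01';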


    God is real, unless declared integer.


  • The discussion regarding centralized versus distributed database systems indicates a misunderstanding of the difference between the two concepts.

    In a centralized system, the data exists on a single server so CRUD operations have immediate Consistency. In a distributed system, copies of the data exist on multiple servers and CRUD operations are applied to all of them, leading to Eventual Consistency.

    The example regarding multiple state agencies combining their database operations into a single database is completely off-point. That speaks to taking disparate, previously unrelated databases and combining them into a single database. The final, combined-agency database could be centralized OR distributed. There are valid arguments in favor of each model.

    An example of change from a distributed database to a centralized database would be something along the lines of, "We normalized and structured our Cassandra data and migrated it to an on-premises Oracle server."

    Programmers are tools for converting caffeine into code. - Anonymous

  • Yeah. I'm no expert, but SSN is one of the classic examples of a trap for a natural unique ID. Your average person probably thinks SSNs are unique, static, and globally assigned, but none of those are actually true in real life. The government is not perfect and might assign the same SSN multiple times. Some people might not get an SSN assigned at birth, so even US citizens might not have one yet. Not to mention the possibility of fraud or just simple data-quality issues: maybe the patient had a dyslexic moment filling out the paperwork, or their handwriting is horrible and the data-entry person couldn't read the numbers. There are so many reasons why someone might not have an SSN, or why it might not be unique, either in real life or just in your data. Making it your PK for patients is just asking for headaches in your data.

    Even if we ignore all those facts, HIPAA and other data protection laws pretty much guarantee that you don't want to use SSN as someone's ID, for tons of reasons. You should be encrypting it when storing it. I haven't tried it, but would SQL even allow you to have a primary key on an encrypted column? I wouldn't be surprised if it didn't allow it, and even if it does, it sounds like a horrible idea. You have to limit who has access to sensitive information like an SSN, and that makes it a horrible unique ID. You can't display the unique ID everywhere and anywhere, and you can't print it on all the paperwork for easy lookup of the patient.

    Then there is always the fact that the government itself says companies should not use SSNs as a unique identifier for people or for confirming their identity. It is a common practice, but it is discouraged, and that makes it an exceptionally poor choice for a primary key.

    The table normalization also looks pretty poor.

    Why would you have the patient's name and SSN on the Claim table? Wouldn't normalization specify a Patient table with their name, SSN, and other information on it, for the same reason you'd create a Doctor table? One patient could easily have multiple claims stored in your database, and there's no reason to repeatedly store the patient's general information that isn't specific to a particular claim.

    Offhand, I'd think that Claim to Doctor can certainly be a one-to-many relationship, as one claim could easily involve multiple doctors. But at the same time, doctors will obviously work on many different claims for many different patients, so a parent-child relationship doesn't really make sense; you probably need a linking table, so that each claim can have multiple doctors on it and each doctor can be on many claims. Depending on the rest of your system's design, you might also need that for patients, if a claim can be for multiple patients. The most obvious example seems like a birth, where you have the mother and the child(ren). Making this multiple claims because each claim can only have one patient just seems messy and problematic, but knowing insurance, you might be doing it anyway because separate deductibles and coverage differences might apply.
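
    Something like this minimal sketch (hypothetical names, assuming Claim and Doctor tables with INT surrogate keys):

    -- linking table: each claim can have many doctors, each doctor many claims
    CREATE TABLE dbo.ClaimDoctor
    (
        claim_id  INT NOT NULL CONSTRAINT FK_ClaimDoctor_Claim
                      REFERENCES dbo.Claim (claim_id),
        doctor_id INT NOT NULL CONSTRAINT FK_ClaimDoctor_Doctor
                      REFERENCES dbo.Doctor (doctor_id),
        CONSTRAINT PK_ClaimDoctor PRIMARY KEY (claim_id, doctor_id)
    );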

  • I admit that for Medicare patients, as specified by the author, SSN might be valid to use (to be eligible, a person would need an SSN (or perhaps a TIN)). But you would limit the SSN to appearing in only one table, and not waste space storing dashes with it. To me, encrypting it was outside the scope of the original article.

    SQL DBA, SQL Server MVP (07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.


  • To me, encrypting it is always in scope. I just got yet another hack-attack alert about 6 million people's data being stolen, including SSNs. Seems like I get one every week now, and most of them are related to businesses in the medical community.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Vik,

    Thanks for writing the article. Unfortunately, based on my many years of experience in designing and developing databases and data warehouses for optimisation, I do disagree with your suggestion of using composite text keys as primary keys; for this purpose it may be more efficient to use GUIDs, for the following reasons:

    1. Indexing such keys would lead to extensive overheads and consume more resources, etc.
    2. This practice is also discouraged by Kimball's principles.

    In practice, such keys are used as business IDs in the master table, but not as foreign keys etc., and certainly not as long and cumbersome composite indexes.
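
    For illustration, a minimal sketch of that pattern (hypothetical DimCustomer): an integer surrogate key for joins, with the composite business identifier kept as a uniquely indexed attribute:

    CREATE TABLE dbo.DimCustomer
    (
        customer_key  INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DimCustomer PRIMARY KEY,
        business_id   VARCHAR(40)   NOT NULL,  -- composite text key from the source system
        customer_name NVARCHAR(100) NOT NULL,
        CONSTRAINT UQ_DimCustomer_BusinessId UNIQUE (business_id)
    );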

    Kind regards,

    Roy

  • Improving database design can have a significant impact on the speed of SQL queries. Here are some tips to optimize your database design and enhance query performance:

    • Indexing: Proper indexing can greatly improve query performance. Identify the columns frequently used in WHERE, JOIN, and ORDER BY clauses, and create indexes on those columns. Be cautious not to over-index, as it can slow down data modification operations.
    • Normalize the schema: Normalize your database schema to eliminate data redundancy and improve data integrity. This involves breaking down data into logical tables and reducing data duplication. Normalization helps minimize the storage required and can speed up query execution.
    • Denormalization: While normalization is generally recommended, there are cases where denormalization can improve query performance. Denormalization involves adding redundant data or pre-calculating values to avoid expensive joins or calculations. This can be beneficial for read-heavy applications or when dealing with complex queries.
    • Partitioning: Partitioning divides large tables into smaller, more manageable parts based on a specific criterion (e.g., date range or key range). Partitioning can enhance query performance by reducing the amount of data that needs to be scanned.
    • Query optimization: Review and optimize your SQL queries to ensure they are structured efficiently. Use appropriate JOIN types, filter rows early in the query using WHERE clauses, and avoid unnecessary calculations or function calls in the SELECT clause. Analyze query execution plans to identify areas for improvement.
    • Database indexing statistics: Keep your database's indexing statistics up to date. These statistics help the query optimizer make better decisions when creating execution plans. Regularly update statistics to ensure accurate query optimization (see the sketch after this list).
    • Avoid unnecessary data retrieval: Retrieve only the necessary columns and rows. Minimize the amount of data transferred between the database and the application. Use SELECT statements with specific column names instead of selecting all columns using SELECT *.
    • Optimize data types and table structure: Use appropriate data types that match the nature of the data. Avoid using large data types when smaller ones would suffice. Additionally, ensure that your table structure is well-designed, avoiding excessive column sizes and null values where possible.
    • Proper hardware and configuration: Ensure that your database server is properly configured and has sufficient hardware resources such as CPU, memory, and storage to handle the workload. Proper server configuration can significantly improve query performance.
    • Regular maintenance: Perform regular database maintenance tasks, such as index rebuilding, statistics updating, and data purging/archiving. This helps keep the database in optimal condition and avoids performance degradation over time.
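
    As a small illustration of the statistics point above (hypothetical table name; a sketch, not a full maintenance routine):

    -- refresh statistics for one table with a full scan...
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

    -- ...or, sampled, for every table in the database
    EXEC sp_updatestats;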

    Remember, the effectiveness of these optimizations may vary depending on your specific database system, workload, and data characteristics. It's essential to benchmark and test the impact of any changes before applying them to a production environment.

  • Muskan wrote:

    Improving database design can have a significant impact on the speed of SQL queries. Here are some tips to optimize your database design and enhance query performance:

    ...

    ChatGPT?

  • Jonathan AC Roberts wrote:

    Muskan wrote:

    Improving database design can have a significant impact on the speed of SQL queries. ...

    ChatGPT?

    I would concur, looks like ChatGPT.

  • My thoughts on this are that this article is quite shallow and lacks breadth.

    1. Indexes -- Yes, they can create the potential to speed things up, but they will slow down your inserts and maintenance operations. You have to keep them fairly orderly, because a sufficiently fragmented index is of no use.
    2. Partitioning -- It's been repeated here on SSC that partitioning is not, in general, a tool for performance. If I need to count how many people ordered a certain item from me... well, I have to piece those partitions back together to get my table back to query from. Partitions help with maintenance operations, ideally speaking.
    3. SSN as a primary key -- This is not a good idea. You are basically saying you will use another organisation's identifying number/system as your own. As another person mentioned, any mistakes (accidental duplications) will be yours as well. It could serve as a business key, but surrogate keys exist for a reason. Surrogates are not to be avoided just because someone does not want to deal with the pain of creating and utilizing them. Joining tables on INTs is much preferable to joining on a VARCHAR(n).
    4. This whole parent-child thing... I think the only thing that points to is what we call "normalization." Does breaking things apart really make things faster to read, knowing I have to piece things back together? A database design geared towards faster reads is "denormalised" (see the sketch below).
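
    A minimal sketch of that trade-off (hypothetical Orders / Customer / OrdersReporting tables):

    -- normalized: every read pays for the join back to Customer
    SELECT o.order_id, o.order_date, c.customer_name
    FROM dbo.Orders AS o
    JOIN dbo.Customer AS c ON c.customer_id = o.customer_id;

    -- denormalised copy for read-heavy reporting: customer_name is
    -- duplicated into each row, so the read is a single-table query
    SELECT order_id, order_date, customer_name
    FROM dbo.OrdersReporting;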

     

    ----------------------------------------------------

  • Muskan wrote:

    Improving database design can have a significant impact on the speed of SQL queries. Here are some tips to optimize your database design and enhance query performance:

    • ....
    • ...
    • ..
    • Denormalization: While normalization is generally recommended, there are cases where denormalization can improve query performance. Denormalization involves adding redundant data or pre-calculating values to avoid expensive joins or calculations. This can be beneficial for read-heavy applications or when dealing with complex queries.
    • ...

    ...

    ...

     

    This section makes no sense put together as it is.

    LOL, denormalization... It probably meant NOT recommended.

    It can speed up reads? As if that were the exception.

    This looks AI generated, and off target with meaning.

    ----------------------------------------------------

  • MMartin1 wrote:

    Muskan wrote:

    Improving database design can have a significant impact on the speed of SQL queries. ...

    ...

    LOL, denormalization... It probably meant NOT recommended.

    It can speed up reads? As if that were the exception.

    This looks AI generated, and off target with meaning.

    Actually, I think denormalization is often used to speed up queries when designing a database or a data warehouse, as it removes a table and the join to that table in a query.
