Designing indexes for static archive table

  • We are rebuilding a section of our application to replace an old, unwieldy and slow version. The clients would like us to "mothball" it - there will be no INSERT/UPDATE/DELETE operations at all.

    My intention is to create a denormalised summary table purely for searching and display. There will be about 200,000 rows. I want to give the client the ability to search very quickly using between 1 and 16 separate criteria (most are currently foreign keys or columns in the primary table). There will then be 6-7 columns to display the retrieved data. The table will contain the PK of the primary table to enable a click-through to the actual data (which will remain unchanged).

    What approach should I adopt for the indexes on the table to make searching as swift as possible? As I said, there will be a one-time-only data population operation.

    Below is the DDL for the table (identifying columns removed or renamed for confidentiality purposes)

    CREATE TABLE [dbo].[ArchiveSearch](
        [CheckId] [int] NOT NULL,
        [RetailerId] [int] NULL, -- Criteria
        [ZoneId] [int] NULL, -- Criteria
        [MarketId] [int] NULL, -- Criteria
        [BusinessRegionId] [int] NULL, -- Criteria
        [RegionId] [int] NULL, -- Criteria
        [BrandId] [int] NULL, -- Criteria
        [PIN] [varchar](10) NULL, -- Criteria
        [TechnicianId] [int] NULL, -- Criteria
        [AuditorId] [int] NULL, -- Criteria
        [ServiceAdvisorId] [int] NULL, -- Criteria
        [CheckDate] [datetime] NULL, -- Criteria
        [WidgetId] [int] NULL, -- Criteria
        [Status] [int] NULL, -- Criteria
        [CheckTypeIds] [xml] NULL, -- Criteria
        [Technician] [nvarchar](200) NULL, -- Display
        [Auditor] [nvarchar](200) NULL, -- Display
        [Retailer] [nvarchar](100) NULL, -- Display
        [Widget] [nvarchar](255) NULL, -- Display
        [CheckTypes] [nvarchar](max) NULL, -- Display
        [FailedChecks] [nvarchar](50) NULL, -- Display
        CONSTRAINT [PK_ArchiveSearch] PRIMARY KEY CLUSTERED
        (
            [CheckId] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

  • The same general rules apply to this as to any other set of indexes. You're talking about having 1-16 different criteria for searching the table, and there's no short answer for how to index that.

    You could try 16 different indexes and hope that index intersection supports what you need, although that assumes all 16 columns are good candidates for a useful index; if the data in a column is not selective, an index on it will be a waste of time. That approach is unlikely to work very well. Instead, you can try to identify the most commonly combined columns among the sixteen and create compound indexes based on the appropriate key values. The thing is, there's no way to tell you what to index without a more thorough understanding of the query patterns.

    Also, you need to consider the clustered index. That should usually be on the most common access path to the data, because the clustered index defines data storage.
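    For example (the column choices here are purely hypothetical, since the real query patterns aren't known yet): if searches most often combined [CheckDate] and [RetailerId], a compound index with the display columns in an INCLUDE clause would let those searches seek directly and avoid key lookups back to the clustered index:

        -- Hypothetical: assumes CheckDate + RetailerId is the most common
        -- filter combination. Key columns support the seek; INCLUDEd
        -- columns let the index cover the display list.
        CREATE NONCLUSTERED INDEX IX_ArchiveSearch_CheckDate_RetailerId
            ON dbo.ArchiveSearch (CheckDate, RetailerId)
            INCLUDE (Technician, Auditor, Retailer, Widget, FailedChecks);

    The same pattern would apply to whichever two or three combinations actually dominate the workload.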

    Just a question, why denormalize the storage? Is performance slow if you just do the JOIN operations?

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • The current search function is extremely slow (of the order of 20-30 seconds), for reasons mainly to do with poor data design, cruft and arcane business logic. It's felt, among our team of generalists, that the best approach (given that there is to be no new data) would be to flatten the search. We have tried but failed to determine the most common search criteria - there are any number of combinations possible and nothing stands out as the most obvious. I wonder - could adding indexes that don't really contribute actually slow the process down? In other words, is there any downside to adding an index?
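    Since the common criteria are hard to pin down up front, one way to gather evidence (a sketch; run it against a restored test copy after exercising a representative set of searches) is to ask SQL Server's missing-index DMVs which column combinations the optimizer wished it had. Treat the output as hints to investigate, not as a design:

        -- Missing-index suggestions recorded since the last restart,
        -- filtered to the archive table and ranked by estimated impact.
        SELECT d.equality_columns,
               d.inequality_columns,
               d.included_columns,
               s.user_seeks,
               s.avg_user_impact
        FROM sys.dm_db_missing_index_details AS d
        JOIN sys.dm_db_missing_index_groups AS g
            ON g.index_handle = d.index_handle
        JOIN sys.dm_db_missing_index_group_stats AS s
            ON s.group_handle = g.index_group_handle
        WHERE d.object_id = OBJECT_ID(N'dbo.ArchiveSearch')
        ORDER BY s.avg_user_impact * s.user_seeks DESC;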

  • 200k rows isn't very large to begin with, and a medium-sized server can easily table-scan that within a few seconds with no indexing. But since it's archival data, you'll need to optimize for at least 10x growth. For an entirely flattened table that is infrequently inserted but frequently queried across a wide range of predicates and columns, especially by aggregate queries, that sounds to me like a good application for a clustered columnstore index. For this option you would need SQL Server 2014 Enterprise Edition or higher.
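    For illustration, converting the table (assuming a suitable edition) could look like the sketch below. One caveat: the [CheckTypeIds] xml and [CheckTypes] nvarchar(max) columns would have to be moved to a side table first, because columnstore indexes on SQL Server 2014 do not support those data types.

        -- Sketch: replace the rowstore clustered PK with a clustered
        -- columnstore index (unsupported column types removed first).
        ALTER TABLE dbo.ArchiveSearch DROP CONSTRAINT PK_ArchiveSearch;
        CREATE CLUSTERED COLUMNSTORE INDEX CCI_ArchiveSearch
            ON dbo.ArchiveSearch;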

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Yeah, columnstore is a good thing if you're doing lots of aggregates in the reports. If we're doing point lookups & just general data retrieval though, it falls down badly.

    If the data is not getting updated, the only cost of indexes is storage. However, just throwing indexes at the issue, and not resolving it, will be pretty darned frustrating. Gathering query metrics and doing some math to determine which filter criteria are used most frequently and, really important, which ones are grouped together the most frequently is going to lead to a much more satisfying conclusion.


  • The table will not grow beyond 200k rows. We can't use Clustered Columnstore indexes because it's SQL 2012 Standard.

  • Especially for a flattened table, using the DATA_COMPRESSION = PAGE option on a conventional rowstore clustered index should result in 50%-70% compression, which translates into fewer page reads and a significant query performance gain.
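    As a sketch, compressing the existing clustered index in place would look like this. One caveat worth checking first: data compression was an Enterprise Edition feature prior to SQL Server 2016 SP1, so it may not be available on the 2012 Standard instance mentioned above.

        -- Rebuild the clustered index with page compression; this
        -- rewrites all data pages, so do it during the one-time load.
        ALTER INDEX PK_ArchiveSearch ON dbo.ArchiveSearch
            REBUILD WITH (DATA_COMPRESSION = PAGE);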

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • That sounds like a good approach. I've now done the data import on a test backup database and I'll write some LINQ and do some testing on the retrieval times, adding or removing indexes as we go.
