Catalogue Hierarchy Design Strategy

  • Hi,

    I am implementing something akin to an online store which has a rather large catalogue of products (700,000+) within a 5 level catalogue hierarchy of approximately 500 nodes.

    One of the requirements of the application is to allow the user to traverse the catalogue hierarchy and see the number of products at each level, 'rolled up' from the levels below. After some investigation, I have implemented the hierarchy using 'Nested Sets' (looked at CTE but performance was poor) and been able to build a static table of product counts at each level which works nicely.

    However, I'm now considering how I might show the number of products at each level of the hierarchy when some filter is applied to the contents. For example, if the user were to search within our catalogue and I wanted to show the number of products at each level of hierarchy from the results of their search, I can't use any pre-compiled counts as this would be inaccurate due to the search filter in operation. I also cannot see how I could calculate this on-the-fly given the number of records we have and the number of searches which will take place (approx 10,000 per hour).

    Has anyone else looked at this problem?

    Many Thanks

    James

  • Given an appropriate table design and indexing, I do not see why you would not be able to query the table and get results quickly enough.

    Perhaps you can post the table structure and any index information you have - along with the query you have already written.

  • Thanks for the reply. OK, so for table structures we have CAT_Product which is the product record, CAT_CatalogueSection which is the hierarchy table (with LeftExtent and RightExtent columns to make nested sets) and a CAT_CatalogueProduct which maps a product onto a CAT_CatalogueSection. Here's the DDL

    CREATE TABLE [dbo].[CAT_Product](

    [ProductID] [int] NOT NULL,

    [Description] [varchar](512) NULL,

    CONSTRAINT [PK_CAT_Product] PRIMARY KEY CLUSTERED

    (

    [ProductID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[CAT_CatalogueSection](

    [CatalogueID] [int] NOT NULL,

    [SectionID] [int] NOT NULL,

    [ParentID] [int] NULL,

    [SectionName] [varchar](255) NOT NULL,

    [DisplayOrder] [int] NOT NULL,

    [LeftExtent] [int] NULL,

    [RightExtent] [int] NULL,

    CONSTRAINT [PK_CAT_CatalogueSection] PRIMARY KEY CLUSTERED

    (

    [CatalogueID] ASC,

    [SectionID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[CAT_CatalogueProduct](

    [CatalogueProductID] [int] NOT NULL,

    [CatalogueID] [int] NOT NULL,

    [SectionID] [int] NOT NULL,

    [ProductID] [int] NOT NULL,

    CONSTRAINT [PK_CAT_CatalogueProduct] PRIMARY KEY CLUSTERED

    (

    [CatalogueProductID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [IDX_SectionID] ON [dbo].[CAT_CatalogueProduct]

    (

    [SectionID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    So, lets say for example the user searches for the term 'mouse', the SQL to get the number of matches in each catalogue section would be:-

    SELECT

    CS1.SectionName,

    COUNT(DISTINCT CS2.SectionID) As SubSectionCount,

    COUNT(DISTINCT P.ProductID) As ProductCount

    FROM

    CAT_CatalogueSection CS1

    INNER JOIN

    CAT_CatalogueSection CS2

    ON

    (CS2.LeftExtent BETWEEN CS1.LeftExtent AND CS1.RightExtent AND CS2.LeftExtent <> CS1.LeftExtent)

    INNER JOIN

    CAT_CatalogueProduct CP

    ON

    CP.SectionID = CS2.SectionID

    INNER JOIN

    CAT_Product P

    ON

    P.ProductID = CP.ProductID

    WHERE

    P.Description LIKE '%mouse%'

    GROUP BY

    CS1.SectionName

    ORDER BY

    ProductCount DESC

    (Note that we will be using Full Text indexing for the search but the principal is the same).

    Running this query takes about 10 seconds on my machine here (Dual Xeon, 4GB RAM, SQL Server 2005) which is far too slow. I need less than 0.5s response time ideally.

    Any advice appreciated

    James

  • I need you to post something that will generate test data. The Left/Right Extent design does not make any sense to me and I do not want to waste more time trying to figure it out.

    You certainly have some missing indexes.

    FullText searching is going to have a significant impact if you have a lot of products. At the moment, you are searching for a product name with criteria starting with a wildcard. Even if you had an index on the product description, the wildcard at the beginning makes the index useless. FullText searching would make the criteria a word and it would be able to use the indexing to find it.

  • Hi Michael,

    The LeftExtent/RightExtent is from Joe Celko's Nested Set pattern (http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html) which allows fast searching of leafs and nodes in a hierarchy.

    I take your point about the Full Text Index but this is a simplified example of what we actually because I didn't want to confuse the issue. What I wanted was a general discussion on how this kind of thing might be implemented in a scalable way rather than a discussion of the implementation details.

    Regards

    James

  • Take a good look at your query. The columns used in the joins are all candidates for needing an index. Without some kind of realistic data, I cannot tell you which ones need to be indexed.

    My first guess would be all of the "SectionID" fields used in the joins, the "ProductID" fields used in the joins, and a composite index on LeftExtent/RightExtent. The data will make a big difference.

  • I am not an expert in this subject but my guess would be how about creating covering index.

  • James,

    Are all the mice in the same downline? I believe that your code is resolving the whole tree trying to find %mouse% and it may actually be resolving the whole tree more than once. I'm not sure because I don't have the data to play with, but I think you need to give it the extent coordinates for the top most node that is capable of having mice and then search the downline for %mouse% even if that means searching the whole tree rather than doing a self join to the tree.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply