Catalogue Hierarchy Design Strategy

Question

Catalogue Hierarchy Design Strategy

james.spibey

Mr or Mrs. 500

Points: 514
More actions
November 11, 2008 at 9:22 am

#193671

Hi,
I am implementing something akin to an online store which has a rather large catalogue of products (700,000+) within a 5 level catalogue hierarchy of approximately 500 nodes.
One of the requirements of the application is to allow the user to traverse the catalogue hierarchy and see the number of products at each level, 'rolled up' from the levels below. After some investigation, I have implemented the hierarchy using 'Nested Sets' (looked at CTE but performance was poor) and been able to build a static table of product counts at each level which works nicely.
However, I'm now considering how I might show the number of products at each level of the hierarchy when some filter is applied to the contents. For example, if the user were to search within our catalogue and I wanted to show the number of products at each level of hierarchy from the results of their search, I can't use any pre-compiled counts as this would be inaccurate due to the search filter in operation. I also cannot see how I could calculate this on-the-fly given the number of records we have and the number of searches which will take place (approx 10,000 per hour).
Has anyone else looked at this problem?
Many Thanks
James

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 1

Given an appropriate table design and indexing, I do not see why you would not be able to query the table and get results quickly enough.

Perhaps you can post the table structure and any index information you have - along with the query you have already written.

james.spibey Mr or Mrs. 500 Points: 514 More actions · Answer 2

Thanks for the reply. OK, so for table structures we have CAT_Product which is the product record, CAT_CatalogueSection which is the hierarchy table (with LeftExtent and RightExtent columns to make nested sets) and a CAT_CatalogueProduct which maps a product onto a CAT_CatalogueSection. Here's the DDL

CREATE TABLE [dbo].[CAT_Product](

[ProductID] [int] NOT NULL,

[Description] [varchar](512) NULL,

CONSTRAINT [PK_CAT_Product] PRIMARY KEY CLUSTERED

(

[ProductID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

CREATE TABLE [dbo].[CAT_CatalogueSection](

[CatalogueID] [int] NOT NULL,

[SectionID] [int] NOT NULL,

[ParentID] [int] NULL,

[SectionName] [varchar](255) NOT NULL,

[DisplayOrder] [int] NOT NULL,

[LeftExtent] [int] NULL,

[RightExtent] [int] NULL,

CONSTRAINT [PK_CAT_CatalogueSection] PRIMARY KEY CLUSTERED

(

[CatalogueID] ASC,

[SectionID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

CREATE TABLE [dbo].[CAT_CatalogueProduct](

[CatalogueProductID] [int] NOT NULL,

[CatalogueID] [int] NOT NULL,

[SectionID] [int] NOT NULL,

[ProductID] [int] NOT NULL,

CONSTRAINT [PK_CAT_CatalogueProduct] PRIMARY KEY CLUSTERED

(

[CatalogueProductID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IDX_SectionID] ON [dbo].[CAT_CatalogueProduct]

(

[SectionID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

So, lets say for example the user searches for the term 'mouse', the SQL to get the number of matches in each catalogue section would be:-

SELECT

CS1.SectionName,

COUNT(DISTINCT CS2.SectionID) As SubSectionCount,

COUNT(DISTINCT P.ProductID) As ProductCount

FROM

CAT_CatalogueSection CS1

INNER JOIN

CAT_CatalogueSection CS2

ON

(CS2.LeftExtent BETWEEN CS1.LeftExtent AND CS1.RightExtent AND CS2.LeftExtent <> CS1.LeftExtent)

INNER JOIN

CAT_CatalogueProduct CP

ON

CP.SectionID = CS2.SectionID

INNER JOIN

CAT_Product P

ON

P.ProductID = CP.ProductID

WHERE

P.Description LIKE '%mouse%'

GROUP BY

CS1.SectionName

ORDER BY

ProductCount DESC

(Note that we will be using Full Text indexing for the search but the principal is the same).

Running this query takes about 10 seconds on my machine here (Dual Xeon, 4GB RAM, SQL Server 2005) which is far too slow. I need less than 0.5s response time ideally.

Any advice appreciated

James

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 3

I need you to post something that will generate test data. The Left/Right Extent design does not make any sense to me and I do not want to waste more time trying to figure it out.

You certainly have some missing indexes.

FullText searching is going to have a significant impact if you have a lot of products. At the moment, you are searching for a product name with criteria starting with a wildcard. Even if you had an index on the product description, the wildcard at the beginning makes the index useless. FullText searching would make the criteria a word and it would be able to use the indexing to find it.

james.spibey Mr or Mrs. 500 Points: 514 More actions · Answer 4

Hi Michael,

The LeftExtent/RightExtent is from Joe Celko's Nested Set pattern (http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html) which allows fast searching of leafs and nodes in a hierarchy.

I take your point about the Full Text Index but this is a simplified example of what we actually because I didn't want to confuse the issue. What I wanted was a general discussion on how this kind of thing might be implemented in a scalable way rather than a discussion of the implementation details.

Regards

James

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 5

Take a good look at your query. The columns used in the joins are all candidates for needing an index. Without some kind of realistic data, I cannot tell you which ones need to be indexed.

My first guess would be all of the "SectionID" fields used in the joins, the "ProductID" fields used in the joins, and a composite index on LeftExtent/RightExtent. The data will make a big difference.

sqlizer SSCrazy Points: 2120 More actions · Answer 6

I am not an expert in this subject but my guess would be how about creating covering index.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 7

James,

Are all the mice in the same downline? I believe that your code is resolving the whole tree trying to find %mouse% and it may actually be resolving the whole tree more than once. I'm not sure because I don't have the data to play with, but I think you need to give it the extent coordinates for the top most node that is capable of having mice and then search the downline for %mouse% even if that means searching the whole tree rather than doing a self join to the tree.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)