Metadata Structure / Design Issues
Posted Tuesday, October 30, 2007 6:22 PM
Grasshopper

Hi People,

I have been developing an application that needs to store 30 fields of metadata against each item in the system. End users are able to customise the field names on a customer-by-customer basis. I've tried a couple of mock designs and haven't been able to get decent performance out of either. I would be really interested to see what others have come up with, and any suggestions for a design that might work.

I have five key tables relating to the metadata design. The Book table contains 2.1 million rows and is the primary table of the application; each book can have metadata stored against it (though not every single book does):


BookStatus
|
Book -> BookType -> BookTypeGroup
|
Customer


Most of the time the user will only be allowed to search against one metadata column, with a maximum of 5 metadata columns displayed in the results.
Unfortunately I've had little feedback in terms of how much data is expected to go into the metadata table. It may be that users don't store metadata against every item,
but I am assuming the worst-case scenario: that every item will have metadata against it.


The two solutions I have tried so far:

First option: I created a table with 30 varchar(100) columns named MetaColumn1, MetaColumn2, MetaColumn3 ... MetaColumn28, MetaColumn29, MetaColumn30 (the names used in the query below).

I then bulk loaded all 30 fields with random-length data to simulate every book having metadata.
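
As a rough sketch, the table looks something like this (the key and constraint names here are illustrative, not the real ones):

CREATE TABLE [dbo].[BookMetadata] (
    [BookId]       INT NOT NULL,
    [MetaColumn1]  VARCHAR(100) NULL,
    [MetaColumn2]  VARCHAR(100) NULL,
    -- ... MetaColumn3 through MetaColumn29 ...
    [MetaColumn30] VARCHAR(100) NULL,
    CONSTRAINT [PK_BookMetadata] PRIMARY KEY CLUSTERED ([BookId]),
    CONSTRAINT [FK_BookMetadata_Book] FOREIGN KEY ([BookId])
        REFERENCES [dbo].[Book] ([Id])
);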

The query would be:

SELECT TOP 25
[Dev].[dbo].[Book].[Id],
[Dev].[dbo].[Book].[Code],
[Dev].[dbo].[Book].[AltCode],
[Dev].[dbo].[BookType].[Code] AS [Type],
[Dev].[dbo].[BookMetadata].[MetaColumn1],
[Dev].[dbo].[BookMetadata].[MetaColumn2],
[Dev].[dbo].[BookMetadata].[MetaColumn3],
[Dev].[dbo].[BookMetadata].[MetaColumn4],
[Dev].[dbo].[BookStatus].[Status]

FROM [Dev].[dbo].[BookMetadata]

INNER JOIN [Dev].[dbo].[Book] ON [Dev].[dbo].[BookMetadata].[BookId] = [Dev].[dbo].[Book].[Id] AND [Dev].[dbo].[Book].[CustomerId] = 512
INNER JOIN [Dev].[dbo].[BookStatus] ON [Dev].[dbo].[BookStatus].[Id] = [Dev].[dbo].[Book].[BookStatusId]
INNER JOIN [Dev].[dbo].[BookType] ON [Dev].[dbo].[Book].[BookTypeId] = [Dev].[dbo].[BookType].[Id] AND [Dev].[dbo].[BookType].[BookGroupId] = 1

WHERE [Dev].[dbo].[BookMetadata].[MetaColumn28] LIKE '%g%'

This performs relatively slowly in most cases (0.50 - 1.02 seconds), but better than my second option ;)


The second option I tried was a table with a single metadata column, storing each metadata value in its own row:

Id | MetaData | ColumnOrder

SELECT TOP 25
[Dev].[dbo].[Book].[Id],
[Dev].[dbo].[Book].[Code],
[Dev].[dbo].[Book].[AltCode],
[Dev].[dbo].[BookType].[Code] AS [Type],
[Dev].[dbo].[BookMetadataView].[MetaColumn1],
[Dev].[dbo].[BookMetadataView].[MetaColumn2],
[Dev].[dbo].[BookMetadataView].[MetaColumn3],
[Dev].[dbo].[BookMetadataView].[MetaColumn4],
[Dev].[dbo].[BookStatus].[Status]

FROM [Dev].[dbo].[BookMetadataView]

INNER JOIN [Dev].[dbo].[Book] ON [Dev].[dbo].[BookMetadataView].[BookId] = [Dev].[dbo].[Book].[Id] AND [Dev].[dbo].[Book].[CustomerId] = 512
INNER JOIN [Dev].[dbo].[BookStatus] ON [Dev].[dbo].[BookStatus].[Id] = [Dev].[dbo].[Book].[BookStatusId]
INNER JOIN [Dev].[dbo].[BookType] ON [Dev].[dbo].[Book].[BookTypeId] = [Dev].[dbo].[BookType].[Id] AND [Dev].[dbo].[BookType].[BookGroupId] = 1

WHERE [Dev].[dbo].[BookMetadataView].[MetaColumn28] LIKE '%g%'

I then created a view to pivot the metadata into columns so it could be joined on. This worked well until a lot of metadata was imported;
then, as you can imagine, it failed badly: 2.1 million books x 30 rows each is over 60 million rows.

This frequently took up to 8 minutes.
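
The view was along these lines (a sketch only - I'm assuming the single-column table is [dbo].[BookMeta]([BookId], [MetaData], [ColumnOrder]); the real names may differ):

CREATE VIEW [dbo].[BookMetadataView]
AS
SELECT  [BookId],
        MAX(CASE WHEN [ColumnOrder] = 1  THEN [MetaData] END) AS [MetaColumn1],
        MAX(CASE WHEN [ColumnOrder] = 2  THEN [MetaData] END) AS [MetaColumn2],
        MAX(CASE WHEN [ColumnOrder] = 3  THEN [MetaData] END) AS [MetaColumn3],
        MAX(CASE WHEN [ColumnOrder] = 4  THEN [MetaData] END) AS [MetaColumn4],
        -- ... one branch per display position, up to 30 ...
        MAX(CASE WHEN [ColumnOrder] = 28 THEN [MetaData] END) AS [MetaColumn28]
FROM    [dbo].[BookMeta]
GROUP BY [BookId];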

I have analyzed both solutions in Query Analyzer and created the relevant indexes.

So I'm a tad stuck. Anyone have any ideas, or faced something similar? Or have I done something stupid? :)

If you need more info I'm happy to provide it.

Thanks!
Post #416843
Posted Tuesday, October 30, 2007 8:25 PM
Hall of Fame

Have you considered using full-text catalogs for your metadata columns?
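
Something along these lines, as a sketch (assuming the wide BookMetadata table from the first post, and that it has a unique key index named PK_BookMetadata):

CREATE FULLTEXT CATALOG [BookMetaCatalog];

CREATE FULLTEXT INDEX ON [dbo].[BookMetadata]
    ([MetaColumn1], [MetaColumn2], [MetaColumn28])  -- list every searchable column
    KEY INDEX [PK_BookMetadata]
    ON [BookMetaCatalog];

-- Then search with CONTAINS rather than LIKE '%...%'.
-- Note: full text matches whole words or prefixes, not arbitrary substrings.
SELECT [BookId]
FROM   [dbo].[BookMetadata]
WHERE  CONTAINS([MetaColumn28], '"123456*"');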


Post #416858
Posted Wednesday, October 31, 2007 6:58 AM


SSChampion

What about XML for the metadata? It really does sound like it's only semi-structured data.
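
Something like this, as a rough sketch (the element layout is just one possibility):

CREATE TABLE [dbo].[BookMetadataXml] (
    [BookId] INT NOT NULL PRIMARY KEY,
    [Meta]   XML NOT NULL   -- e.g. <meta><f id="1">value</f> ... </meta>
);

-- Find books whose field 28 contains a 'g' anywhere:
SELECT [BookId]
FROM   [dbo].[BookMetadataXml]
WHERE  [Meta].exist('/meta/f[@id=28]/text()[contains(., "g")]') = 1;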

----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server Query Performance Tuning
SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #417033
Posted Wednesday, October 31, 2007 5:38 PM
Grasshopper

Thanks heaps for the suggestions!

Full-text indexing - To be honest it's not a technology I've used much before. It does seem to improve the query time significantly if the search string is over 5 characters long: e.g. LIKE '%123456%' takes nearly 8 minutes, whereas CONTAINS(MetaColumn28, '"*123456*"') takes 10 - 30 seconds, so that's a definite help. The only issue is that the ORM I support doesn't handle full text transparently.

XML - I've been using SQL Server 2000 a lot and found XML hard to work with. We are now using 2005, so that could be an option, but wouldn't putting all that data in one field and parsing it out cause a performance overhead? XML might be a better option if I wanted to extend the metadata. I will do some research into this idea.

Unfortunately I'm still getting really slow response times even when using the full-text index. Of the two options I posted, which do you think is the better design?
Post #417353
Posted Thursday, November 1, 2007 5:20 AM


SSChampion

Neither full text nor XML has me terribly excited. If it were my project, I'd be re-examining the assumptions and trying to find a method of defining the requirements more fully so I could achieve a more normalized design. But, shy of that, in 2005, I'd go with the XML.

Post #417469
Posted Thursday, November 1, 2007 5:58 PM
Grasshopper

Hi Grant,

Thanks for the reply. I tend to agree with you; the full-text solution works, but it's not perfect.

In regards to requirements, in simple terms they are:

1. Able to store 30 fields of user data against each book on the system.
2. Each field needs to be able to store at most 100 characters.
3. Need to be able to search against any of the 30 fields with a LIKE or CONTAINS style search.
4. Also be able to search against book information in the same search.

That is what the client has requested and what is in the specification. Unfortunately it is a generic requirement, which leads to a design that needs to be generic. The downside, in my experience, is that a generic solution often ends up slow and hard for SQL Server to optimise if all the fields are varchar.

Would a SQL script with some of the table structures help in finding a solution?

I've got the weekend coming up, so I might try the XML stuff on my laptop at home and see if I can find a solution.

Post #417703
Posted Friday, November 2, 2007 5:37 AM


SSChampion

Unfortunately, I'm going out of town for a week, but posting a bit of code could help someone else to help you.

Based on the requirements... 30 fields, each 100 characters in length... It's just so sloppy. Anyway, you definitely don't want the "One True Lookup Table" approach. What about something like this... You have one table that has the well-defined fields for the book. You also have one table that has your 100-character field. Then you have an interim table between them. Now you can index the 100-character field (without having to index 30 of them), you can index the interim table and you can index the book table, all independent of each other. This allows for more than 30 fields, but you could limit that by adding a FieldType or something to the interim table, making it part of the primary key and only allowing 30 different types (guessing a bit here). But I think this would work - see the sketch below.
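
To make that concrete, a rough sketch (every name here is a guess on my part):

-- The 100-character values, indexable on their own:
CREATE TABLE [dbo].[MetaValue] (
    [MetaValueId] INT IDENTITY NOT NULL PRIMARY KEY,
    [Value]       VARCHAR(100) NOT NULL
);
CREATE INDEX [IX_MetaValue_Value] ON [dbo].[MetaValue] ([Value]);

-- The interim table; FieldType in the key caps each book at 30 fields:
CREATE TABLE [dbo].[BookMetaValue] (
    [BookId]      INT NOT NULL REFERENCES [dbo].[Book] ([Id]),
    [FieldType]   TINYINT NOT NULL,  -- 1 through 30
    [MetaValueId] INT NOT NULL REFERENCES [dbo].[MetaValue] ([MetaValueId]),
    CONSTRAINT [PK_BookMetaValue] PRIMARY KEY ([BookId], [FieldType])
);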

If you can't do that, I'd still go with XML over Text because, while it's certainly not an optimal solution, it does provide for the structure that the data would require.

That's my 1.5 cents worth for the moment.


Post #417862
Posted Friday, November 2, 2007 5:39 AM


SSChampion

Thinking about it more, you should be able to get reasonable performance, even pivoting the data, assuming the indexes are clean, especially the clustered indexes. How did you set those up?
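
For instance, on the single-column design, something like this (assuming the [dbo].[BookMeta]([BookId], [MetaData], [ColumnOrder]) layout sketched earlier) keeps each book's 30 rows together, so the pivot reads one contiguous range per book:

CREATE CLUSTERED INDEX [CIX_BookMeta]
    ON [dbo].[BookMeta] ([BookId], [ColumnOrder]);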

Post #417864
Posted Friday, November 2, 2007 7:19 AM


SSCertifiable

Well - you're going to be running into some serious compromises. For one - "30 100-character fields" sounds like someone's being sloppy/lazy with giving you the right specifications. You really ought to push a little harder to get them to give you some ACTUAL specifications (there are thirty possible, optional pieces of data - let's define what they are, and let's type them appropriately to what they REALLY are). You will likely find out that several of those will be very precise fields, which can be typed correctly, indexed correctly, etc...

You should also push to verify whether they will really use EVERY metadata element in their optional searching.

In the meantime, Grant's solution should still provide you with some amount of gain. That being said - you NEED the full-text index on the varchar(100) field, since traditional indexes are worthless for (and not used in) most LIKE syntaxes. Anything other than a LIKE 'abcdef%' pattern will require a table scan.
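
To illustrate, against the wide BookMetadata table from the first post:

-- Sargable: trailing wildcard only, so an index on MetaColumn28 can be used for a seek.
SELECT [BookId]
FROM   [dbo].[BookMetadata]
WHERE  [MetaColumn28] LIKE 'abcdef%';

-- Not sargable: the leading wildcard forces a scan of every row.
SELECT [BookId]
FROM   [dbo].[BookMetadata]
WHERE  [MetaColumn28] LIKE '%abcdef%';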



----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #417907
Posted Friday, November 2, 2007 9:46 PM
Grasshopper

Hi Guys,

I really appreciate the feedback and ideas you have offered so far; obviously my SQL skills aren't as developed as yours :)

I agree on the loose requirements, but unfortunately I just can't see a way around them. Every customer will store whatever they like in each column; every customer comes from a different background and a different business type, so it's almost impossible to guess what they're going to put in and then be able to limit it. One option is to have the fields set so you have 10 varchar(100), 5 smalldatetime, 5 int and 10 money, or some similar configuration. But I bet one customer will want an extra field of one type or another.

By the way, the books were just an example; my client is fussy about information going out, and I like to keep things simple.

I tried the single-column solution; the advantage is that it only stores a value if the customer actually has something in there, and it allows for expansion later on. I had a table containing the column headers, and each metadata field belongs to one of them. I then had an order field to control the column display order, and used that to pivot the data round. This worked well, but performance suffered with 2.5 million rows x 30 fields. I didn't test the full-text index against that solution, which I plan to do tomorrow. I did a bit more digging and discovered from the client that not every item will have metadata stored against it; in fact they generally said that only 30-40% of items will. This results in a bit of a saving, but ideally I want to find the best solution possible that meets the customer's needs.

Grant, attached is a SQL diagram of what I understood you to have proposed as a solution; I wasn't 100% sure what you meant from your description. Also, ignore the contents of the item table and the customer table; they're mock-ups, and normally there are more fields. On the table with the single varchar column I would add a full-text index as suggested. I would likely use a view to transform this data, using a pivot, into a more easily used structure. The type table allows the client to define the type of a column, such as it being a date, int, etc.

Currently the Id columns are the ones in the clustered indexes.

Thanks
Alex


  Post Attachments 
example-v1.0.jpg (10 views, 80.44 KB)
Post #418197