May 17, 2017 at 9:47 am
I wouldn't say that the part number thing is horribly overloaded, but I understand where you are going with that. We take the manufacturer part number (123) and add a prefix (ABC or DEF, etc) to create a unique ItemCode (ABC123) in our system. I am looking for a long term solution here, not a band aide so I certainly appreciate you pushing to look at the underlying issue. With that said, I could add and add'l column for MfrPartNumber and put the second part of the ItemCode in there, then index both columns and remove the wildcard prefix.
E.G.
ItemCode: ABC123
MfrPN: 123
ItemCode: DEF123
MfrPN: 123
Would that be the optimal approach that you would go after for a long term solution?
And to complicate matters, on the same subject, we also have a Descr column that holds a product description (e.g. "Red Widget with Triangle Pieces"); we also have a search for this column (optional) whereas we may need to lookup all things with Triangle. Is this column best served with Full Text Index?
Again, thanks in advance for your feedback. It's much appreciated.
May 17, 2017 at 12:08 pm
For S&G and I tried adding in a Full Text Index on the ItemCode column. I also added in the following items ABC123456, DEF123456, GHI123456.
SELECT ItemCode FROM TABLE WHERE ItemCode LIKE '%123456%'
returns all 3 results
SELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 0 results
If I add in an item ABC-123456
SELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 1 result (ABC-123456; not ABC123456, DEF123456, GHI123456)
So it appears that the Full Text Index is not searching partial strings unless there is a separator. Same goes if I use Full Text Index on the Descr field:
TEST BOX
TEST BOXRED
TEST REDBOX
SELECT ItemCode FROM TABLE WHERE Contains(Descr, 'box')
returns 1 result (TEST BOX)
SELECT ItemCode FROM TABLE WHERE Contains(Descr, '"box*"')
returns 2 results (TEST BOX, TEST BOXRED)
but no TEST REDBOX
Is this by design for Full Text Index?
May 18, 2017 at 11:15 am
josh-1127203 - Wednesday, May 17, 2017 9:47 AMI wouldn't say that the part number thing is horribly overloaded, but I understand where you are going with that. We take the manufacturer part number (123) and add a prefix (ABC or DEF, etc) to create a unique ItemCode (ABC123) in our system. I am looking for a long term solution here, not a band aide so I certainly appreciate you pushing to look at the underlying issue. With that said, I could add and add'l column for MfrPartNumber and put the second part of the ItemCode in there, then index both columns and remove the wildcard prefix.E.G.
ItemCode: ABC123
MfrPN: 123
ItemCode: DEF123
MfrPN: 123Would that be the optimal approach that you would go after for a long term solution?
And to complicate matters, on the same subject, we also have a Descr column that holds a product description (e.g. "Red Widget with Triangle Pieces"); we also have a search for this column (optional) whereas we may need to lookup all things with Triangle. Is this column best served with Full Text Index?
Again, thanks in advance for your feedback. It's much appreciated.
Yeah, that seems good. Probably a good choice on full text too. Again, hard to say for certain, testing will be your friend there.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 18, 2017 at 11:23 am
josh-1127203 - Wednesday, May 17, 2017 12:08 PMFor S&G and I tried adding in a Full Text Index on the ItemCode column. I also added in the following items ABC123456, DEF123456, GHI123456.SELECT ItemCode FROM TABLE WHERE ItemCode LIKE '%123456%'
returns all 3 resultsSELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 0 resultsIf I add in an item ABC-123456
SELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 1 result (ABC-123456; not ABC123456, DEF123456, GHI123456)So it appears that the Full Text Index is not searching partial strings unless there is a separator. Same goes if I use Full Text Index on the Descr field:
TEST BOX
TEST BOXRED
TEST REDBOXSELECT ItemCode FROM TABLE WHERE Contains(Descr, 'box')
returns 1 result (TEST BOX)SELECT ItemCode FROM TABLE WHERE Contains(Descr, '"box*"')
returns 2 results (TEST BOX, TEST BOXRED)
but no TEST REDBOXIs this by design for Full Text Index?
Add wild cards to the search '*box*'. See what you get then.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 18, 2017 at 11:48 am
josh-1127203 - Wednesday, May 17, 2017 12:08 PMFor S&G and I tried adding in a Full Text Index on the ItemCode column. I also added in the following items ABC123456, DEF123456, GHI123456.SELECT ItemCode FROM TABLE WHERE ItemCode LIKE '%123456%'
returns all 3 resultsSELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 0 resultsIf I add in an item ABC-123456
SELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 1 result (ABC-123456; not ABC123456, DEF123456, GHI123456)So it appears that the Full Text Index is not searching partial strings unless there is a separator. Same goes if I use Full Text Index on the Descr field:
TEST BOX
TEST BOXRED
TEST REDBOXSELECT ItemCode FROM TABLE WHERE Contains(Descr, 'box')
returns 1 result (TEST BOX)SELECT ItemCode FROM TABLE WHERE Contains(Descr, '"box*"')
returns 2 results (TEST BOX, TEST BOXRED)
but no TEST REDBOXIs this by design for Full Text Index?
It's by design. You can do prefix searches, but can't do suffix searches (barring hacks like storing the REVERSE of a string and indexing that).
May 18, 2017 at 12:09 pm
Jacob Wilkins - Thursday, May 18, 2017 11:48 AMjosh-1127203 - Wednesday, May 17, 2017 12:08 PMFor S&G and I tried adding in a Full Text Index on the ItemCode column. I also added in the following items ABC123456, DEF123456, GHI123456.SELECT ItemCode FROM TABLE WHERE ItemCode LIKE '%123456%'
returns all 3 resultsSELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 0 resultsIf I add in an item ABC-123456
SELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 1 result (ABC-123456; not ABC123456, DEF123456, GHI123456)So it appears that the Full Text Index is not searching partial strings unless there is a separator. Same goes if I use Full Text Index on the Descr field:
TEST BOX
TEST BOXRED
TEST REDBOXSELECT ItemCode FROM TABLE WHERE Contains(Descr, 'box')
returns 1 result (TEST BOX)SELECT ItemCode FROM TABLE WHERE Contains(Descr, '"box*"')
returns 2 results (TEST BOX, TEST BOXRED)
but no TEST REDBOXIs this by design for Full Text Index?
It's by design. You can do prefix searches, but can't do suffix searches (barring hacks like storing the REVERSE of a string and indexing that).
Oh, you can't use the wild card on both? My bad. Maybe I should reread the chapter I wrote on fulltext indexes.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 18, 2017 at 12:14 pm
Yeah, they've been saying it's high on their list for a while now (see https://connect.microsoft.com/SQLServer/feedback/details/758588/full-text-leading-wildcard-suffix-search, for example), but hasn't been done yet as far as I know.
The documentation is careful to say that prefix search is supported while quietly omitting any mention of the lack of suffix search.
May 18, 2017 at 12:16 pm
Please read a book on RDBMS. The identity column is not a column! It's a table property imposed by physical storage. If you use that as a key. It means you do not have a valid relational design and we should shoot you. Okay, send you to reeducation to learn the right terms.
By definition, let me repeat that by definition, a key has to be a subset of columns in the entity that is modeled by the table. In your case it looks like a thing called him "item_code" is the expected key. But I am only guessing since you didn't bother to follow form rules and the etiquette of the last 30 years of SQL forums and post DDL.
Based on 40+ years of programming; first get the logical design right then worry about the implementation. Getting the wrong answer very fast is much, much, much worse than getting the right answer a little slower. You'll poison everything in your system.
>> Right now the table is a heap and I have a query that does a wildcard against the item_code column (varchar(250)): <<
Oh dear God in heaven, I certainly hope not! First of all, you don't know what a "code" is or the ISO 11179 standards; it's what's called an "attribute property" and it has a specific meaning as to the kind of attribute it is. What you should have had for a key was an "item_id"; an identifier for each item, not an attribute that puts it in the category.
Also, I've been at this for over 40 years and have never seen a variable length code of 250 characters. In fact I cannot find anything like that in the ISO standards. Essentially, your problem really is that you don't know how to design a table, how to model data or anything else related to abstraction and standards. If I'm wrong, please post an example of a VARCHAR(250) encoding scheme for your items. I'd love to use it in one of my books is a bad example of design.
In a properly designed schema, there is an identifier for each entity (usually the key), and the encoding schemes for the various attributes are "x CHAR(n) NOT NULL CHECK (x LIKE '...')" or or if it is quantity or magnitude "x <numeric data type> [NOT NULL] CHECK (x <numeric predicate>) " or "x <temporal data type> [NOT NULL] CHECK (x <temporal predicate>) " to validate the data.
>> I really want to get this table over to cluster but doing so slows the query that is used. <<
When I'm teaching classes, one of the things I stress that you need to design data. I have a horrible feeling that your insanely long item_code is a total mess. Let me give an example that you understand, if you been to a library. As you ever consider how libraries organize their shelves before there was Dewey Decimal Classification? Anyway they wanted to and is personnel changed, so did the classifications. Every library was different. Having been in the bookstore business. I actually ran into one new age feminist bookstore in Atlanta in the late 1970s that classified their books but the color of the binding. No, really! It made the shelves looks pretty.
You probably need to sit down and actually design your item_code. If that is the main search criteria. I've got a whole book and a lot of articles on how to design encoding schemes. I happen to like hierarchical encoding schemes like Dewey, because it's really easy to use simple string matches on them (I know that "5%" is science, "51%" is mathematics within science,, etc.).
We don't have any details or DDL or anything else about your real database. All your caring about is how to get the best performance out of a really crappy design. This is not professional. Do you want to be a database professional or just a code monkey?
Please post DDL and follow ANSI/ISO standards when asking for help.
May 18, 2017 at 3:04 pm
josh-1127203 - Tuesday, May 16, 2017 10:21 AM
Please read a book on RDBMS. The identity column is not a column! It's a table property imposed by physical storage. If you use that as a key. It means you do not have a valid relational design and we should shoot you. Okay, send you to reeducation to learn the right terms.
By definition, let me repeat that by definition, a key has to be a subset of columns in the entity that is modeled by the table. In your case it looks like a thing called him "item_code" is the expected key. But I am only guessing since you didn't bother to follow form rules and the etiquette of the last 30 years of SQL forums and post DDL.
Based on 40+ years of programming; first get the logical design right then worry about the implementation. Getting the wrong answer very fast is much, much, much worse than getting the right answer a little slower. You'll poison everything in your system.
>> Right now the table is a heap and I have a query that does a wildcard against the item_code column (varchar(250)): <<
Oh dear God in heaven, I certainly hope not! First of all, you don't know what a "code" is or the ISO 11179 standards; it's what's called an "attribute property" and it has a specific meaning as to the kind of attribute it is. What you should have had for a key was an "item_id"; an identifier for each item, not an attribute that puts it in the category.
Also, I've been at this for over 40 years and have never seen a variable length code of 250 characters. In fact I cannot find anything like that in the ISO standards. Essentially, your problem really is that you don't know how to design a table, how to model data or anything else related to abstraction and standards. If I'm wrong, please post an example of a VARCHAR(250) encoding scheme for your items. I'd love to use it in one of my books is a bad example of design.
In a properly designed schema, there is an identifier for each entity (usually the key), and the encoding schemes for the various attributes are "x CHAR(n) NOT NULL CHECK (x LIKE '...')" or or if it is quantity or magnitude "x <numeric data type> [NOT NULL] CHECK (x <numeric predicate>) " or "x <temporal data type> [NOT NULL] CHECK (x <temporal predicate>) " to validate the data.
>> I really want to get this table over to cluster but doing so slows the query that is used. <<
When I'm teaching classes, one of the things I stress that you need to design data. I have a horrible feeling that your insanely long item_code is a total mess. Let me give an example that you understand, if you been to a library. As you ever consider how libraries organize their shelves before there was Dewey Decimal Classification? Anyway they wanted to and is personnel changed, so did the classifications. Every library was different. Having been in the bookstore business. I actually ran into one new age feminist bookstore in Atlanta in the late 1970s that classified their books but the color of the binding. No, really! It made the shelves looks pretty.
You probably need to sit down and actually design your item_code. If that is the main search criteria. I've got a whole book and a lot of articles on how to design encoding schemes. I happen to like hierarchical encoding schemes like Dewey, because it's really easy to use simple string matches on them (I know that "5%" is science, "51%" is mathematics within science,, etc.).
We don't have any details or DDL or anything else about your real database. All your caring about is how to get the best performance out of a really crappy design. This is not professional. Do you want to be a database professional or just a code monkey?
Wow, just wow! Don't fall when you step down from your soap box. I appreciate your feedback and you have some good info in there if you cut through the brow beating. By design, I typically will use an integer value to identify an item in a table. E.G.
CUSTOMER table
CustomerID
FName
LName
ITEM table
ItemID
ItemCode
etc.
ORDER table
OrderID
CustomerID
etc.
ORDER_ITEM
OrderItemID
OrderID
ItemID
etc.
I've always read that is GOOD design and that when choosing a clustered index, it's good to go with something incremental as to reduce fragmentation. Could I cluster on ItemCode, sure, but we are adding constantly which would cause page splitting and constant fragmentation, not to mention that the integer joins I have always found are much faster that string joins.
May 18, 2017 at 3:25 pm
Joe Celko
Please read a book on RDBMS. The identity column is not a column! It's a table property imposed by physical storage...
[The Identity property] Creates an identity column in a table. This property is used with the CREATE TABLE and ALTER TABLE Transact-SQL statements.
... Identity columns can be used for generating key values. The identity property on a column guarantees the following:
- Each new value is generated based on the current seed & increment.
- Each new value for a particular transaction is different from other concurrent transactions on the table.
Using the identity column as a key is another story...
-- Itzik Ben-Gan 2001
May 22, 2017 at 5:44 am
josh-1127203 - Thursday, May 18, 2017 3:04 PMI've always read that is GOOD design and that when choosing a clustered index, it's good to go with something incremental as to reduce fragmentation. Could I cluster on ItemCode, sure, but we are adding constantly which would cause page splitting and constant fragmentation, not to mention that the integer joins I have always found are much faster that string joins.
They do make a good clustered index key. That's it. Keep doing what you're doing.
May 24, 2017 at 4:01 pm
Thanks Ed for the nod.
Quick update, so I ended up modifying our application to default to "starts with" when searching and using a non-clustered index for that column. That should cover 95% of the queries. For the rest, we will just have to go with the LIKE with the wildcard prefix. It was running a little slow on my dev machine, but on the server it returns fast.
Also, the ItemCode column is currently a varchar(250), but I am changing that over to varchar(50). The longest value we have in there now is 30 characters long so 50 should be ample. I obviously over sized this field when I built the table.
I noticed if I make the change in SSMS, it wants to drop and recreate the table. If I run an ALTER statement it handles OK on my dev machine.
Any preferences on resizing columns in a larger table (ALTER versus Drop and Recreate)?
May 25, 2017 at 5:37 am
If you can use an ALTER, do that. It won't work in every situation and then you'll need to CREATE the new table, migrate all the data and constraints as well as security, then DROP the old table. However, ALTER is almost always the first choice because it's faster & safer.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 25, 2017 at 10:33 am
Just adding a clustered index shouldn't slow it down that much, unless perhaps the fillfactor is (way) too low. Be sure to explicitly specify something like 95+% for the fillfactor when creating the clustered index.
If you have a more natural key that is used for searches, such as BrandID, cluster on it first, then on identity (just to make the clus key unique).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 25, 2017 at 10:45 am
ScottPletcher - Thursday, May 25, 2017 10:33 AMJust adding a clustered index shouldn't slow it down that much, unless perhaps the fillfactor is (way) too low. Be sure to explicitly specify something like 95+% for the fillfactor when creating the clustered index.If you have a more natural key that is used for searches, such as BrandID, cluster on it first, then on identity (just to make the clus key unique).
Or, leave the identity in place and create an NCI on BrandID and whatever other always-used, highly-selective columns being the leading columns. Creating the clustering key on (BrandID, identity) would result in CI key values that are not ever-increasing. It also means that each NCI would inherit more bytes from the clustering key than necessary, increasing the number of pages to store each NCI.
Viewing 15 posts - 16 through 30 (of 37 total)
You must be logged in to reply to this topic. Login to reply