Index on varchar with non English data

  • Dear all

    I created a table with this fields:

    ID bigint not null

    fname varchar(50) not null

    lname varchar(50) not null

    and with this indexes:

    clustered index on ID with fill factor of 100

    non clustered index on fname

    and the problem:

    When I write query like this:

    SELECT * FROM table WHERE fname='john'

    every thing is good but if write a query like this (with arabic or persian texts)

    SELECT * FROM table WHERE fname='علی'

    SQL Server doesn't use index and do a full table scan!

    Why? (my database collation is Persian_100_CI_AI and I'm using SQL 2008 R2)

    Thanks anyway

  • Indexed column and string used in WHERE clause must use the same collation. Otherwise index will not be selected - server will do full scan and compare values row by row.

    I think it is SSMS problem, not server itself.

    Try:

    SELECT * FROM table WHERE fname='???' COLLATE Persian_100_CI_AI

    to enforce the same collation to be used to represent '???' string in WHERE clause.

    Regards,

    Slawek

  • emdadgar2 (9/8/2010)


    Dear all

    I created a table with this fields:

    ID bigint not null

    fname varchar(50) not null

    lname varchar(50) not null

    and with this indexes:

    clustered index on ID with fill factor of 100

    non clustered index on fname

    and the problem:

    When I write query like this:

    SELECT * FROM table WHERE fname='john'

    every thing is good but if write a query like this (with arabic or persian texts)

    SELECT * FROM table WHERE fname='???'

    SQL Server doesn't use index and do a full table scan!

    Why? (my database collation is Persian_100_CI_AI and I'm using SQL 2008 R2)

    Thanks anyway

    Not sure, when I tried this I got the index seek for both. Could you post your full table definition and sample data?

  • You may also want to have a look at the xml plan for your query.

    ( SET SHOWPLAN_XML ON)

    Search for implicit conversions as they may cause an index not to be used !

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Also I observed that default collation for the database is significant for the part written in single quotes..

  • Really thanks

    I tested all of them!

    But sql server (2008 R2 - x64) doesn't use index on about 6 millions of records,

    but when records omitted to under 1 millions, use index!!!!

    I can force sql to use index with hints but I don't release why?

  • It is all about statistics and index selectivity.

    You issued:

    SELECT * FROM table WHERE fname='???'

    It means that for every row found in index sever must perform lookup to clustered index to retrieve two other columns.

    Apparently '???' lands in bucket having many values.

    For table having 1 mil records SQL calculates (using statistics) that cost (time spent) doing index seek + lookups will be still better than doing full scan. With 6+ mil row table it calculates that there will be so many hits in index, and so many lookups that it is better to do full scan (lookups are very expensive, and if more than 3-4% of table is going to be returned server selects full scan instead index seek + lookups)

    You may:

    1. Update your statistics for index based on fname

    2. Create covering index on fname, including two other columns (which will result in duplicating your data)

    3. make index starting from fname clustered, while still leaving column 1 as primary key.

    4. Use Full Text index to index fname, lname columns, and use CONTAINS or CONTAINSTABLE predicates

    (SELECT * FROM table WHERE CONTAINS(fname, '???')

    I would personally use 3rd option, providing that most of your queries will be searching for fname, and you application issuing SELECT * cannot be changed to use different query

    If application logic can be changed the best in my opinion will be 4, however mixing texts in English and Persian in one full-text index can be tricky and can lead to unpredictable results.

  • Slawek Guzek (9/12/2010)


    It is all about statistics and index selectivity.

    You issued:

    SELECT * FROM table WHERE fname='???'

    It means that for every row found in index sever must perform lookup to clustered index to retrieve two other columns.

    Apparently '???' lands in bucket having many values.

    For table having 1 mil records SQL calculates (using statistics) that cost (time spent) doing index seek + lookups will be still better than doing full scan. With 6+ mil row table it calculates that there will be so many hits in index, and so many lookups that it is better to do full scan (lookups are very expensive, and if more than 3-4% of table is going to be returned server selects full scan instead index seek + lookups)

    You may:

    1. Update your statistics for index based on fname

    2. Create covering index on fname, including two other columns (which will result in duplicating your data)

    3. make index starting from fname clustered, while still leaving column 1 as primary key.

    4. Use Full Text index to index fname, lname columns, and use CONTAINS or CONTAINSTABLE predicates

    (SELECT * FROM table WHERE CONTAINS(fname, '???')

    I would personally use 3rd option, providing that most of your queries will be searching for fname, and you application issuing SELECT * cannot be changed to use different query

    If application logic can be changed the best in my opinion will be 4, however mixing texts in English and Persian in one full-text index can be tricky and can lead to unpredictable results.

    Too many thanks, and :

    For solution 1: My statics is updated and table reindexed, but full scan occurred

    For solution 2:Database size is not matter, but Does duplicating data with covering indexes affect speed (slow down)?

    For solution 3: I don't understand!!!:w00t:

    For solution 4: It's good but when I want to query "John%" in this sample, fulltext don't cover item no.2:

    data sample:

    no. 1: Jane

    no. 2: John Smith

    no. 3: Johnnie

    no. 4: Bill

    and I want to get both no.1,2

    Thanks for your time

  • I guess you mean you want no2 and no3 ! (both startin with john)

    btw what's the distribution of your value '???'. In what kind of percentage does it occur ?

    If more than a certain percentage sql may switch to a scan operation as well.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Hi,

    Ad 2:) It does. Inserts will be slower, but how much slower - I can't tell. Probably you will not even notice difference with single or dozens of inserts.

    Ad 3:)

    The idea to reorder logical layout of your data in the table, and keep it ordered in order defined by fname, not ID column.

    SQL will not have to make any lookups to retrieve data when using fname based index, because all data will be stored on fname index pages.

    CREATE TABLE dbo.Table (

    ID bigint NOT NULL PRIMARY KEY NONCLUSTERED ,

    fname varchar](50) NOT NULL,

    lname varchar](50) NOT NULL

    )

    CREATE UNIQUE CLUSTERED INDEX MyClustered Idx ON dbo.Table

    (

    fname ASC,

    ID ASC

    )

    Re-load content to the table above and run your query.

    Ad 4) "It's good but when I want to query "John%" in this sample, fulltext don't cover item no.2:"

    Why ?

    "John%" = SELECT * FROM Table WHERE CONTAINS (fname, '"john*"')

    Regards,

    Slawek

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply