Hints for searching text?

  • @orbit,

As to the index, searching on the indexed CHECKSUM value will certainly work well, and the main question is probably how large the rest of the table is compared to one row in the new index, at least as far as overall I/O efficiency goes.   However, as long as you can ask that question and actually use a CHECKSUM value, why not normalize the Message column instead? Have a table of ALL POSSIBLE message values, each with its own primary key value, and tie that key to the table in question. Then, instead of searching the large table, you only need to search the Message table and seek the main table on the primary key.   You can also index the first 900 or so characters of the Message column, which gives you a reasonably quick search tool for finding the necessary primary key value to search the main table for...
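A minimal sketch of that normalization, assuming a hypothetical dbo.BigTable with a Message column (all object names here are illustrative, not from the thread):

```sql
-- Hypothetical lookup table holding each distinct message once.
CREATE TABLE dbo.MessageLookup
(
    MessageID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Message   NVARCHAR(MAX)     NOT NULL
);

-- Index the leading characters via a persisted computed column;
-- 450 nvarchar characters stays inside the 900-byte index key limit.
ALTER TABLE dbo.MessageLookup
    ADD MessagePrefix AS LEFT(Message, 450) PERSISTED;
CREATE INDEX IX_MessageLookup_Prefix
    ON dbo.MessageLookup (MessagePrefix);

-- The big table stores only the key, not the repeated text.
-- Search the small lookup first, then seek the big table by key:
SELECT bt.*
FROM dbo.MessageLookup AS ml
JOIN dbo.BigTable      AS bt
  ON bt.MessageID = ml.MessageID
WHERE ml.MessagePrefix = LEFT(@SearchText, 450)
  AND ml.Message       = @SearchText;
```

The prefix index narrows the candidate rows cheaply; the full Message comparison then confirms the exact match.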

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Interesting idea with a message-lookup table, but I want to make the smallest possible changes right now, so I think this will have to do for the moment.


  • CreateIndexNonclustered wrote:

Right, but only the checksum column is keyed. With the number of collisions in CHECKSUM, I would think you would hit a point of maximum index fragmentation pretty quickly unless there is a very high rate of new record inserts. Maybe I am overestimating the number of collisions that would occur?

I really don't like the idea of starting out on day 1 by increasing the amount of physical I/O by 20%. I had a system in the past that used a clustered uniqueidentifier, which served to intentionally fragment the clustered index across filegroups on separate spindles back when it still ran on HDDs (and also to accommodate empty record inserts with dozens or hundreds of updates per record before the record was 'closed'), and even after 10 years it was less than 20% fragmented. The only time it ever had issues with bad page splits was after having to backfill several months of data.

    As always, "It Depends".

    And, to ask the question, do you do index maintenance and, if you do, do you use REORGANIZE?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

It depends on the server, but the basic starting point is typically to do statistics updates only every two weeks. Some servers get them more often than that. On Dynamics servers I do an update every day. (They don't have parameter sniffing disabled, but nearly all the queries are hinted with OPTIMIZE FOR UNKNOWN.)
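The hint mentioned above looks like this; the table and parameter are hypothetical:

```sql
-- OPTIMIZE FOR UNKNOWN tells the optimizer to use average density from
-- the statistics rather than sniffing the actual parameter value at
-- compile time, which avoids plans skewed toward one atypical value.
SELECT *
FROM dbo.Orders                     -- illustrative table name
WHERE CustomerID = @CustomerID
OPTION (OPTIMIZE FOR UNKNOWN);
```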

I typically only do index rebuilds manually, to 100% fill factor, using a report that generates the rebuild commands, ignoring anything smaller than 1 MB or less than ~30% fragmented, so it doesn't blow up the log or kill the secondaries.

I do use reorganize in some cases, but always with a full-scan statistics update afterward, usually where I don't have Enterprise Edition, or where it's a huge index and I don't want dozens of GB of log.
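A command-generating report of the kind described above could be sketched with the standard DMVs; the thresholds below mirror the post, and the output is meant to be reviewed before running:

```sql
-- Generate REBUILD commands for indexes over ~30% fragmented and larger
-- than 1 MB (128 pages of 8 KB each), skipping heaps.
SELECT 'ALTER INDEX ' + QUOTENAME(i.name)
     + ' ON ' + QUOTENAME(SCHEMA_NAME(o.schema_id)) + '.' + QUOTENAME(o.name)
     + ' REBUILD WITH (FILLFACTOR = 100);' AS RebuildCommand
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
JOIN sys.objects AS o
  ON o.object_id = ps.object_id
WHERE ps.avg_fragmentation_in_percent >= 30
  AND ps.page_count > 128            -- skip anything 1 MB or smaller
  AND i.name IS NOT NULL;            -- skip heaps
```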

  • Grant Fritchey wrote:

    Full text indexes are an option. Although, in all honesty, I hate 'em. Hard to maintain and frequently don't actually speed things up much, if at all. Extra storage. Ugh. Anyhow, it is something to test for large text searches.

In general I agree with you, but having done a few FTS implementations I never had maintenance issues. Yes, extra storage would be an issue here. As for speed, it is how you write the T-SQL that counts, though I admit I have not done it with millions of rows. As for starts-with, FTS can easily do that.
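For reference, a word-prefix ("starts with") search in full-text looks like this; the table, key index, and catalog names are illustrative:

```sql
-- A full-text index requires a unique, single-column key index on the table.
CREATE FULLTEXT CATALOG MessageCatalog;
CREATE FULLTEXT INDEX ON dbo.BigTable (Message)
    KEY INDEX PK_BigTable ON MessageCatalog;

-- The trailing * makes this a prefix term: "error*" matches
-- "error", "errors", "errored", and so on.
SELECT *
FROM dbo.BigTable
WHERE CONTAINS(Message, '"error*"');
```

Note this matches word prefixes within the column, not necessarily the start of the whole string.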

Why has no one mentioned that hashing cannot guarantee a unique value?

    Far away is close at hand in the images of elsewhere.
    Anon.

The way he/she is doing it, it doesn't need to be unique. It only needs to reduce, at low cost, the number of records that need to be scanned.

    This is a common technique for doing seeks on encrypted columns. You get lots of collisions, but it's still better than having to decrypt every record.
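The seek-then-filter pattern being described could be sketched as follows (object names are illustrative):

```sql
-- Persisted checksum column plus a narrow nonclustered index on it.
-- CHECKSUM is deterministic, so the computed column can be persisted
-- and indexed.
ALTER TABLE dbo.BigTable
    ADD MessageChecksum AS CHECKSUM(Message) PERSISTED;
CREATE NONCLUSTERED INDEX IX_BigTable_MessageChecksum
    ON dbo.BigTable (MessageChecksum);

-- Seek on the checksum, then re-check the full value: the residual
-- predicate discards the collisions (false positives).
SELECT *
FROM dbo.BigTable
WHERE MessageChecksum = CHECKSUM(@SearchText)
  AND Message = @SearchText;
```

Collisions cost only the re-check of a handful of candidate rows, which is far cheaper than scanning (or decrypting) the whole table.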

  • CreateIndexNonclustered wrote:

The way he/she is doing it, it doesn't need to be unique. It only needs to reduce the number of records that need to be scanned at low cost.

    This is a common technique for doing seeks on encrypted columns. You get lots of collisions, but it's still better than having to decrypt every record.

Understood; I just thought it was worth mentioning. It would only be a matter of filtering the returned rows.

    Far away is close at hand in the images of elsewhere.
    Anon.
