RE: 108,000,000 row table - check for matches

Ten Centuries

Points: 1323

September 23, 2013 at 6:52 am

L' Eomot Inversé (9/23/2013)
duncanburtenshaw (9/23/2013)
Erland Sommarskog (9/21/2013)
Before we start talking about sargable arguments, maybe we should see if there is anything to sarg for. From what Duncan says, there does not seem to be any.
We have keys that should be 1001231231231 or 1007897897897, but in fact are only 1231231231 and 7897897897. Unless there is a pattern for the beginning of these short keys, there is nothing to sarg for.
What Duncan could do is to insert all 13-character keys into a temp table, in this way he only needs to scan the big table once. Which still takes a long time, but it should not take the entire weekend.
Thank you all for your inputs - I have failed to explain the problem correctly, it is the large table which has a mix of 13 and 16 char TOIDs, these are the ones I am looking to find by adding '100'+toid and '500'+toid to my search and seeing if any records in the smaller table match - I will add two more columns to the bigger table, add in the '100'+toid and '500'+toid to these, create an index and see if I get any matches
I think that any other option will inevitably produce a table scan on the large table for each row of the smaller table, so this is the way I am going now.
thanks anyway..
D.
I'd be interested to know how many rows there have to be in the small table before the code i used produces a scan of the large table. In theory the optimiser should only do this when it thinks that the number of index lokups would be sufficientl large for it to be advantageous touse a scan; I find it hard to believe that something as low as .000324% of the rows implies a scan rather than individual lookups when what is being looked up is an exact match of the 13 character string.
Of course it might be better if the 16 character matches were elimnated first and a new test table constructed from what wasn't a 16-character match, with the first three characters discarded (and the row eliminated if they were not one of the two interesting 3-character strings) and provided with a clustered index on the string column, because then the check for those values is very obviously a check for matches in the two indexes with no mesing about with appending prefixes - that would certainly take a lot less time than adding two new indexes to the big table and then checking all the possible matches, as eliminating the 16 character matches and truncating what is left will be a lot cheaper than creating two new indexes on the big table (about 2000 times fewer logical writes) and once you've done that you have only 1 index (the one you already have on the big table) to match against instead of 3.

Thank you for your comments. Perhaps I am completely missing the point. Let me reiterate:

1. I have a large table (lets call it BIGa)

2. I have a smaller table, (lets call it Smallb)

3. The BIGa table has 108,000,000 rows. Out of these, approx 27,000,000 rows have a TOID (unique number) which is 13 chars in length instead of 16 (which all of the other rows are).

4. I have a small update table (Smallb) with 370,000 odd rows. None of the rows in Smallb are 13 chars long, they are all correct length of 16 chars.

5. I am trying to match any rows from Smallb with a matching row in BIGa based on any TOIDs in BIGa which are a) 13 chars long, and where the row matches based on '100'+TOID OR '500'+toid

Based on the assumption above, surely all rows in the big table (27,000,000 rows based on len(TOID=13)) will be scanned?

This was my original question - apart from adding a couple of columns persisting the '100'+TOID and '500'+TOID /indexing them, and then looking for a match - was there a better way of seeing if there are any matches between Smallb and BIGa that the code I originally quoted.

I have, quite obviously, failed completely to explain this problem satisfactorily and for that I apologise.

I have added new columns on Biga now with '500' and '100'+TOID, and will check for any direct matches after this has completed - I just wanted to know if I was missing anything which would have saved me the time in creating these columns and would allow me to detect possible matches.

thank you for all your suggestions - next time I will attach execution plans as well.