Indexing question

  • I have a single column table with 140 Million rows. The single column is the Primary Key. This table is joined to others to do updates. Would the updates benefit from applying an index to this table?


  • Hard to believe, but PK is an index.

    _____________
    Code for TallyGenerator

  • I thought that was the case but originally the table had no index or PK. I added the PK and it did nothing to speed up my query. The table is truncated and bulkloaded each week from a text file and the PK slowed the process and was removed. Bad form, I know, but the benefit was there.


  • Index is not a magic wand.

    Sometimes quality of design and queries is important as well.

    😎

    _____________
    Code for TallyGenerator

  • One more thing to consider here, what is the cost of creating the index and what is the cost of not having in for the period between reloads.

    As Sergiy said, there's something to be said for having the right design. But that is a call you'll have to make because we don't have enough info to make that call for ourselves.

  • One thing I can say for sure:

    if the system requires 140 million rows downloads every week just to check if there are any changes then it was designed mindlessly.

    _____________
    Code for TallyGenerator

  • Agreed, but we don't have enough details to call bullshit... yet.

  • The cost to have it in during the import is about an hour of extra load time I am told. I haven't seen this for myself. I will test it when I can. The cost to build the PK was about 40 Minutes. The update queries that use it take about 12 minutes regardless of the PK. I have tested writing the query few different ways.

    1----------------------

    select uid2,invalidflag from tb_leadbox

    WHERE (HomePhone IN

    (SELECT dbo.tb_LeadBox.Homephone

    FROM dbo.tb_DNCNAT INNER JOIN

    dbo.tb_LeadBox ON dbo.tb_DNCNAT.DNC_PHONE = dbo.tb_LeadBox.HomePhone and invalidflag = 0))

    and invalidflag = 0 and sourcecode not in (SELECT SourceCode

    FROM tb_LB_SourceCode

    WHERE (Nat_DNC = 1))

    2------------------------

    select uid2,invalidflag from tb_leadbox

    WHERE (HomePhone IN

    (SELECT DNC_PHONE

    FROM dbo.tb_DNCNAT ))

    and invalidflag in (0,7) and sourcecode not in (SELECT SourceCode

    FROM tb_LB_SourceCode

    WHERE (Nat_DNC = 1))

    3-----------------------

    select uid2,invalidflag from tb_leadbox l

    join tb_dncnat on homephone = dnc_phone

    join tb_lb_sourcecode s on l.sourcecode = s.sourcecode

    where (nat_dnc <> 1 or nat_dnc is null) and invalidflag in (0,7)

    tables------------------------

    tb_leadbox has indexes on uid2(PK), invalidflag, and homephone. 2M records

    lb_sourcecode has indexes on sourccode(PK)

    150 Records


  • Ninja's_RGR'us (10/2/2007)


    Agreed, but we don't have enough details to call bullshit... yet.

    It is the National Do Not Call List designeed by a Federal Agency that we are forced to use and keep current. Call it what you will.


  • Indexes are not really usefull when you need to scan the whole table like that. Since you report no performance inprovement, I'd just drop the matter.

    The real question is can you get a list of only modifications? (new numbers on the list and numbers removed from the list). That would considerably cut down on the time it takes to run this process...

    Maybe you could even write your own... maybe it's less expansive to extract the changes from 2 different imports and then update the rest of the data than doing the 3 queries you run now. I'm just throwing this idea out there, I've never had to deal with that much data in a scenario like that.

  • Unfortunately the list only comes one way, in bulk. Since there are updates and adds in both the DNC and the customer table it was easier to just compare the two against each other rather than only look for adds and modifies on both sides. Note I said easier, not necessarily smarter. I had a huge breakthrough from an earlier post so now I am reviewing every step of the process to see what else can be done. Also it pains me to have a table with no PK.


  • I've never seen this file, can you clear something up for me?

    You have a DNC list, hence (in my head anyways), you have a list of # to NOT call. So the numbers can only be added and removed from the list (even if they are updated, how the heck can you track that with only 1 column).

    Am I missing something obvious from the info you gave me, or is there something else I don't know.

    If I'm not missing anything, it should be pretty easy to only track differences :

    List of deletes = Exists in old but not in new

    List of inserts = Exists in new but not in old

    This could easily be done with a full join and track what's missing. You'd scan each table only once. I don't see how much faster than this you can get!!!

    Now I'm assuming there's a catch for the matching to your list, but I can only presume at this point.

  • Another trick for the large import is to set the DB in BULK_LOGGED mode before the import and if possible run bcp/bulkinsert in parallel (if at all possible)

    I am also confused by a "single column" table "design" 😉


    * Noel

  • How does this bulk load work? Is this only a parameter you set or do you have to manually split the data file and call imports in parallel?

  • Ninja's_RGR'us (10/2/2007)


    How does this bulk load work? Is this only a parameter you set or do you have to manually split the data file and call imports in parallel?

    instead of giving you the "short" answer here you can have the full story


    * Noel

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply