Join on a Substring

  • I have a proc that runs nightly that moves data into another table. 
    The procedure joins on multiple tables one of the joins involves two columns in  tables that  have id numbers. 
    I have no control of the source of these tables but basically I have been doing a join using a substring

    tbl1.Id =  98789
    tbl2.Id = x98789

     

      LEFT JOIN tbl1 ON SUBSTRING(tbl2.Id, 2, LEN(tb2.id) = tbl1.Id
        

    I am concerned about performance and looking for suggestions on how to improve.
    This nightly only involves a couple thousand records.

  • Joe flynn - Wednesday, October 25, 2017 11:59 AM

    I have a proc that runs nightly that moves data into another table. 
    The procedure joins on multiple tables one of the joins involves two columns in  tables that  have id numbers. 
    I have no control of the source of these tables but basically I have been doing a join using a substring

    tbl1.Id =  98789
    tbl2.Id = x98789

     

      LEFT JOIN tbl1 ON SUBSTRING(tbl2.Id, 2, LEN(tb2.id) = tbl1.Id
        

    I am concerned about performance and looking for suggestions on how to improve.
    This nightly only involves a couple thousand records.

    I have an idea but I need to know the datatypes of the two ID columns, please.  Also, is the "x" a literal or can the value of "x" be something other than "x"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jjf410 - Wednesday, October 25, 2017 11:59 AM

    I have a proc that runs nightly that moves data into another table. 
    The procedure joins on multiple tables one of the joins involves two columns in  tables that  have id numbers. 
    I have no control of the source of these tables but basically I have been doing a join using a substring

    tbl1.Id =  98789
    tbl2.Id = x98789

     

      LEFT JOIN tbl1 ON SUBSTRING(tbl2.Id, 2, LEN(tb2.id) = tbl1.Id
        

    I am concerned about performance and looking for suggestions on how to improve.
    This nightly only involves a couple thousand records.

    You could create a computed, persisted column containing that formula and index it appropriately. That would speed things up, but for a couple of thousand rows, I'm not sure it's worth the effort.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Datatype is varchar and the field always begins with the x

  • jjf410 - Wednesday, October 25, 2017 12:20 PM

    Datatype is varchar and the field always begins with the x

    And the datatype for the field in the other table is ?   I know Jeff would ask...

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Thursday, October 26, 2017 8:39 AM

    jjf410 - Wednesday, October 25, 2017 12:20 PM

    Datatype is varchar and the field always begins with the x

    And the datatype for the field in the other table is ?   I know Jeff would ask...

    To clarify what Steve posted, it's not clear if you mean that both columns are VARCHAR or not.  The one without the "x" could be one of the numeric datatypes and that would mean that we'd have to consider a mis-match in datatypes for the sake of performance. 

    To ask the question with a bit more clarity, are both columns VARCHAR(n)?  We also need to know what the value of "n" is for both columns.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Both fields are varchar

  • jjf410 - Monday, October 30, 2017 10:36 AM

    Both fields are varchar

    Just curious, and it may not be the best solution, have you tried this:
    LEFT JOIN tbl1 ON tbl2.Id = 'x' + tbl1.Id

  • Lynn Pettis - Monday, October 30, 2017 10:43 AM

    jjf410 - Monday, October 30, 2017 10:36 AM

    Both fields are varchar

    Just curious, and it may not be the best solution, have you tried this:
    LEFT JOIN tbl1 ON tbl2.Id = 'x' + tbl1.Id

    Putting the non-SARGable expression on the join column of the other table has a 50% chance of being faster πŸ™‚ absolutely worth a try

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I guess my take on it would be to brow beat someone into letting me add a persisted, indexed, computed column based on the integer datatype.  No matter what else you do, one or the other table is going to suffer a scan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, October 31, 2017 8:18 AM

    I guess my take on it would be to brow beat someone into letting me add a persisted, indexed, computed column based on the integer datatype.  No matter what else you do, one or the other table is going to suffer a scan.

    Exactly where I was going with the asking of the question...  Great minds think alike...

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply