Join on field of varying length

  • Hi Folks,

    Looking for advice on the following query I'm trying to alter:

    SELECT DISTINCT ContractNumber as ContractNumber

    ,p.PolicyNumber as PolicyNumber

    ...

    FROM dbo.ExtVESEnrollmentCoverage   ec

    INNER JOIN ...

    inner join ExtPCMPolicySchemaCoverage psc on pscc.PolicySchemaCoverageContainerId= psc.PolicySchemaCoverageContainerIdand pscc.PolicySchemCoverageContainerID= Right(ec.PCMRetrievalCode, 4

    INNER JOIN ...

    Now, the PCMRetrievalCode has the following format: sample retrieval code:

    PO607CON1324<-- Format

      PO = Non-Package    PA = Package

     607 = SchemaId

     CON  = Container

     1324 = ContainerId

    I need to account for the SchemaId, which could potentially be more than 3 numbers, as the schemas increase, and also the ContainerID, which could be 3-5 numbers. I've tried using LEN and SUBSTRING, but it's giving me the red squigglies:

    inner join ExtPCMPolicySchemaCoverage psc on pscc.PolicySchemaCoverageContainerId = psc.PolicySchemaCoverageContainerId

    Where Left(ec.PCMRetrievalCode,2) in ('PO','PA')

    AND (LEN(SUBSTRING(ec.PCMRetrievalCode,3)) <= 5 And LEN(SUBSTRING(ec.PCMRetrievalCode,3)) >= 3)

    AND pscc.PolicySchemaCoverageContainerId = Right(ec.PCMRetrievalCode, 4)

    I don't believe SUBSTRING would apply, as I can't specify the length, as one of its required arguments, due to it being a varying value as I mentioned. Any input would be greatly appreciated. Thanks!
     

  • I'd use CROSS APPLYs to determine the lengths (this helps keeps the other code "cleaner" by pushing calcs/computations into CROSS APPLYs).  For example:


    SELECT PCMRetrievalCode, LenOfSchemaId, LenOfContainer, LenOfContainerId
    FROM (
      VALUES('PO607CON1324'),('PO60712CO124'),('PO6071CONT12345')
    ) AS ec(PCMRetrievalCode)
    CROSS APPLY (
        SELECT PATINDEX('%[^0-9]%', SUBSTRING(ec.PCMRetrievalCode, 3, 100)) - 1 AS LenOfSchemaId,
            PATINDEX('%[^0-9]%', REVERSE(ec.PCMRetrievalCode)) - 1 as LenOfContainerId
    ) AS aliases1
    CROSS APPLY (
        SELECT LEN(ec.PCMRetrievalCode) - 2 - LenOfSchemaId - LenOfContainerId AS LenOfContainer
    ) AS aliases2
    WHERE ec.PCMRetrievalCode LIKE 'P[AO]%' /*better technique than LEFT(ec.PCMRetrievalCode, 2) IN ...*/
    --AND ...other comparisions here, using calc'd lengths...

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Of course what you're struggling with here is a denormalized table design, specifically a multi-part policy code column where attributes must be parsed at runtime. I wouldn't remove the original column, because it may break existing applications that reference it. However, if you have ownership of the tables, then I would add indexed computed columns to contain the attributes you commonly use to join on. That makes for SQL coding that easier to read and better performing.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • ScottPletcher - Wednesday, April 18, 2018 9:17 AM

    I'd use CROSS APPLYs to determine the lengths (this helps keeps the other code "cleaner" by pushing calcs/computations into CROSS APPLYs).  For example:


    SELECT PCMRetrievalCode, LenOfSchemaId, LenOfContainer, LenOfContainerId
    FROM (
      VALUES('PO607CON1324'),('PO60712CO124'),('PO6071CONT12345')
    ) AS ec(PCMRetrievalCode)
    CROSS APPLY (
        SELECT PATINDEX('%[^0-9]%', SUBSTRING(ec.PCMRetrievalCode, 3, 100)) - 1 AS LenOfSchemaId,
            PATINDEX('%[^0-9]%', REVERSE(ec.PCMRetrievalCode)) - 1 as LenOfContainerId
    ) AS aliases1
    CROSS APPLY (
        SELECT LEN(ec.PCMRetrievalCode) - 2 - LenOfSchemaId - LenOfContainerId AS LenOfContainer
    ) AS aliases2
    WHERE ec.PCMRetrievalCode LIKE 'P[AO]%' /*better technique than LEFT(ec.PCMRetrievalCode, 2) IN ...*/
    --AND ...other comparisions here, using calc'd lengths...

    Thanks, Scott. So would I embed the cross apply within the Join clauses of the original query?

  • Eric M Russell - Wednesday, April 18, 2018 9:30 AM

    Of course what you're struggling with here is a denormalized table design, specifically a multi-part policy code column where attributes must be parsed at runtime. I wouldn't remove the original column, because it may break existing applications that reference it. However, if you have ownership of the tables, then I would add indexed computed columns to contain the attributes you commonly use to join on. That makes for SQL coding that easier to read and better performing.

    Hi Eric,
    You're probably right, being that it's a data warehouse environment, which tends to be denormalized. That's a good suggestion, and I might see about at least trying that change on my sandbox environment, as any higher environment would require a whole change request process with approvals and such. Thanks.

  • daniness - Wednesday, April 18, 2018 10:28 AM

    Eric M Russell - Wednesday, April 18, 2018 9:30 AM

    Of course what you're struggling with here is a denormalized table design, specifically a multi-part policy code column where attributes must be parsed at runtime. I wouldn't remove the original column, because it may break existing applications that reference it. However, if you have ownership of the tables, then I would add indexed computed columns to contain the attributes you commonly use to join on. That makes for SQL coding that easier to read and better performing.

    Hi Eric,
    You're probably right, being that it's a data warehouse environment, which tends to be denormalized. That's a good suggestion, and I might see about at least trying that change on my sandbox environment, as any higher environment would require a whole change request process with approvals and such. Thanks.

    Even a data warehouse, multi-valued or multi-attribute columns are to be avoided. That's like the first and most basic level of normalization. It's understandable that there are situations where users end up querying the database in ways that were not anticipated when the database was modeled, but if a code contains things like line of business, region, or category, then those attributes need to be separated.
    https://en.wikipedia.org/wiki/First_normal_form

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • daniness - Wednesday, April 18, 2018 10:24 AM

    ScottPletcher - Wednesday, April 18, 2018 9:17 AM

    I'd use CROSS APPLYs to determine the lengths (this helps keeps the other code "cleaner" by pushing calcs/computations into CROSS APPLYs).  For example:


    SELECT PCMRetrievalCode, LenOfSchemaId, LenOfContainer, LenOfContainerId
    FROM (
      VALUES('PO607CON1324'),('PO60712CO124'),('PO6071CONT12345')
    ) AS ec(PCMRetrievalCode)
    CROSS APPLY (
        SELECT PATINDEX('%[^0-9]%', SUBSTRING(ec.PCMRetrievalCode, 3, 100)) - 1 AS LenOfSchemaId,
            PATINDEX('%[^0-9]%', REVERSE(ec.PCMRetrievalCode)) - 1 as LenOfContainerId
    ) AS aliases1
    CROSS APPLY (
        SELECT LEN(ec.PCMRetrievalCode) - 2 - LenOfSchemaId - LenOfContainerId AS LenOfContainer
    ) AS aliases2
    WHERE ec.PCMRetrievalCode LIKE 'P[AO]%' /*better technique than LEFT(ec.PCMRetrievalCode, 2) IN ...*/
    --AND ...other comparisions here, using calc'd lengths...

    Thanks, Scott. So would I embed the cross apply within the Join clauses of the original query?

    Yes,  Code the CROSS APPLYs immediately after the table containing the PCMRetrievalCode column, then code the JOINs after that.  Since the CROSS APPLY has already been processed, you can use the new column names in the joins.

    If you'd prefer, we can adjust the CROSS APPLY to pull out the column values rather than the lengths.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Hi Scott,
    Thanks for your reply. I'm not following your code. In this portion:
    SELECT PCMRetrievalCode, LenOfSchemaId, LenOfContainer, LenOfContainerId
    FROM (
    VALUES('PO607CON1324'),('PO60712CO124'),('PO6071CONT12345')
    ) AS ec(PCMRetrievalCode) 

    You're selecting 4 fields, but only 3 values. Could you please explain what your code does? I haven't had much exposure to PATINDEX or cross applies, other than reading up on it today, in order to hopefully tackle this issue. Also, I'm not sure how it'd fit in the original query:

    SELECT DISTINCT ContractNumber as ContractNumber

    ,p.PolicyNumber as PolicyNumber

    ...

    FROM dbo.ExtVESEnrollmentCoverage   ec

    INNER JOIN ExtPCMPolicy p  ON p.PolicyId = ec.PCMPolicyId  

    inner join ...

    inner join ExtPCMPolicySchemaCoverage psc on pscc.PolicySchemaCoverageContainerId= psc.PolicySchemaCoverageContainerIdand pscc.PolicySchemCoverageContainerID= Right(ec.PCMRetrievalCode, 4)   

    INNER JOIN ...


    I appreciate your feedback. Thank you!

  • daniness - Wednesday, April 18, 2018 1:59 PM

    Hi Scott,
    Thanks for your reply. I'm not following your code. In this portion:
    SELECT PCMRetrievalCode, LenOfSchemaId, LenOfContainer, LenOfContainerId
    FROM (
    VALUES('PO607CON1324'),('PO60712CO124'),('PO6071CONT12345')
    ) AS ec(PCMRetrievalCode) 

    You're selecting 4 fields, but only 3 values. Could you please explain what your code does? I haven't had much exposure to PATINDEX or cross applies, other than reading up on it today, in order to hopefully tackle this issue. Also, I'm not sure how it'd fit in the original query:

    SELECT DISTINCT ContractNumber as ContractNumber

    ,p.PolicyNumber as PolicyNumber

    ...

    FROM dbo.ExtVESEnrollmentCoverage   ec

    INNER JOIN ExtPCMPolicy p  ON p.PolicyId = ec.PCMPolicyId  

    inner join ...

    inner join ExtPCMPolicySchemaCoverage psc on pscc.PolicySchemaCoverageContainerId= psc.PolicySchemaCoverageContainerIdand pscc.PolicySchemCoverageContainerID= Right(ec.PCMRetrievalCode, 4)   

    INNER JOIN ...


    I appreciate your feedback. Thank you!

    My code just determines the actual length of each element in the combined column.  For example, if the value was:
    PO607CON1324
    my code would say that the SchemaId (607) is 3 bytes, the Container (CON) is 3 bytes and the ContainerId (1324) is 4 bytes.
    With that info, presumably you could easily code a SUBSTRING that would let you compare any element -- SchemaId / Container / ContainerId -- to any other column/value you wanted to.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • daniness - Wednesday, April 18, 2018 1:59 PM

    Hi Scott,
    Thanks for your reply. I'm not following your code. In this portion:
    SELECT PCMRetrievalCode, LenOfSchemaId, LenOfContainer, LenOfContainerId
    FROM (
    VALUES('PO607CON1324'),('PO60712CO124'),('PO6071CONT12345')
    ) AS ec(PCMRetrievalCode) 

    You're selecting 4 fields, but only 3 values. Could you please explain what your code does?

    It's just test data that's being used instead of your table and they forget to label it as such for newbies.

    Shifting gears a bit, please see the article at the first link in my signature line below under "Helpful Links" to help us help you in the future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher - Wednesday, April 18, 2018 2:37 PM

    daniness - Wednesday, April 18, 2018 1:59 PM

    Hi Scott,
    Thanks for your reply. I'm not following your code. In this portion:
    SELECT PCMRetrievalCode, LenOfSchemaId, LenOfContainer, LenOfContainerId
    FROM (
    VALUES('PO607CON1324'),('PO60712CO124'),('PO6071CONT12345')
    ) AS ec(PCMRetrievalCode) 

    You're selecting 4 fields, but only 3 values. Could you please explain what your code does? I haven't had much exposure to PATINDEX or cross applies, other than reading up on it today, in order to hopefully tackle this issue. Also, I'm not sure how it'd fit in the original query:

    SELECT DISTINCT ContractNumber as ContractNumber

    ,p.PolicyNumber as PolicyNumber

    ...

    FROM dbo.ExtVESEnrollmentCoverage   ec

    INNER JOIN ExtPCMPolicy p  ON p.PolicyId = ec.PCMPolicyId  

    inner join ...

    inner join ExtPCMPolicySchemaCoverage psc on pscc.PolicySchemaCoverageContainerId= psc.PolicySchemaCoverageContainerIdand pscc.PolicySchemCoverageContainerID= Right(ec.PCMRetrievalCode, 4)   

    INNER JOIN ...


    I appreciate your feedback. Thank you!

    My code just determines the actual length of each element in the combined column.  For example, if the value was:
    PO607CON1324
    my code would say that the SchemaId (607) is 3 bytes, the Container (CON) is 3 bytes and the ContainerId (1324) is 4 bytes.
    With that info, presumably you could easily code a SUBSTRING that would let you compare any element -- SchemaId / Container / ContainerId -- to any other column/value you wanted to.

    Thanks, Scott...I appreciate the input...after consideration, I think I'm going to try to switch gears in my approach...that'll be another post, to hopefully simplify it more.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply