Searching strings

  • Hi all

    We've got a 3rd-party application and an associated database.
    We need to join two tables together but, unfortunately, there's no common field.

    What we've been advised to do do is use the following ON clause as part of the join:-
    ON M.PcsAssmntDataID LIKE '%' + I.VisitID +'%';

    I've tested that clause and the performance is (understandably) terrible.

    I've also tried variations on the above, such as:-

    M.PcsAssmntDataID LIKE '%' + I.VisitID +'%'
    left(M.PcsAssmntDataID,len(I.VisitID)) = I.VisitID
    SUBSTRING(M.PcsAssmntDataID,1,len(I.VisitID)) = I.VisitID
    PATINDEX('%' + I.VisitID + '%',M.PcsAssmntDataID) > 0

    with varying degrees of success.  None of the above give a full result-set in less than 3 hours (that's for 795522 records).

    I can't think of any other way of performing the join, and we can't do anything with the base tables (or even add any to make a proper link table).

    Has anyone got any ideas?

    Happy to post anything else that might be useful (within reason).

  • Have you considered using Full-Text Search?

    John

  • I've never used Full Text Search.

    This is my current query:-
    SELECT
        M.PcsAssmntDataID
      ,I.VisitID
      ,M.IdentifierID
      ,I.InterventionUrnID
      ,I.InterventionStatus
    FROM
        PcsAcct_Interventions AS I
        INNER JOIN PcsAssmntData_Main AS M
            ON M.PcsAssmntDataID LIKE '%' + I.VisitID + '%';

    I've had a look at your link but I can't see how to feed the 2nd part of the CONTAINS clause with a list from the 2nd table.

  • Could you post DDL and some sample data, this may help.others to help you.

    ...

  • DDL for each table as follows:-
    CREATE TABLE [dbo].[PcsAcct_Interventions](
        [SourceID] [varchar](3) NOT NULL,
        [VisitID] [varchar](30) NOT NULL,
        [InterventionUrnID] [varchar](83) NOT NULL,
        [RowUpdateDateTime] [datetime] NULL,
        [Intervention_PcsInterventionID] [varchar](23) NULL,
        [InterventionStatus] [varchar](15) NULL,
        [InterventionInitializeDateTime] [datetime] NULL,
        [InterventionInitializeClient_UnvClientID] [varchar](45) NULL,
        [InterventionInitializeUser_UnvUserID] [varchar](30) NULL,
        [InterventionStartDateTime] [datetime] NULL,
        [InterventionCompleteDateTime] [datetime] NULL,
        [InterventionCompleteUser_UnvUserID] [varchar](30) NULL,
        [InterventionPriorityZold] [int] NULL,
        [InterventionSource] [varchar](5) NULL,
        [InterventionItemRid] [varchar](1250) NULL,
        [InterventionKeyIndicator] [varchar](2) NULL,
        [InterventionOnWorklist] [varchar](2) NULL,
        [InterventionEdmType] [varchar](15) NULL,
        [InterventionEdmOnTriage] [varchar](2) NULL,
        [InterventionEdmAddTriage] [varchar](2) NULL,
        [InterventionEdmLastChange] [varchar](2) NULL,
        [InterventionEdmSendToPcs] [varchar](2) NULL,
        [InterventionEdmOnWorklist] [varchar](2) NULL,
        [InterventionEdmAddWorklist] [varchar](2) NULL,
        [InterventionEdmOnDischarge] [varchar](2) NULL,
        [InterventionEdmAddDischarge] [varchar](2) NULL,
        [InterventionDischarge] [varchar](2) NULL,
        [InterventionMoveable] [varchar](2) NULL,
        [InterventionEdmOnPre] [varchar](2) NULL,
        [InterventionEdmAddPre] [varchar](2) NULL,
        [InterventionOnHold] [varchar](2) NULL,
        [InterventionExpirationHold] [datetime] NULL,
        [InterventionSurCaseUrn] [varchar](30) NULL,
        [InterventionSurCaseChartMode] [varchar](30) NULL,
        [InterventionSurSendToPcs] [varchar](2) NULL,
        [InterventionRegulatoryInterventionTriggeredZold] [varchar](83) NULL,
        [InterventionRegulatoryInterventionTriggeredByZold] [varchar](83) NULL,
    CONSTRAINT [mtpk_fc60001897] PRIMARY KEY CLUSTERED
    (
        [SourceID] ASC,
        [VisitID] ASC,
        [InterventionUrnID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    CREATE TABLE [dbo].[PcsAssmntData_Main](
        [SourceID] [varchar](3) NOT NULL,
        [PcsAssmntDataID] [varchar](250) NOT NULL,
        [IdentifierID] [varchar](1509) NOT NULL,
        [RowUpdateDateTime] [datetime] NULL,
        [DateTimePerformed] [datetime] NULL,
        [DateTimeScheduled] [datetime] NULL,
        [SourceType] [varchar](150) NULL,
        [Deleted] [varchar](2) NULL,
        [Source] [varchar](5) NULL,
        [Type] [varchar](3) NULL,
        [TypeOid] [varchar](15) NULL,
    CONSTRAINT [mtpk_fc60002009] PRIMARY KEY CLUSTERED
    (
        [SourceID] ASC,
        [PcsAssmntDataID] ASC,
        [IdentifierID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    I'll post some data shortly.

  • Why do you have IDs that are that long? If an ID is 1509 characters long, it might have lost its purpose.
    Instead of going for Full Text Search, which might not be what you need, you could index on a persisted computed column. You should decompose those long identifiers to comply with the first normal form and you'll have better performance and easier coding.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares - Friday, January 5, 2018 7:41 AM

    Why do you have IDs that are that long? If an ID is 1509 characters long, it might have lost its purpose.
    Instead of going for Full Text Search, which might not be what you need, you could index on a persisted computed column. You should decompose those long identifiers to comply with the first normal form and you'll have better performance and easier coding.

    +100 agree entirely, but needed to see the DDL first.

    ...

  • -- If VisitID is a string within PcsAssmntDataID bounded by spaces,
    -- then you could use a string-splitter followed by LEN or something
    -- else to collect the VisitID from the output.
    -- You could then do this:
    SELECT
        M.PcsAssmntDataID
      ,I.VisitID
      ,M.IdentifierID
      ,I.InterventionUrnID
      ,I.InterventionStatus
    FROM PcsAssmntData_Main AS M
    CROSS APPLY dbo.GetVisitID(M.PcsAssmntDataID) x
    INNER JOIN PcsAcct_Interventions AS I
     ON x.NewVisitID = I.VisitID;

    where dbo.GetVisitID incorporates the functionality of the string splitter and the row chooser.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Luis Cazares - Friday, January 5, 2018 7:41 AM

    Why do you have IDs that are that long? If an ID is 1509 characters long, it might have lost its purpose.
    Instead of going for Full Text Search, which might not be what you need, you could index on a persisted computed column. You should decompose those long identifiers to comply with the first normal form and you'll have better performance and easier coding.

    I'm thinking that this is the best option for this case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry folks, can't even add in a computed column (persisted or otherwise).
    I know this is a bit of hassle, just wanted to check I didn't have any more option in my join clause.

  • richardmgreen1 - Monday, January 8, 2018 2:58 AM

    Sorry folks, can't even add in a computed column (persisted or otherwise).
    I know this is a bit of hassle, just wanted to check I didn't have any more option in my join clause.

    No, with a design like this you're stuck with bad performance. Any option will still need to read the index or table completely.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I was afraid of that, but thanks anyway.

    Looks like an "improvement suggestion" is heading back to the vendor.

  • richardmgreen1 - Monday, January 8, 2018 7:28 AM

    I was afraid of that, but thanks anyway.

    Looks like an "improvement suggestion" is heading back to the vendor.

    While you're waiting for that to happen, consider the possibility of using an "Indexed View", which is actually a "Materialized View".  It won't need any triggers or stored procedures to maintain it.  It's a little tricky to write because of the rules that you need to follow (especially when it comes to writing a deterministic function to support it) and it IS a duplication of data, but it can solve this performance problem provided that you can change your code to select from the view instead of the table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'll have to do some research into those, but that might be the best idea.
    Unfortunately, because it will affect one of the tables of our main database (and the database itself), I'll have to get permission to even attempt it.

  • richardmgreen1 - Friday, January 5, 2018 6:59 AM

    I'll post some data shortly.

    still waiting ๐Ÿ™‚

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply