Why doesn't the substring work?

  • I have a stored procedure which is supposed to check if a part of a datatype text field (Textdescription) has an already existing projectnumber. However the field Textdescription can also have normal text in it (thus not starting with 'Int.nr.TXT:'). Normally I would expect that because the where clause is evaluated before the select that system will only try to retrieve those records I specify using my where. However it now fails because of error: Cannot convert varchar to int.

    So if my data is:

    ProjectID Textfield

    26 normal text in this field

    26 special text in this field

    26 Int.nr.TXT: 69000

    System will give error: Cannot convert varchar 'n this fie' to int'. When I expect it only to retrieve those records starting with 'Int.nr.TXT'. Why does SQL still try to evaluate those other lines. I now solved it by casting the TTT_ID field also as varchar. But what is the cause? My code is:

    UPDATE tblInvoicesProject

    SETTTT_Factureren = 0,

    TTT_FactuurStatus = 0,

    TTT_FactuurNummer = 0

    WHERETTT_ProjectID = 26

    AND TTT_FactuurStatus = 1

    AND TTT_ID NOT IN(

    SELECT SUBSTRING(CAST(X.TextDescription AS VARCHAR(100)),13,10)

    FROM [500].dbo.frstx X

    WHERE (LEFT(CAST(X.TextDescription AS VARCHAR(100)), 11) = 'Int.nr.TXT:'))

  • I suspect that the cause of the problem is that you have a row in the data where TextDescription begins with 'Int.nr.TXT:' but there is no valid integer after.

    This is a perfect time to use one of the new functions in SQL 2012: TRY_CAST()

    TRY_CAST will return null if there was a problem with the cast instead of generating an error.

    Your code could then be rewritten as:

    UPDATEtblInvoicesProject

    SETTTT_Factureren = 0,

    TTT_FactuurStatus = 0,

    TTT_FactuurNummer = 0

    WHERETTT_ProjectID = 26

    AND TTT_FactuurStatus = 1

    AND TTT_ID NOT IN(

    SELECT isnull(TRY_CAST(SUBSTRING(CAST(X.TextDescription AS VARCHAR(100)),13,10) as int), -1)

    FROM [500].dbo.frstx X

    WHERE (LEFT(CAST(X.TextDescription AS VARCHAR(100)), 11) = 'Int.nr.TXT:')

    )

    Note the use of isnull to convert any failed casts from null to -1. I assume that -1 is not a valid project number.

    If you have a null value in NOT IN () the results will probably not be what you expect.

  • Hi, Thank you for your response. However the goal is to only retrieve those lines starting with Int.nr.TXT:. I checked the data, all lines have a valid integer after the :. However I still do not get why, despite the where clause, SQL is trying to fetch and evaluate all rows, also those not starting with the Int.nr.TXT..

  • marc.eilander 13301 (7/31/2013)


    Hi, Thank you for your response. However the goal is to only retrieve those lines starting with Int.nr.TXT:. I checked the data, all lines have a valid integer after the :. However I still do not get why, despite the where clause, SQL is trying to fetch and evaluate all rows, also those not starting with the Int.nr.TXT..

    I think Stefan could be right. Try this test harness, then try query 2 on your data:

    DROP TABLE #frstx

    CREATE TABLE #frstx (ProjectID INT, TextDescription text)

    INSERT INTO #frstx (ProjectID, TextDescription)

    SELECT 26, 'normal text in this field' UNION ALL

    SELECT 26, 'special text in this field' UNION ALL

    SELECT 26, 'Int.nr.TXT: 69000'

    -- original subquery

    SELECT SUBSTRING(CAST(X.TextDescription AS VARCHAR(100)),13,10)

    FROM #frstx X

    WHERE (LEFT(CAST(X.TextDescription AS VARCHAR(100)), 11) = 'Int.nr.TXT:')

    -- fixed for deferred expression evaluation

    SELECT b.TTT_ID, X.TextDescription

    FROM #frstx X

    CROSS APPLY (SELECT ChosenRow = CASE WHEN X.TextDescription LIKE 'Int.nr.TXT:%' THEN 1 ELSE 0 END) a

    CROSS APPLY (

    SELECT TTT_ID = CASE WHEN ChosenRow = 1 THEN CAST(SUBSTRING(CAST(X.TextDescription AS VARCHAR(100)),13,10) AS INT) END) b

    WHERE ChosenRow = 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi,

    I followed your instructions. The results are:

    Original subquery gives: 69000

    Proposed query containing cross apply: 69000,Int.nr.TXT: 69000

    So the original query works fine as long at is not part as a nested query for the rest of the stored procedure.

  • Have you tried both subqueries against your actual data, Marc?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Sorry, bout that. I checked it now against the actual data. The orignal query only gave back valid data (only numbers). The new proposed query gave the same by first displaying the number and secondly displaying the prefix text and then the same number.

  • How many rows are returned?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Both (the orignal query and the new proposed query) gave 57 rows.

  • marc.eilander 13301 (7/31/2013)


    Hi, Thank you for your response. However the goal is to only retrieve those lines starting with Int.nr.TXT:. I checked the data, all lines have a valid integer after the :. However I still do not get why, despite the where clause, SQL is trying to fetch and evaluate all rows, also those not starting with the Int.nr.TXT..

    The query optimizer is free to evaluate expressions and predicates in any order to get maximum performance.

    It seems that for some reason the generated execution plan results in the scalar expression being evaluated before the WHERE predicate.

    If you really want to know exactly what is happening, you can post an estimated query plan of your original query as an .sqlplan file (right click an execution plan and select Save Execution Plan as ...)

  • Ok - try these:

    SELECT

    TTT_Factureren = 0,

    TTT_FactuurStatus = 0,

    TTT_FactuurNummer = 0

    FROM tblInvoicesProject

    WHERE TTT_ProjectID = 26

    AND TTT_FactuurStatus = 1

    AND TTT_ID NOT IN (

    SELECT SUBSTRING(CAST(X.TextDescription AS VARCHAR(100)),13,10)

    FROM [500].dbo.frstx X

    WHERE (LEFT(CAST(X.TextDescription AS VARCHAR(100)), 11) = 'Int.nr.TXT:')

    )

    SELECT

    TTT_Factureren = 0,

    TTT_FactuurStatus = 0,

    TTT_FactuurNummer = 0

    FROM tblInvoicesProject

    WHERE TTT_ProjectID = 26

    AND TTT_FactuurStatus = 1

    AND TTT_ID NOT IN (

    SELECT b.TTT_ID

    FROM [500].dbo.frstx X

    CROSS APPLY (SELECT ChosenRow = CASE WHEN X.TextDescription LIKE 'Int.nr.TXT:%' THEN 1 ELSE 0 END) a

    CROSS APPLY (

    SELECT TTT_ID = CASE WHEN ChosenRow = 1 THEN CAST(SUBSTRING(CAST(X.TextDescription AS VARCHAR(100)),13,10) AS INT) END) b

    WHERE ChosenRow = 1

    )

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you both very much. Since this issue occurs after migrating from sql 2000 to sql 2008 I suspect the performance issues which were adressed by Microsoft in SQL 2008 probably changed the execution plan. Unfortunately I have no data on SQL 2000 to compare this. Thanks again for getting more insight in this.

  • Since you are on SQL2008, you can forget about TRY_CAST since this is a SQL2012 feature only.

    On SQL2008 you could use a CASE expression to guard the expression evaluation to avoid this issue.

    Like this:

    UPDATE#t2

    SETTTT_Factureren = 0,

    TTT_FactuurStatus = 0,

    TTT_FactuurNummer = 0

    WHERETTT_ProjectID = 26

    AND TTT_FactuurStatus = 1

    AND TTT_ID NOT IN(

    SELECT

    CASE WHEN X.TextDescription LIKE 'Int.nr.TXT:%'

    THEN CAST(SUBSTRING(CAST(X.TextDescription AS VARCHAR(100)),13,10) as int)

    ELSE -1

    END

    FROM #t1 X

    WHERE X.TextDescription LIKE 'Int.nr.TXT:%'

    )

    Note the use of LIKE which gives much better performance than SUBSTRING in the predicate

    This method is also more efficient than casting TTT_ID to a varchar since it is much faster to compare integers than strings.

    Good luck

  • Stefan_G (7/31/2013)


    Since you are on SQL2008, you can forget about TRY_CAST since this is a SQL2012 feature only.

    On SQL2008 you could use a CASE expression to guard the expression evaluation to avoid this issue.

    Like this:

    UPDATE#t2

    SETTTT_Factureren = 0,

    TTT_FactuurStatus = 0,

    TTT_FactuurNummer = 0

    WHERETTT_ProjectID = 26

    AND TTT_FactuurStatus = 1

    AND TTT_ID NOT IN(

    SELECT

    CASE WHEN X.TextDescription LIKE 'Int.nr.TXT:%'

    THEN CAST(SUBSTRING(CAST(X.TextDescription AS VARCHAR(100)),13,10) as int)

    ELSE -1

    END

    FROM #t1 X

    WHERE X.TextDescription LIKE 'Int.nr.TXT:%'

    )

    Note the use of LIKE which gives much better performance than SUBSTRING in the predicate

    This method is also more efficient than casting TTT_ID to a varchar since it is much faster to compare integers than strings.

    Good luck

    That's pretty much how I saw it too, Stefan 😉

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks again. Learned a lot.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply