Alternate Way to Write query: ISNULL function causing issues

  • Dear All,

    Our dev team has handed over a deployment containing a query that looks non-sargable to me. Here is a sample query:

    select top 10 pm.pdenddate
    from GlobalPD.PDDataTimeSeriesPE_tmp_01 pt
    inner join pddata.Globalpd.PdDataMarketSignals_Recal_tbl pm (nolock)
        on pm.pddataid = pt.pddataid
        and pm.asofdate = pt.asofdate
        and isnull(pm.pdenddate, '01/01/1900') <> isnull(pt.pdenddate, '01/01/1900')
    where pm.asofdate >= '2000-01-01';

    One of the tables in question is fairly big (400 GB) and the other is relatively small, at 500 MB. Earlier both were plain heaps; I created some indexes on them, but I suspect that the ISNULL function used above renders them almost useless. Is there a way I can advise them to rewrite this without breaking the logic? Attached are the schemas of the tables in question.

    Regards

    Chandan Jha

    Quick thought: the ISNULL is redundant, as null is never equal to null. It will, on the other hand, impede performance, as the resulting execution plan will use scans rather than seeks.

    😎

  • Eirikur Eiriksson (6/6/2016)


    Quick thought: the ISNULL is redundant, as null is never equal to null. It will, on the other hand, impede performance, as the resulting execution plan will use scans rather than seeks.

    😎

    It's not in this case. If pm.pdenddate is NULL and pt.pdenddate is not NULL or '1900-01-01' (or vice versa), this expression will evaluate to TRUE, whereas without the ISNULLs it would evaluate to UNKNOWN.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA
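
    Drew's three-valued-logic point is easy to demonstrate outside SQL Server. A minimal sketch using Python's built-in sqlite3 module (SQLite follows the same NULL comparison rules as T-SQL for <>; the table, column names and dates below are made up for illustration, and COALESCE stands in for T-SQL's ISNULL):

```python
import sqlite3

# In-memory database; hypothetical two-column table standing in for the join.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pairs (pm_end TEXT, pt_end TEXT)")
con.executemany("INSERT INTO pairs VALUES (?, ?)",
                [(None, "2001-05-05"), (None, None), ("2001-05-05", "2002-06-06")])

# Plain <>: a comparison involving NULL is UNKNOWN, so the (NULL, date) row is filtered out.
plain = con.execute("SELECT COUNT(*) FROM pairs WHERE pm_end <> pt_end").fetchone()[0]

# COALESCE to a sentinel: the (NULL, date) row now qualifies, so the results differ.
wrapped = con.execute(
    "SELECT COUNT(*) FROM pairs "
    "WHERE COALESCE(pm_end, '1900-01-01') <> COALESCE(pt_end, '1900-01-01')"
).fetchone()[0]

print(plain, wrapped)  # 1 2
```

    So stripping the ISNULLs would silently drop the rows where exactly one side is NULL, which is why the rewrite has to preserve that case explicitly.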

    So is there no way out of this mess? Is it possible to have a new table with the same structure, with a computed column, where we do a bulk load and let the computed column take a value based on the incoming column?

    I remember reading somewhere here in one of the forum questions that NULL has no meaning as such, and that to avoid issues when loading such a table from a source, NULLs should be converted to a year like 1900, so that when this value is encountered we know the value was unknown.

    Thanks

    Chandan

  • chandan_jha18 (6/6/2016)


    So is there no way out of this mess? Is it possible to have a new table with the same structure, with a computed column, where we do a bulk load and let the computed column take a value based on the incoming column?

    I remember reading somewhere here in one of the forum questions that NULL has no meaning as such, and that to avoid issues when loading such a table from a source, NULLs should be converted to a year like 1900, so that when this value is encountered we know the value was unknown.

    Thanks

    Chandan

    Unfortunately, that advice is based on a gross oversimplification. It does make sense to assign default values in SOME cases, but not ALL cases. It makes sense to use 1900-01-01 in SOME cases, but not ALL cases. Specifically, it makes sense to use 1900-01-01 for START dates as long as that start date is the earliest possible date in your database. If your data contains records with start dates before 1900-01-01, then you should use some other value as your start date.

    On the other hand, you should NEVER use 1900-01-01 for END dates. The default end date should always be the latest possible date in your database. Suggested values include 9999-12-30, 9999-01-01, and 9000-01-01, but any date will work as long as it is the latest possible date.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

    and NOT (pm.pdenddate = pt.pdenddate
             OR (pm.pdenddate is NULL and pt.pdenddate is NULL))

    _____________
    Code for TallyGenerator

    And notice that Drew did NOT suggest using 9999-12-31. That's because the classic method of isolating date ranges requires an "AND SomeDTColumn < @EndDate + 1" and all similar variations. I personally use a literal of "9999", which translates to 9999-01-01 and also leaves room for adding a month to the end date.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
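
    Jeff's warning about 9999-12-31 is easy to see concretely. A small illustration in Python, whose datetime.date shares the 9999-12-31 upper bound with SQL Server's date types:

```python
from datetime import date, timedelta

# 9999-12-30 as the sentinel end date leaves room for the classic "< @EndDate + 1 day" pattern.
safe_end = date(9999, 12, 30)
print(safe_end + timedelta(days=1))  # 9999-12-31: still representable

# 9999-12-31 is the maximum representable date, so adding even one day overflows.
max_end = date(9999, 12, 31)
try:
    max_end + timedelta(days=1)
except OverflowError as err:
    print("overflow:", err)
```

    The same arithmetic in T-SQL raises an out-of-range error, which is exactly why the suggested sentinels stop short of the maximum date.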

  • drew.allen (6/6/2016)


    Eirikur Eiriksson (6/6/2016)


    Quick thought: the ISNULL is redundant, as null is never equal to null. It will, on the other hand, impede performance, as the resulting execution plan will use scans rather than seeks.

    😎

    It's not in this case. If pm.pdenddate is NULL and pt.pdenddate is not NULL or '1900-01-01' (or vice versa), this expression will evaluate to TRUE, whereas without the ISNULLs it would evaluate to UNKNOWN.

    Drew

    You are right, thanks for the correction.

    😎

  • Post the query plan (Actual) please. We don't know what your data looks like so we don't know what indexes will be used.

    Also, I notice there's no ORDER BY on your TOP; is this deliberate?

  • Sergiy (6/6/2016)


    and NOT (pm.pdenddate=pt.pdenddate

    OR (pm.pdenddate is NULL and pt.pdenddate is NULL)

    )

    This logic doesn't quite match what's given: the pm.pdenddate=pt.pdenddate part excludes all rows where only one side of the = operator is NULL.

    An alternative would be to replace line (in the original query)

    and isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')

    with

    AND (
        pm.pdenddate <> pt.pdenddate -- this excludes all rows with a NULL entry in either column, as the "=" operator would
        OR (pm.pdenddate IS NULL AND pt.pdenddate IS NOT NULL)
        OR (pm.pdenddate IS NOT NULL AND pt.pdenddate IS NULL)
    )

    I've assumed that none of the data has a pdenddate of '01/01/1900'.

    Forum Etiquette: How to post data/code on a forum to get the best help. Make sure that you include code in the appropriate IFCode tags, e.g. <your code here>. You can find the IFCode tags under the INSERT options when you are writing a post.
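
    Under colin.frame's stated assumption (no real pdenddate of '1900-01-01'), the expanded OR rewrite returns the same rows as the original ISNULL expression, which can be checked mechanically. A sketch using Python's built-in sqlite3 module, with COALESCE standing in for T-SQL's ISNULL and made-up sample dates:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pairs (pm TEXT, pt TEXT)")
# All NULL/non-NULL combinations; the real dates deliberately never equal '1900-01-01'.
con.executemany("INSERT INTO pairs VALUES (?, ?)", [
    (None, None), (None, "2005-01-01"), ("2005-01-01", None),
    ("2005-01-01", "2005-01-01"), ("2005-01-01", "2006-01-01"),
])

# Original predicate: sentinel-based inequality.
isnull_version = con.execute(
    "SELECT pm, pt FROM pairs "
    "WHERE COALESCE(pm, '1900-01-01') <> COALESCE(pt, '1900-01-01')"
).fetchall()

# Proposed rewrite: plain inequality plus explicit one-sided NULL checks.
expanded_version = con.execute(
    "SELECT pm, pt FROM pairs WHERE ("
    "  pm <> pt"
    "  OR (pm IS NULL AND pt IS NOT NULL)"
    "  OR (pm IS NOT NULL AND pt IS NULL)"
    ")"
).fetchall()

print(isnull_version == expanded_version)  # True
```

    Both predicates keep the same three rows, so the rewrite preserves the logic while dropping the function calls from the join columns.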

  • I think both the original code and all the alternatives suggested would prevent the use of an index to resolve the clause using pdenddate. This is due to the <> operator or the use of OR operators.

    I suggest you look at an index on both tables that will work with pddataid and asofdate, and have pdenddate as an included column.

    Also, as DouglasH said, you have a TOP clause without an ORDER BY, which means you could get any 10 of however many rows that qualify. If you need a specific TOP 10, then add an ORDER BY and consider adding those columns to the key fields in the index.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • colin.frame (6/8/2016)


    Sergiy (6/6/2016)


    and NOT (pm.pdenddate=pt.pdenddate

    OR (pm.pdenddate is NULL and pt.pdenddate is NULL)

    )

    This logic doesn't quite match what's given - the pm.pdenddate=pt.pdenddate part excludes all lines where one side only of the = operator is NULL.

    You're missing NOT in front of the opening bracket.

    _____________
    Code for TallyGenerator

    a) both the things DouglasH said

    b) you mentioned they were heaps and now have indexes, but do they now have clustered indexes? If not, then they're still heaps. This may or may not continue to affect performance.

    c) I've often found the analyzer to give up and go home after the first date seek on custom-built temporal tables, but you may wish to read up on the various implementations available and try some of them out with your data to see if you can increase read performance. Nothing to lose if you're at the end of your tether. I actually spend most of my development time tweaking the structures, indexes and queries around each other until I get the most acceptable balance between storage and performance (though I may be guilty of overdoing it just to bleed the last few milliseconds out).

    d) I'm a recent convert to INTERSECT for such cases due to the equal evaluation of NULL pointers and have seen improvements on some (not all) plans; give the below snippet a go in case it's an easy win:

    isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')

    to

    not exists (select pm.pdenddate intersect select pt.pdenddate)

    Forgot to mention e): as a best practice, avoid such date formats in code. 'yyyy-MM-dd' and 'yyyyMMdd' are both ANSI-standard (I believe) and should be interpreted correctly and unambiguously by the engine no matter what your regional or language settings are.
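
    The NULL-safe behaviour behind the INTERSECT trick in (d) can be illustrated with Python's built-in sqlite3 module; SQLite's INTERSECT, like T-SQL's, treats two NULLs as equal for set purposes (the helper function and values here are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

def rows_differ(a, b):
    # Mirrors "not exists (select a intersect select b)": a NULL-safe inequality test.
    return con.execute(
        "SELECT NOT EXISTS (SELECT ? INTERSECT SELECT ?)", (a, b)
    ).fetchone()[0] == 1

print(rows_differ(None, None))                  # False: NULL intersects with NULL
print(rows_differ(None, "1900-01-01"))          # True: one-sided NULL counts as different
print(rows_differ("1900-01-01", "1900-01-01"))  # False: equal values intersect
```

    Note the middle case: unlike the ISNULL-with-sentinel version, the INTERSECT form still reports a difference when one side is NULL and the other happens to hold the sentinel date.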

    Forgot to mention e): as a best practice, avoid such date formats in code. 'yyyy-MM-dd' and 'yyyyMMdd' are both ANSI-standard (I believe) and should be interpreted correctly and unambiguously by the engine no matter what your regional or language settings are.

    You are correct about the short ISO format (yyyyMMdd). But unfortunately the ODBC canonical format 'yyyy-MM-dd' can still be ambiguous.

    The longer ISO8601 format -- yyyy-mm-ddThh:mi:ss.mmm or yyyy-mm-ddThh:mi:ss.mmmZ -- is interpreted consistently. The following example was derived from this answer on Stack Overflow

    declare @Foo DATETIME

    SET DATEFORMAT YMD
    -- this will be the 1st of March in YMD
    SET @Foo = '2012-03-01'
    SELECT 'YMD: not ISO', @Foo

    SET DATEFORMAT MDY
    -- this will be the 1st of March in MDY
    SET @Foo = '2012-03-01'
    SELECT 'MDY: not ISO', @Foo

    SET DATEFORMAT DMY
    -- this will be the 3rd of January in DMY
    SET @Foo = '2012-03-01'
    SELECT 'DMY: not ISO', @Foo

    SET DATEFORMAT YDM
    -- this will be the 3rd of January in YDM
    SET @Foo = '2012-03-01'
    SELECT 'YDM: not ISO', @Foo

    /* returns
    YMD: not ISO 2012-03-01 00:00:00.000
    MDY: not ISO 2012-03-01 00:00:00.000
    DMY: not ISO 2012-01-03 00:00:00.000
    YDM: not ISO 2012-01-03 00:00:00.000
    */

  • simon.barnes (6/8/2016)


    a) both the things DouglasH said

    b) you mentioned they were heaps and now have indexes; but do they now have clustered indexes? If not then they're still heaps. This may or may not continue to affect performance

    c) I've often found the analyzer to give up and go home after the first date seek on custom-built temporal tables, but you may wish to read up on the various implementations available and try some of them out with your data to see if you can increase read performance. Nothing to lose if you're at the end of your tether. I actually spend most of my development time tweaking the structures, indexes and queries around each other until I get the most acceptable balance between storage and performance (though I may be guilty of overdoing it just to bleed the last few milliseconds out).

    d) I'm a recent convert to INTERSECT for such cases due to the equal evaluation of NULL pointers and have seen improvements on some (not all) plans; give the below snippet a go in case it's an easy win:

    isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')

    to

    not exists (select pm.pdenddate intersect select pt.pdenddate)

    Forgot to mention e): as a best practice, avoid such date formats in code. 'yyyy-MM-dd' and 'yyyyMMdd' are both ANSI-standard (I believe) and should be interpreted correctly and unambiguously by the engine no matter what your regional or language settings are.

    This suffers the same exact problem that I mentioned here http://www.sqlservercentral.com/Forums/FindPost1792490.aspx

    DECLARE @comparisons TABLE (
        pm_end_date DATE,
        pt_end_date DATE
    )

    INSERT @comparisons(pm_end_date, pt_end_date)
    VALUES (NULL, NULL),
           (NULL, '1900'),
           ('1900', NULL),
           ('1900', '1900')

    SELECT *
    FROM @comparisons c

    SELECT *
    FROM @comparisons c
    WHERE ISNULL(c.pm_end_date, '1900') <> ISNULL(c.pt_end_date, '1900')

    SELECT *
    FROM @comparisons c
    WHERE NOT EXISTS ( SELECT c.pm_end_date INTERSECT SELECT c.pt_end_date )

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA
