empty DB columns are not ''NOT NULL'' so search for not null fails

  • How do I determin if a column has USABLE data in it?

    I have a database with columns that apear to have been filled with invisible data, so a search for 'NOT NULL' doesn't work.

    my code is something  like:

     select ...where field IS NOT NULL

    but it is bringing up everything because the fields have become 'NOT NULL' even tho nothing has ever been entered.

  • Hi Darkmunk,

    I have this problem in my third party application. If a user clears a field, it leaves an empty string in a VARCHAR field, which is not the same as a NULL value.

    For example

    CREATE TABLE blackhole (singularity VARCHAR(5))

    go

    INSERT INTO blackhole(singularity) VALUES('Blair')

    GO

    INSERT INTO blackhole(singularity) VALUES('Brown')

    GO

    INSERT INTO blackhole(singularity) VALUES('')

    GO

    INSERT INTO blackhole(singularity) VALUES(NULL)

    GO

    SELECT * FROM blackhole WHERE singularity IS NOT NULL

    GO

    this will return 'Blair', 'Brown' and '' (empty string)

    SELECT * FROM blackhole WHERE singluarity  IS NOT NULL AND DATALENGTH(singularity)>0

    GO

    This will return only 'Blair' and 'Brown' from the singularity. (Beware however of using functions for SELECT criteria, as they can affect performance by preventing the query optimizer from using making use of indexes.)

    Even worse in my application, clearing a DATETIME field puts a -1 in the field, returning 31-Dec-1899 to the application

    If this is your problem, you could make sure you are putting NULL values in the columns, either manually or by using an AFTER INSERT,UPDATE trigger

    David

    If it ain't broke, don't fix it...

  • Thanks David, that appears to be right on the money.

    My DB is small so functions should be OK, I'll let you know how I get on.

    Mark

  • This can solve this without modifing the data...

    SELECT ...

    FROM table

    WHERE NULLIF(field,'') IS NOT NULL

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

  • You mean IsNull(field,'') rather than NullIf(field,'').

    NullIf(Field1,Field2) returns Null if the fields are equal.  IsNull(Field1, Field2) returns the second parameter if the first is null.

  • SELECT ...

    FROM table

    WHERE NULLIF(field,'') IS NOT NULL

    or

    SELECT ...

    FROM table

    WHERE ISNULL(field,'') <> ''

    I'm gonna have a beer now...on second thoughts, think I'll have a beer instead, just for a change.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • In this case I think NULLIF() would be used rather than ISNULL() because it seems likely that the VARCHARs are empty strings (i.e. a known value), which is not the same as being NULL (unknown).

    I think I need a beer as well, but since it's only 8am, maybe I'll wait a couple of hours

    As for modifying the data, maybe functions will not impact query performance in a small database, so using ISNULL(), NULLIF() OR DATALENGTH() will not cause any problems unless you are planning to grow the database substantially.

    David

    If it ain't broke, don't fix it...

  • Just a minor point on this.  Using a scalar function in your where clause like this will force a table scan.  Since the query optimizer cannot anticipate the order of items in an index after they have been run through a function like ISNULL, it has to scan every record.  If you have an index on the field that has NULLS and empty strings, you will get better performance with something like:

    SELECT * FROM MyTable WHERE MyField IS NULL OR MyField = ''

     

  • No, David... JacekO did it just right... try it and see...

    --===== Declare and populate a test table

    DECLARE @test-2 TABLE (RowNum INT IDENTITY(1,1), SomeString VARCHAR(10))

     INSERT INTO @test-2 (SomeString)

     SELECT 'A' UNION ALL

     SELECT 'B' UNION ALL

     SELECT ''  UNION ALL

     SELECT ' ' UNION ALL

     SELECT '     ' UNION ALL

     SELECT NULL UNION ALL

     SELECT 'G'

    --===== Show what's in the test table

     SELECT *

       FROM @test-2

    --===== Do the NULLIF test that JacekO suggested

     SELECT *

       FROM @test-2

      WHERE NULLIF(SomeString,'') IS NOT NULL

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, maybe an Index Scan... but you're (almost) exactly correct... the OR in this case will certainly allow an Index SEEK to occur if the correct index is available.  

    I say "almost" only because the OP wanted to return data, not rows missing data and your query should probably look more like this ...

     SELECT *

       FROM YourTable

      WHERE YourColumn <> ''

        AND YourColumn IS NOT NULL

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    We have the same issue.  Notice that

    SELECT * FROM blackhole WHERE DATALENGTH(singularity)>0

    will give you the same result since DATALENGTH(NULL) is not greater than 0

    If you also want to eliminate fields containing only blanks ( this is our situation ) you can use the following query where the expression you are testing is a string expression

    SELECT * FROM blackhole WHERE LEN(singularity)>0

     



    Terri

    To speak algebraically, Mr. M. is execrable, but Mr. C. is
    (x+1)-ecrable.
    Edgar Allan Poe
    [Discussing fellow writers Cornelius Mathews and William Ellery Channing.]

  • That too, will work, but as Chris Morris pointed out, any time you do a calculation/function on a column in the WHERE clause, you will decrease performance (A LOT!!) if a useable index is available because the best you will be able to do is an Index SCAN... it will never do an Index SEEK.  Note, that in the examples below, the Index Scans take just as long as the Table Scans so going for the Index Seek becomes even more important.

    Further, if SELECT * is used instead of SELECT indexedcolumnname with a calc/function on a column in the WHERE clause, you end up forcing table scans where the method that Chris Morris pointed out STILL uses an Index Seek (although a Bookmark Lookup does get involved which is a twice as slow as just an Index Seek).  That's also "loose proof" as to why you should never use SELECT *... always select just targeted columns.

    Here's the code to backup all of what I've said above... run this with the "Show Execution Plan" turned on...

    --===== Declare and populate a test table

     CREATE TABLE #Test (RowNum INT IDENTITY(1,1), SomeString VARCHAR(10))

     INSERT INTO #Test (SomeString)

     SELECT 'A' UNION ALL

     SELECT 'B' UNION ALL

     SELECT ''  UNION ALL

     SELECT ' ' UNION ALL

     SELECT '     ' UNION ALL

     SELECT NULL UNION ALL

     SELECT 'G'

    --===== Create an index to demo the solutions/problems

     CREATE INDEX Test_SomeString ON #Test (SomeString)

    --===== Show what's in the test table

     SELECT *

       FROM #Test

    --===============================================================================

    --      Test for index usage for "targeted" columns

    --===============================================================================

    --===== NULLIF test for JacekO's code (INDEX SCAN)

     SELECT SomeString

       FROM #Test

      WHERE NULLIF(SomeString,'') IS NOT NULL

    --===== LEN test for Terri's code (INDEX SCAN)

     SELECT SomeString

       FROM #Test

      WHERE LEN(SomeString) > 0

    --===== The correct way (INDEX SEEK)

     SELECT SomeString

       FROM #Test

      WHERE SomeString <> ''

        AND SomeString IS NOT NULL

    --===============================================================================

    --      Test for index usage for SELECT *

    --===============================================================================

    --===== NULLIF test for JacekO's code (TABLE SCAN)

     SELECT *

       FROM #Test

      WHERE NULLIF(SomeString,'') IS NOT NULL

    --===== LEN test for Terri's code (TABLE SCAN)

     SELECT *

       FROM #Test

      WHERE LEN(SomeString) > 0

    --===== The correct way (Index Seek/BookMark Lookup)

     SELECT *

       FROM #Test

      WHERE SomeString <> ''

        AND SomeString IS NOT NULL

    DROP TABLE #Test

    KEYWORDS:

    WHY SELECT * IS BAD

    USING INDEXES CORRECTLY

    FORMULAS IN WHERE CLAUSES

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You give me way too much credit Mr Moden, and on a wednesday too, when I get to work from home

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff,

    Thanks for the insight on Index usage and formulas in where clauses.  I guess I know better than to use SELECT * in production code.

    If we are comparing a non-indexed field then the table scan will happen in any case so would there be a benefit to not using the function?



    Terri

    To speak algebraically, Mr. M. is execrable, but Mr. C. is
    (x+1)-ecrable.
    Edgar Allan Poe
    [Discussing fellow writers Cornelius Mathews and William Ellery Channing.]

  • Nope... if you don't have an index, using the function or not has no effect on speed... both are the same for the most part.  Only difference is if you are planning for future scalability...

    ... guess that's the key, though.. "planning for future scalability"... it doesn't cost more time if there's no index and you write it without the function... it does cost more time to find everywhere a function has been used and needs to be "split" if and when you do add an index.  And, going one step further, you may add an index to "cover" a process that's totally unrelated... if it just happens to be right for an unrelated process that doesn't use functions as we discussed, it will "auto-magically" begin to use the index and will suddenly begin performing better.  It's kind of like winning the lottery... "You can't win if you don't play"

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply