June 22, 2007 at 7:39 am
How do I determin if a column has USABLE data in it?
I have a database with columns that apear to have been filled with invisible data, so a search for 'NOT NULL' doesn't work.
my code is something like:
select ...where field IS NOT NULL
but it is bringing up everything because the fields have become 'NOT NULL' even tho nothing has ever been entered.
June 22, 2007 at 8:19 am
Hi Darkmunk,
I have this problem in my third party application. If a user clears a field, it leaves an empty string in a VARCHAR field, which is not the same as a NULL value.
For example
CREATE TABLE blackhole (singularity VARCHAR(5))
go
INSERT INTO blackhole(singularity) VALUES('Blair')
GO
INSERT INTO blackhole(singularity) VALUES('Brown')
GO
INSERT INTO blackhole(singularity) VALUES('')
GO
INSERT INTO blackhole(singularity) VALUES(NULL)
GO
SELECT * FROM blackhole WHERE singularity IS NOT NULL
GO
this will return 'Blair', 'Brown' and '' (empty string)
SELECT * FROM blackhole WHERE singluarity IS NOT NULL AND DATALENGTH(singularity)>0
GO
This will return only 'Blair' and 'Brown' from the singularity. (Beware however of using functions for SELECT criteria, as they can affect performance by preventing the query optimizer from using making use of indexes.)
Even worse in my application, clearing a DATETIME field puts a -1 in the field, returning 31-Dec-1899 to the application
If this is your problem, you could make sure you are putting NULL values in the columns, either manually or by using an AFTER INSERT,UPDATE trigger
David
If it ain't broke, don't fix it...
June 22, 2007 at 8:29 am
Thanks David, that appears to be right on the money.
My DB is small so functions should be OK, I'll let you know how I get on.
Mark
June 22, 2007 at 1:28 pm
This can solve this without modifing the data...
SELECT ...
FROM table
WHERE NULLIF(field,'') IS NOT NULL
---------------------------------------------
[font="Verdana"]Nothing is impossible.
It is just a matter of time and money.[/font]
June 25, 2007 at 10:06 am
You mean IsNull(field,'') rather than NullIf(field,'').
NullIf(Field1,Field2) returns Null if the fields are equal. IsNull(Field1, Field2) returns the second parameter if the first is null.
June 25, 2007 at 10:29 am
SELECT ...
FROM table
WHERE NULLIF(field,'') IS NOT NULL
or
SELECT ...
FROM table
WHERE ISNULL(field,'') <> ''
I'm gonna have a beer now...on second thoughts, think I'll have a beer instead, just for a change.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 26, 2007 at 1:20 am
In this case I think NULLIF() would be used rather than ISNULL() because it seems likely that the VARCHARs are empty strings (i.e. a known value), which is not the same as being NULL (unknown).
I think I need a beer as well, but since it's only 8am, maybe I'll wait a couple of hours
As for modifying the data, maybe functions will not impact query performance in a small database, so using ISNULL(), NULLIF() OR DATALENGTH() will not cause any problems unless you are planning to grow the database substantially.
David
If it ain't broke, don't fix it...
June 26, 2007 at 6:07 am
Just a minor point on this. Using a scalar function in your where clause like this will force a table scan. Since the query optimizer cannot anticipate the order of items in an index after they have been run through a function like ISNULL, it has to scan every record. If you have an index on the field that has NULLS and empty strings, you will get better performance with something like:
SELECT * FROM MyTable WHERE MyField IS NULL OR MyField = ''
June 26, 2007 at 6:10 pm
No, David... JacekO did it just right... try it and see...
--===== Declare and populate a test table
DECLARE @test-2 TABLE (RowNum INT IDENTITY(1,1), SomeString VARCHAR(10))
INSERT INTO @test-2 (SomeString)
SELECT 'A' UNION ALL
SELECT 'B' UNION ALL
SELECT '' UNION ALL
SELECT ' ' UNION ALL
SELECT ' ' UNION ALL
SELECT NULL UNION ALL
SELECT 'G'
--===== Show what's in the test table
SELECT *
FROM @test-2
--===== Do the NULLIF test that JacekO suggested
SELECT *
FROM @test-2
WHERE NULLIF(SomeString,'') IS NOT NULL
--Jeff Moden
Change is inevitable... Change for the better is not.
June 26, 2007 at 6:21 pm
Well, maybe an Index Scan... but you're (almost) exactly correct... the OR in this case will certainly allow an Index SEEK to occur if the correct index is available.
I say "almost" only because the OP wanted to return data, not rows missing data and your query should probably look more like this ...
SELECT *
FROM YourTable
WHERE YourColumn <> ''
AND YourColumn IS NOT NULL
--Jeff Moden
Change is inevitable... Change for the better is not.
June 27, 2007 at 7:21 am
Hi,
We have the same issue. Notice that
SELECT * FROM blackhole WHERE DATALENGTH(singularity)>0
will give you the same result since DATALENGTH(NULL) is not greater than 0
If you also want to eliminate fields containing only blanks ( this is our situation ) you can use the following query where the expression you are testing is a string expression
SELECT * FROM blackhole WHERE LEN(singularity)>0
Terri
To speak algebraically, Mr. M. is execrable, but Mr. C. is
(x+1)-ecrable.
Edgar Allan Poe
[Discussing fellow writers Cornelius Mathews and William Ellery Channing.]
June 27, 2007 at 7:49 am
That too, will work, but as Chris Morris pointed out, any time you do a calculation/function on a column in the WHERE clause, you will decrease performance (A LOT!!) if a useable index is available because the best you will be able to do is an Index SCAN... it will never do an Index SEEK. Note, that in the examples below, the Index Scans take just as long as the Table Scans so going for the Index Seek becomes even more important.
Further, if SELECT * is used instead of SELECT indexedcolumnname with a calc/function on a column in the WHERE clause, you end up forcing table scans where the method that Chris Morris pointed out STILL uses an Index Seek (although a Bookmark Lookup does get involved which is a twice as slow as just an Index Seek). That's also "loose proof" as to why you should never use SELECT *... always select just targeted columns.
Here's the code to backup all of what I've said above... run this with the "Show Execution Plan" turned on...
--===== Declare and populate a test table
CREATE TABLE #Test (RowNum INT IDENTITY(1,1), SomeString VARCHAR(10))
INSERT INTO #Test (SomeString)
SELECT 'A' UNION ALL
SELECT 'B' UNION ALL
SELECT '' UNION ALL
SELECT ' ' UNION ALL
SELECT ' ' UNION ALL
SELECT NULL UNION ALL
SELECT 'G'
--===== Create an index to demo the solutions/problems
CREATE INDEX Test_SomeString ON #Test (SomeString)
--===== Show what's in the test table
SELECT *
FROM #Test
--===============================================================================
-- Test for index usage for "targeted" columns
--===============================================================================
--===== NULLIF test for JacekO's code (INDEX SCAN)
SELECT SomeString
FROM #Test
WHERE NULLIF(SomeString,'') IS NOT NULL
--===== LEN test for Terri's code (INDEX SCAN)
SELECT SomeString
FROM #Test
WHERE LEN(SomeString) > 0
--===== The correct way (INDEX SEEK)
SELECT SomeString
FROM #Test
WHERE SomeString <> ''
AND SomeString IS NOT NULL
--===============================================================================
-- Test for index usage for SELECT *
--===============================================================================
--===== NULLIF test for JacekO's code (TABLE SCAN)
SELECT *
FROM #Test
WHERE NULLIF(SomeString,'') IS NOT NULL
--===== LEN test for Terri's code (TABLE SCAN)
SELECT *
FROM #Test
WHERE LEN(SomeString) > 0
--===== The correct way (Index Seek/BookMark Lookup)
SELECT *
FROM #Test
WHERE SomeString <> ''
AND SomeString IS NOT NULL
DROP TABLE #Test
KEYWORDS:
WHY SELECT * IS BAD
USING INDEXES CORRECTLY
FORMULAS IN WHERE CLAUSES
--Jeff Moden
Change is inevitable... Change for the better is not.
June 27, 2007 at 7:58 am
You give me way too much credit Mr Moden, and on a wednesday too, when I get to work from home
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 27, 2007 at 10:09 am
Jeff,
Thanks for the insight on Index usage and formulas in where clauses. I guess I know better than to use SELECT * in production code.
If we are comparing a non-indexed field then the table scan will happen in any case so would there be a benefit to not using the function?
Terri
To speak algebraically, Mr. M. is execrable, but Mr. C. is
(x+1)-ecrable.
Edgar Allan Poe
[Discussing fellow writers Cornelius Mathews and William Ellery Channing.]
June 27, 2007 at 4:52 pm
Nope... if you don't have an index, using the function or not has no effect on speed... both are the same for the most part. Only difference is if you are planning for future scalability...
... guess that's the key, though.. "planning for future scalability"... it doesn't cost more time if there's no index and you write it without the function... it does cost more time to find everywhere a function has been used and needs to be "split" if and when you do add an index. And, going one step further, you may add an index to "cover" a process that's totally unrelated... if it just happens to be right for an unrelated process that doesn't use functions as we discussed, it will "auto-magically" begin to use the index and will suddenly begin performing better. It's kind of like winning the lottery... "You can't win if you don't play"
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply