Check the Variable is Empty or Null

  • TomThomson (9/11/2014)


    Eirikur Eiriksson (9/11/2014)


    Lets add some spanners into the mix

    Although the first two chunks indicate clearly that NULL is ordered low by ORDER BY, they don't indicate anything else.

    The "spanners" added for the third chunk are nothing to do with (T-)SQL but with the presentation chosen by SSMS. The strings char(32), char(0), and '' are all given the same visible presentation form (absence of any visible character). That tells us absolutely nothing new about ordering of strings including nulls.

    Thank you for this Tom, my hasty responses aren't favouring my argument.

    NULL is never compared to a value of any type in SQL Server, applying a NULLable column on either side of a logical operator can therefore be considered as a shorthand for " and col is not null". As an example, when inspecting an execution plan for a trivial statement of "where x > 1 and x <= 1", the statement is translated to "where x is not null", very logical as NULL is not a value. No argument there.

    As for the OP goes, the statement "WHERE X > ''" is a logical equivalent to "WHERE X > '' AND X IS NOT NULL".

    The discrepancies in SQL Server NULL handling are evident when using ranking functions and window specifications in an over clause, don't think this thread is the right place for that discussion.

    😎

  • Luis mentioned that in a WHERE clause, something like the following would be non-SARGable:

    WHERE ISNULL(MyNullableCol,'') != ''

    Is that non-SARGable purely because it contains a function (in this case the ISNULL() function)? If so, would the following be SARGable?

    WHERE MyNullableCol IS NOT NULL AND MyNullableCol != ''

    What is the easiest way of confirming whether a query was SARGable? Do you have to look at something in the execution plan?

    By the way, regarding the debate about nulls just being something really small:

    DECLARE @user varchar(30)

    SET @user = NULL

    IF @user <= 'something'

    BEGIN

    SELECT 'those saying null is regarded as the smallest value by SQL Server are correct (you will not see this result)'

    END

    ELSE

    BEGIN

    SELECT 'many people write risky code that can easily be broken by a colleague trying to modify it, such as attempting to invert the results of an IF'

    END

  • ...

    What is the easiest way of confirming whether a query was SARGable? Do you have to look at something in the execution plan?

    ...

    Create a temp table with single column (varchar, for example)

    Then create an index on this column.

    Then write simple select query with where clause using this column

    Select "show plan" (Ctrl+M) and execute one.

    Without ISNULL, you will see that Index Seek will be performed

    Using ISNULL will endup using scan.

    Someone, somewhere, claimed that SQL Server might be smart enough to disregard using INSULL and translate query in such way that it will still use index.

    I have tried few different variations, but failed to see that happening. As soon as you use ISNULL - you get scan

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Thanks Eugene, I just tried your test method and confirmed that this method is SARGable:

    WHERE MyNullableCol IS NOT NULL AND MyNullableCol != ''

    so there are clearly situations where it is better to use the syntax above than the function ISNULL(MyNullableCol,'') != ''

    Perhaps internally SQL Server treats ISNULL() like any other scalar function, so it prevents the optimiser from being really efficient and has to do more stuff separately for each row.

  • gward 98556 (9/12/2014)


    Thanks Eugene, I just tried your test method and confirmed that this method is SARGable:

    WHERE MyNullableCol IS NOT NULL AND MyNullableCol != ''

    so there are clearly situations where it is better to use the syntax above than the function ISNULL(MyNullableCol,'') != ''

    Perhaps internally SQL Server treats ISNULL() like any other scalar function, so it prevents the optimiser from being really efficient and has to do more stuff separately for each row.

    Sure, but I still prefer a single check. Note that the code should include the comments to keep clarity on what it is doing.

    WHERE MyNullableCol != '' --This will avoid NULLs and empty strings

    If you want to do the opposite (look for NULLs and empty strings), you need both conditions.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • On tenuously related subjects (the use of 'elegant' code, implicit conversions and SARGable queries), one of my favorite bugs is the following code;

    SELECT ...

    FROM <Table 1> AS a WITH (nolock) INNER JOIN

    <Table 2> AS b WITH (nolock) ON a.GUID_Field = b.VARCHAR_Field

    WHERE a.GUID_Field = '<GUID>'

    This is used because <Table 2> has a field that was originally intended to store pointers to different types of data depending on other information and so it is using a VARCHAR field which can store GUIDs or Integers as needed. However, in order to link to the GUID_field in <Table 1>, it has to convert the entirety of VARCHAR_Field to a UNIQUEIDENTIFIER type which usually means it will fail if there is any non GUID compatible data in that field and, even if it works (because the developer changed their mind and so only GUID data is actually stored in there), it will take FOREVER to get the data since it has to do a Table Scan since it cannot use the Index built on the field.

    In practice, you can eliminate the problem by selecting from <Table 2> instead of <Table 1>, if that is an available option, so that the VARCHAR_Field is identified and then the GUID_Field records are selected based on that since it can then use the Index on GUID_Field.

    Still, it would have been better for the original developer to have used multiple fields for the different types of data that might be used as links. Not quite as elegant or flexible, but more reliable and efficient.

    Nowadays, when I am explaining my code to someone and I find myself saying "... so I had a clever idea and decided to ...", I start hearing alarm bells and have to review the code to see if there is a less clever solution instead !

  • SimonHolzman (9/12/2014)


    This is used because <Table 2> has a field that was originally intended to store pointers to different types of data depending on other information and so it is using a VARCHAR field which can store GUIDs or Integers as needed. However, in order to link to the GUID_field in <Table 1>, it has to convert the entirety of VARCHAR_Field to a UNIQUEIDENTIFIER type which usually means it will fail if there is any non GUID compatible data in that field and, even if it works (because the developer changed their mind and so only GUID data is actually stored in there), it will take FOREVER to get the data since it has to do a Table Scan since it cannot use the Index built on the field.

    That's just bad design and as you said, it will only give performance problems even if it creates "simplified code".

    SimonHolzman (9/12/2014)


    In practice, you can eliminate the problem by selecting from <Table 2> instead of <Table 1>, if that is an available option, so that the VARCHAR_Field is identified and then the GUID_Field records are selected based on that since it can then use the Index on GUID_Field.

    Changing the order of the tables won't do anything for performance. Remember that SQL is a declarative language and SQL Server will decide the order to process the query depending on the costs.

    SimonHolzman (9/12/2014)


    Nowadays, when I am explaining my code to someone and I find myself saying "... so I had a clever idea and decided to ...", I start hearing alarm bells and have to review the code to see if there is a less clever solution instead !

    Reviewing the code and testing is important with any development (not only "clever ideas").

    Finally, you might want to stop using NOLOCK hints, unless your users don't care about data consistency. It's not a "go-fast" option and can cause major issues. Some references:

    http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx

    http://www.jasonstrate.com/2012/06/the-side-effect-of-nolock/

    http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (9/12/2014)


    SimonHolzman (9/12/2014)


    In practice, you can eliminate the problem by selecting from <Table 2> instead of <Table 1>, if that is an available option, so that the VARCHAR_Field is identified and then the GUID_Field records are selected based on that since it can then use the Index on GUID_Field.

    Changing the order of the tables won't do anything for performance. Remember that SQL is a declarative language and SQL Server will decide the order to process the query depending on the costs.

    It is not the order of the tables... it is which table one uses to select the data from. Sometimes you cannot change this, especially if the database is heavily normalized, but sometimes you can. For example, in this case... if you already know the value of the GUID, you can use

    WHERE VARCHAR_Field = '<GUID>'

    instead of

    WHERE GUID_Field = '<GUID>'

    Luis Cazares (9/12/2014)


    Finally, you might want to stop using NOLOCK hints, unless your users don't care about data consistency. It's not a "go-fast" option and can cause major issues.

    Whether we like it or not, most "reporting" gets done from the production database even if you have a reporting database - after all, who wants to know their status as of the end of business yesterday ? I know that there are options such as log shipping (which either disables the reporting database frequently or provides data with a lag time) and replication (which is more promising but is not as well supported by older applications), but each time a customer's information is pulled up, that is basically a report and generally comes from Production. Especially when pulling large data sets, locking CAN be a serious issue since the standard assumption that SQL makes until the transaction is complete is that the user may wish to modify the data that they have selected. On a large report, that can affect a lot of records which slows down the query (both because of the locking activity and because of interference from pre-existing locks) and causes the query to slow down other users because they are waiting for this report to release its locks.

    The (nolock) hint or the READ UNCOMMITTED transaction level ("Dirty Reads" for us old codgers) allows the database to get the data without caring about any existing locks or about whether the data is completely consistent or not. IT IS VERY DANGEROUS if it is used thoughtlessly (which is why I actually prefer the hint so that it has to be actively used on any query it is used on rather than being automatically imposed). However, if it is being used purely to produce information that is accurate as of this moment but which is understood to be possibly changed a second later, it is actually huge benefits - Much faster queries in many cases and many fewer failed updates affecting other users because of spurious locks.

    In reality, the data received from the database with the nolock hint enabled is usually accurate anyway... in most systems, the data is updated because it needs to be updated. As such, if the customer's gender is being changed from Male to Female at the moment someone else is producing a list of all female customers, it does not really matter which value the query returns since it would be different a minute earlier or later anyway. Yes, when transferring money from one account to another, you want the entire process wrapped in a transaction and, when querying how much is in each account, you should be using a SERIALIZABLE query but, for most purposes, this is overkill and causes more issues than it resolves.

    Since most queries do not necessarily result in the need to change any data, it can be argued that it is foolish to have a lock that lasts from when you read the data until you leave it (whether or not you perform an update) since this stops other users performing updates which might be irrelevent to yours and since a system issue (especially on remote systems) can cause the lock to be kept long after the original user stops needing it.

    Instead, if you are performing an update based on data returned by any query, including one with nolock hints, you should basically only perform the update if the current data in the database matches what your session thinks the original data was anyway.

    ie:

    UPDATE <Table>

    SET <Data Field> = <New Value>,

    <Last Update Field> = CASE WHEN <Data Field> = <Old Value> THEN GETDATE()

    ELSE <Last Update Field>

    END

    WHERE <ID Field> = @ID_Field

    AND <Data Field> IN (<New Value>, <Old Value>)

    This avoids the issue of having to maintain a lock just in case an update is needed. It also handles multiple users performing updates much more elegantly since one might be updating data that is irrelevant to the other and, if not, it allows the second user to know what change has already been made if it differs from the change that they were attempting. They can then be prompted to force the update to be made or not to.

  • I guess I confused what you were trying to say with changing the table you select the information.

    As for the second part, we seem to be on the same page. NOLOCK will give consistency errors in the reports and can be very dangerous when used to modify data. As long as the users understand that the same report might not give accurate data, and you avoid it when exact data is required (clients balance or status), then it's up to you. There are other method to avoid blocking and some reports should be fine with information up to yesterday end of business.

    My point was simply not to use it without understanding the consequences.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 9 posts - 31 through 38 (of 38 total)

You must be logged in to reply to this topic. Login to reply