indexing a converted datetime column

  • Jeff Moden (7/31/2011)


    Eric, tell the folks who made the request that you need to make an alternate suggestion because of the dangers of users having access to formatted dates.

    Consider the following alternative solution, please...

    Let's say you're table looks like the following... it only takes a couple of seconds to build if you want to try this. I suggest you do this testing in TempDB.

    /**********************************************************************************************************************

    Purpose:

    Create a voluminous test table with various types of highly randomized data.

    --Jeff Moden

    **********************************************************************************************************************/

    --===== Conditionally drop the test table to make reruns easier

    IF OBJECT_ID('dbo.JBMTest','U') IS NOT NULL

    DROP TABLE dbo.JBMTest

    ;

    --===== Create and populate a 1,000,000 row test table.

    -- "SomeID" has a range of 1 to 1,000,000 unique numbers

    -- "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- "SomeMoney has a range of 0.00 to 100.00 non-unique numbers

    -- "SomeDateTime" has a range of >=01/01/2000 and <01/01/2020 non-unique date/times

    -- "SomeDate" has a range of >=01/01/2000 and <01/01/2020 non-unique "whole dates" as date format 101

    -- "SomeName" contains random characters at random lengths from 2 to 20 characters

    SELECT TOP 1000000

    SomeID = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID())) % 50000 + 1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))

    + CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A')),

    SomeMoney = CAST(RAND(CHECKSUM(NEWID())) * 100 AS DECIMAL(9,2)), --Note rounding

    SomeDateTime = RAND(CHECKSUM(NEWID())) * DATEDIFF(dd,'2000','2020') + CAST('2000' AS DATETIME),

    SomeDate = CONVERT(CHAR(10), ABS(CHECKSUM(NEWID())) % DATEDIFF(dd,'2000','2020') + CAST('2000' AS DATETIME), 101),

    SomeName = RIGHT(NEWID(),ABS(CHECKSUM(NEWID())) % 19 + 2)

    INTO dbo.JBMTest

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    ALTER TABLE dbo.JBMTest

    ADD CONSTRAINT PK_JBMTest PRIMARY KEY CLUSTERED (SomeID) WITH FILLFACTOR = 90

    ;

    Now, ask the user which columns they're going to want to return along with the searchable date and make a deterministic view of those columns following the rules for indexable views. Here's where I chose 3 columns (you can chose more). The index needs to be unique and clustered so I put the date first for the clustering and the SomeID column in for uniqueness... the SET commands follow all the rules for making an indexed view, as well.

    SET ANSI_NULLS ON;

    SET ANSI_PADDING ON;

    SET ANSI_WARNINGS ON;

    SET CONCAT_NULL_YIELDS_NULL ON;

    SET QUOTED_IDENTIFIER ON;

    SET NUMERIC_ROUNDABORT OFF;

    --DROP VIEW dbo.JBMTest01

    GO

    CREATE VIEW dbo.JBMTest01 WITH SCHEMABINDING AS

    SELECT TheDate = CONVERT(DATETIME, SomeDate, 101), SomeID, SomeMoney

    FROM dbo.JBMTest;

    GO

    CREATE UNIQUE CLUSTERED INDEX IXV_JBMTest01_TheDate

    ON dbo.JBMTest01 (TheDate, SomeID);

    Then, write a query to use the view and teach the user how to use it. Don't forget to GRANT SELECT privs on the view to the group the user is in (or the user if you don't use groups but you really should be using groups)

    The side benefit of using an indexed view like this is that the user is never exposed to the dangers of that bloody formatted date column because, if your security is setup correctly, the user can't currently see the table.

    Since the view get's you out of the woods on the request, you'll be tempted to forget the rest of what needs to be done. Don't. The next thing to do is to find out what the impact of changing the datatype on that formatted date column would be and see if you can change it. You'll get some resistance because that's human nature but, if it were me, there'd be a full-court press to get it changed because it'll also cause problems down the road. It's what DBA's do. 😉

    Wow! Why introduce an indexed view here? The list is long of restrictions and limitations. I won't go into all of them but the big ones are the restrictions they impose towards making schema changes to the underlying tables (i.e. lookup WITH SCHEMABINDING) and the fact that they are not optimized as first-class citizens in Standard Edition. IMHO a solution leveraging an indexed persisted computed column and a [regular] view is a much better choice here.

    Consider that with an indexed persisted computed column a view, indexed or otherwise, may not be necessary at all if:

    1. It is OK for the user to see both columns.

    2. The user MUST see both columns.

    3. We use column-level security to prevent the user from accessing some columns.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I'll throw in another potential reason why a persisted indexed computed column would be good for this.

    Let's take your test table Jeff and see what the user's current query will do:

    SELECT *

    FROM dbo.JBMTest

    WHERE CONVERT(CHAR(10), SomeDateTime, 101) = '10/22/2011' ;

    Clustered Index Scan (i.e. table scan), no surprise there.

    Now let's add just the computed column to it:

    ALTER TABLE dbo.JBMTest

    ADD SomeDateTime_mmddyyyy AS (CONVERT(CHAR(10),SomeDateTime,101)) ;

    After adding just the computed column the plan still has a scan, but now we get a missing index recommendation:

    Interesting...

    Now let's add our index:

    CREATE INDEX [ix_dbo.JBMTest.SomeDateTime_mmddyyyy]

    ON dbo.JBMTest (SomeDateTime_mmddyyyy) ;

    Now let's see what happens when we run that same SELECT...notice we still have not changed the query to reference the computed column:

    We still have a lookup because the index is not covering our query, but we got a seek! How?!?

    If our hands are really tied and the user cannot change ANYTHING about the query, including the column they're referencing, then using an index on a computed column can obviate the need for a query change. This technique has lots of uses. It can come in handy for solving issues with third party apps, stubborn users who cannot be made to change and many other things.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I believe you used the wrong column in your examples for this problem. If we're still talking about the original problem, then SomeDate in JBMTest is the notorious formatted date column... not SomeDateTime.

    And, yes, the computed column thing would work quite nicely but I wouldn't convert TO a formatted date... remember that the original problem was to try to put an index on a formatted date column. Instead, I'd covert FROM the formatted date column TO a DATETIME computed column.

    Also, aren't you presuming you can change the table by adding a computed column?

    So, here's the line up and we've got all the bases covered between the two of us...

    First choice... convert the column in the table from a formatted date column to a datetime.

    Second choice... add a computed column that does the conversion FROM a formatted date TO a DATETIME and index that.

    Third choice... if absolutely no changes can be made to the table, create an indexed view.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • opc.three (8/1/2011)


    Wow! Why introduce an indexed view here? The list is long of restrictions and limitations.

    As you said, "I'm just trying to solve the problem at hand." and protect the user. The indexed view would be a "way out" if you truly couldn't make ANY changes to the original table including adding a computed column. As I also stated, it absolutely guarantees that the user will not be exposed to a formatted date column.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/1/2011)


    I believe you used the wrong column in your examples for this problem. If we're still talking about the original problem, then SomeDate in JBMTest is the notorious formatted date column... not SomeDateTime.

    Hmm, all my examples use SomeDateTime :ermm:

    And, yes, the computed column thing would work quite nicely but I wouldn't convert TO a formatted date... remember that the original problem was to try to put an index on a formatted date column.

    Adding an index to a non-persisted computed column does this, in effect. See my post with the execution plans.

    Also, aren't you presuming you can change the table by adding a computed column?

    Indeed I am. That's why I said I wished SQL Server supported "Expression Indexes". It gives us the "effect" I referenced above, but without having to add a column to the table.

    So, here's the line up and we've got all the bases covered between the two of us...

    First choice... convert the column in the table from a formatted date column to a datetime.

    As I understood it the column in the table per the OP is already a DATETIME. I would leave it that way.

    Second choice... add a computed column that does the conversion FROM a formatted date TO a DATETIME and index that.

    Somewhere along the line I think you zigged and I zagged. With an index on a computed column a "formatted date column" is only persisted to the index. The original DATETIME column ("column_name" in the OP) stays as a DATETIME.

    Third choice... if absolutely no changes can be made to the table, create an indexed view.

    +1 if schema changes are not allowed.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • apparently you are living in a world where users have a fixed UI date time representation.

    ( so far for "personal" computer )

    Alter these queries so you don't need the function on the datetime column and put an index on that datetime column.

    Where datetime_column >= @yourdate00h and datetime_column < dateadd(dd,1, @yourdate00h)

    Stick with the original datatype as long as possible ! ( in any case )

    Especially with datetime datatype, it brings you very good functions, ...

    Don't put a function on a column in a where clause, if you can easily ship it to the other side of the equation.

    because it may even not use available indexes at all.

    If this is a vendor delivered app. ... please forward all these listed arguments of this thread.;-)

    No excuse.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (8/1/2011)


    apparently you are living in a world where users have a fixed UI date time representation.

    ( so far for "personal" computer )

    Alter these queries so you don't need the function on the datetime column and put an index on that datetime column.

    Where datetime_column >= @yourdate00h and datetime_column < dateadd(dd,1, @yourdate00h)

    Stick with the original datatype as long as possible ! ( in any case )

    Especially with datetime datatype, it brings you very good functions, ...

    Don't put a function on a column in a where clause, if you can easily ship it to the other side of the equation.

    because it may even not use available indexes at all.

    If this is a vendor delivered app. ... please forward all these listed arguments of this thread.;-)

    No excuse.

    All valid points I agree with 100% except for the last two words. All things being the way we want them, agred 100%, but "it depends" is more appropriate at large.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (8/1/2011)


    ALZDBA (8/1/2011)


    ...

    No excuse.

    All valid points I agree with 100% except for the last two words. All things being the way we want them, agred 100%, but "it depends" is more appropriate at large.

    I guess I lost focus and got to harsh with these last two words. :blush:

    Indeed it is not our habit to think strictly right/wrong.

    It is and will always be a matter of education and development invest to think in terms of performance at production time of the data system.

    I would expect any software development company to deliver state of the art systems, not only front end, but for sure at back end level.

    A dream to be pursued.:rolleyes:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • opc.three (8/1/2011)


    Somewhere along the line I think you zigged and I zagged.

    BWAAA-HAAA!!! No... it's not you. I just went back and reread the original post and see that, based on the title of the post, I totally missed the original intent. I read it as having only a formatted column in the table.

    Don't mind the wierdo with the beer popsicle quiving in the corner... it's just me. How embarassing... :blush:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply