Table Defaults

  • DRI Transition Constraints (http://www.dbazine.com/ofinterest/oi-articles/celko35) are the way to go!

    I've used this concept for years, and everyone gets it, from BAs to BI developers. Its beauty for me is that systems never stand still, and as sure as eggs are eggs, someone is going to change things, and then all you need is another row in a table (i.e. no 'coding'). They are also great for populating drop-down lists of valid transitions.
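
    A minimal sketch of the pattern, with invented names (Celko's article has the full treatment):

    CREATE TABLE OrderStateChanges
    (
        previous_state VARCHAR(15) NOT NULL,
        current_state  VARCHAR(15) NOT NULL,
        PRIMARY KEY (previous_state, current_state)
    );

    INSERT INTO OrderStateChanges VALUES ('ordered', 'ordered');   -- initial state
    INSERT INTO OrderStateChanges VALUES ('ordered', 'shipped');
    INSERT INTO OrderStateChanges VALUES ('shipped', 'delivered');

    -- The compound FK means a row can only ever hold a legal (from, to) pair
    CREATE TABLE Orders
    (
        order_id       INT NOT NULL PRIMARY KEY,
        previous_state VARCHAR(15) NOT NULL,
        current_state  VARCHAR(15) NOT NULL,
        FOREIGN KEY (previous_state, current_state)
            REFERENCES OrderStateChanges (previous_state, current_state)
    );

    -- A new business rule is just another row, no code change...
    INSERT INTO OrderStateChanges VALUES ('shipped', 'returned');

    -- ...and the same table feeds a drop-down of valid next states
    SELECT current_state FROM OrderStateChanges WHERE previous_state = 'shipped';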


  • Apart from creation dates, I tend to use default constraints for bit flags, like this:

    Active bit NOT NULL DEFAULT 1

    When used in this way, a new record begins life as an active record. Later, when the record is ready to be deleted by the audit process, the flag gets set to 0.
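
    A quick sketch of that lifecycle (table and column names other than Active are assumed):

    CREATE TABLE dbo.Customer
    (
        CustomerId   INT IDENTITY(1,1) PRIMARY KEY,
        CustomerName VARCHAR(100) NOT NULL,
        Active       BIT NOT NULL DEFAULT 1   -- new records begin life active
    );

    INSERT INTO dbo.Customer (CustomerName) VALUES ('Acme');   -- Active = 1 by default

    -- later, the audit process soft-deletes instead of removing the row
    UPDATE dbo.Customer SET Active = 0 WHERE CustomerName = 'Acme';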

    I'm working with an app that was designed by developers who are a pretty skilled bunch of guys and who know T-SQL. However, their schema leaves a lot to be desired; my view is that they brought me on too late (I'm essentially here to tune the database and provide effective indexing), so I can't agree that one size fits all. I think the database guy (me) has a totally different outlook to the developers. I don't claim to be an expert, or near it, but I see almost every rule being broken, and now, of course, it's too late to change the schema. I potentially have a next contract back where I had a similar situation: they're now rewriting and redesigning, which, like I said, they needed to do because the app didn't scale, and they were the ones with the NULLs and defaults of 0 or ''. I know this has got away from defaults a bit, but it's still interesting.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Great article, thanks for posting!

    I use a DateTimeStamp column on most of my tables too, usually just to know when a record was entered, for problem-solving purposes, and sometimes for other obvious reasons. I'm always torn between using defaults and forcing the T-SQL coder to insert the date at the stored-proc level.

    I usually go for the latter, simply because when entering data, say from a text file into a table, you may want all the rows entered for that batch to have the same DATETIME value, e.g.:

    DECLARE @DateTimeStamp DATETIME
    SET @DateTimeStamp = GETDATE()

    INSERT INTO....
    SELECT x, y...@DateTimeStamp
    FROM Source

    I've never tested it, but I've always assumed that if it takes 5 seconds to insert data from the source, then the first row might have [19-10-2007 10:00:00] and the last row inserted might have [19-10-2007 10:00:05], making it a tad more difficult to identify all rows entered for a batch... (I know what you're thinking: maybe I should use a BatchId column... so am I :-)!!
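
    For what it's worth, that assumption is easy to test. A minimal sketch (table and column names invented); if the final query returns 1, every row in the multi-row insert received the same default stamp:

    CREATE TABLE dbo.DefaultTest
    (
        Id            INT IDENTITY(1,1) PRIMARY KEY,
        Payload       INT NOT NULL,
        DateTimeStamp DATETIME NOT NULL DEFAULT GETDATE()
    );

    -- one multi-row insert; the default fills DateTimeStamp for every row
    INSERT INTO dbo.DefaultTest (Payload)
    SELECT TOP (100000) 1
    FROM sys.all_columns a CROSS JOIN sys.all_columns b;

    -- how many distinct stamps did the batch get?
    SELECT COUNT(DISTINCT DateTimeStamp) AS DistinctStamps
    FROM dbo.DefaultTest;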

    Feedback welcome!

  • I tend to use defaults for a couple of reasons.

    On a column like "DateCreated", use a default, and use "default" in insert statements. Having "default" in there helps document the column, since it means the column name can still be found, through source control, in the procs that insert into that table.
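
    For example (table and column names assumed):

    -- DEFAULT both populates DateCreated and documents it in the proc's source
    INSERT INTO dbo.MyTable (CustomerId, DateCreated)
    VALUES (42, DEFAULT);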

    On other columns that are almost always initially a certain value, it can be useful to have that as a default. If the default value is more than a few characters, I'm lazy, so setting it as a default means I don't have to include it in all of my procs. The ones that will override the value have it, the others don't, and I save myself a few keystrokes on those ones. Not necessarily a good idea, but I do it anyway.

    On columns with complex initial values, using a UDF as the default can save coding and make for more maintainability. For example, if a column referencing a separate table could have one of five initial values, based on moderately complex business rules, a UDF might be able to figure out which value it should be, and then insert procs can just have "default" in them and the UDF will fill in the value. "If an order is started on a Saturday, it starts with value X, unless it's the day after a holiday, in which case it's Y, whereas if an order is started on a Monday in the summer, it has value A, but the rest of the summer it's value B, and in the winter Fridays are M, everything else is S, unless it's a holiday, which is always Q regardless of time of year or day of week." Something like that would be a major pain to have to build into every proc that creates a record in the orders table, but a UDF based on a calendar table and a lookup table would be able to handle it with ease; see the sketch below.

    Of course, having such logic built into a business layer might be better, if you can absolutely guarantee that every application that accesses the database will always have the current business layer. (That can be a problem if some salespeople use laptops for order entry and aren't always connected to the network when they do so.)
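
    A stripped-down sketch of that idea, with an invented weekend rule standing in for the real business logic (a production version would read the calendar and lookup tables; all names are hypothetical):

    CREATE FUNCTION dbo.fn_InitialOrderStatus (@OrderDate DATETIME)
    RETURNS CHAR(1)
    AS
    BEGIN
        RETURN CASE WHEN DATENAME(WEEKDAY, @OrderDate) IN ('Saturday', 'Sunday')
                    THEN 'X'   -- hypothetical weekend status
                    ELSE 'S'   -- hypothetical default status
               END;
    END;
    GO

    CREATE TABLE dbo.OrderHeader
    (
        OrderId     INT IDENTITY(1,1) PRIMARY KEY,
        OrderDate   DATETIME NOT NULL DEFAULT GETDATE(),
        OrderStatus CHAR(1)  NOT NULL DEFAULT dbo.fn_InitialOrderStatus(GETDATE())
    );

    -- insert procs can then just take the defaults and let the UDF decide
    INSERT INTO dbo.OrderHeader DEFAULT VALUES;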

    So, your mileage may vary, but I find defaults useful.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

    Relating to default values and NULLs: in the areas where I do most of my work (performance data, DW, and reporting), I have a couple of scenarios where I need to indicate when an element's state (that produces data) is valid, as well as join these elements (and their state) back to their performance data. In the schema for the element table, I generally use two DATETIME fields: dtFrom NOT NULL (populated from the ETL / data source), and dtTo NULL. At some later point, when a new element state has been collected, dtTo (for the earlier element state) is updated with the new entry's dtFrom.

    When any reporting is done against the element states (which always involves a DATETIME parameter, @dtRpt), my WHERE clause will use @dtRpt BETWEEN dtFrom AND COALESCE(dtTo, GETDATE()). Similar code is also used to JOIN the measurement data (with a DATETIME field) back to the element state table.

    In addition to limiting my dataset (in the reporting) to what I need, the dtTo (being NULL) acts as a "current state" flag.
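
    A sketch of the schema and the reporting predicate just described (dtFrom/dtTo as in the post; everything else assumed):

    CREATE TABLE dbo.ElementState
    (
        ElementId INT      NOT NULL,
        dtFrom    DATETIME NOT NULL,   -- populated from the ETL / data source
        dtTo      DATETIME NULL,       -- NULL until a newer state arrives
        CONSTRAINT PK_ElementState PRIMARY KEY (ElementId, dtFrom)
    );

    DECLARE @dtRpt DATETIME;
    SET @dtRpt = '20071019';

    -- NULL dtTo doubles as the "current state" flag
    SELECT ElementId, dtFrom, dtTo
    FROM dbo.ElementState
    WHERE @dtRpt BETWEEN dtFrom AND COALESCE(dtTo, GETDATE());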

    So IMHO, default values definitely are needed, and NULL is not a bad thing.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • But do you realize that your statement hurts performance of the query?

    This part: @dtRpt BETWEEN dtFrom and COALESCE(dtTo, GETDATE())

    In order to perform that query, SQL has to scan the entire dtTo column to determine which values it must evaluate as GETDATE(). If the table is large, the performance impact will be noticeable. This means that even if there is an index on dtTo, SQL can't use it, because the information in it may not be valid.

    This would result in much better performance over the long run:

    Where @dtRpt >= dtFrom
    And (@dtRpt <= dtTo Or dtTo Is Null)

    Now, if there is an index on dtTo, it can use the index twice.

    You'll get even better performance if you add two computed columns to your table to store the dates as integer values, persist the columns, and index them. Your query will practically fly!!

    Like so:

    Alter Table YourTable Add iFrom As Cast(Convert(varchar(8), dtFrom, 112) As int) Persisted;

    Alter Table YourTable Add iTo As Cast(Convert(varchar(8), dtTo, 112) As int) Persisted;
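
    Presumably the follow-through looks something like this (the index name and the @dtRpt declaration are assumed):

    Create Index IX_YourTable_iFrom_iTo On dbo.YourTable (iFrom, iTo);

    Declare @dtRpt datetime, @iRpt int;
    Set @dtRpt = '20071019';
    Set @iRpt = Cast(Convert(varchar(8), @dtRpt, 112) As int);

    Select *
    From dbo.YourTable
    Where @iRpt >= iFrom
      And (@iRpt <= iTo Or iTo Is Null);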


    My blog: SQL Soldier
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management

  • Robert, I like your solution of casting dates as integers (I'd cast the datetime as float, but integer works too). Is there any reason not to use an isnull on the iTo calculated column? Perhaps with the maximum possible Int value. Then the Where clause in a query on it doesn't need an "or" statement, which means it can use the indexes more efficiently.

    add columns:

    fFrom as cast(dtFrom as float) persisted,
    fTo as cast(isnull(dtTo, 1048576) as float) persisted

    (That number allows for all dates up to and including 27 Nov 4770. I picked it because it's 2 to the 20th power, which is an easy number to remember.)

    declare @fRptDate float
    select @fRptDate = cast(@dtRpt as float)

    select *
    from dbo.Table
    where @fRptDate between fFrom and fTo

    And, of course, this is more than a little off the subject of using Defaults in tables, but persisted calculated columns are logically similar (to justify the segue). 🙂

    Of course, using this creates a sort of Y2K situation in another 2,760 years or so, assuming the application and database engine are still in use at that time. (Which, I must say, would be horribly disappointing in its own way. But I'll never have to suffer that disappointment.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

    Not really off topic, but I attended SQL-Fest (a Microsoft event), and there was an excellent segment by Kevin Cox on how to deal with very large databases in a data warehouse. One thing he listed as very important was storing dates as integers. I was delighted to hear some validation, from a highly respected professional, of what I had been preaching for years.

    That's a good idea about using a default value in place of null. But rather than using an actual date, I would use a very large number, such as the maximum value for the data type (2147483647 for int, or whatever the max is for float). In the past, I have simply used 99991231 in place of null.

    I would prefer a number like this, one that is obviously not a valid date. And when comparing the integer/float value to a date value, I would always convert the date value to integer/float first, so that I am making comparisons on integer/float values rather than date values.
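
    Applied to the earlier computed column, that might look like this (same assumed table, added as a second column, iTo2, so as not to clash with iTo; putting the ISNULL on the integer keeps the persisted expression deterministic):

    Alter Table YourTable Add iTo2 As
        IsNull(Cast(Convert(varchar(8), dtTo, 112) As int), 99991231) Persisted;

    -- comparisons stay in integer space, and no "Or ... Is Null" branch is needed
    Declare @dtRpt datetime;
    Set @dtRpt = '20071019';

    Select *
    From dbo.YourTable
    Where Cast(Convert(varchar(8), @dtRpt, 112) As int) Between iFrom And iTo2;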


    My blog: SQL Soldier
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management

    I haven't had the chance to test the performance (BETWEEN versus >= and <=...); I'm lucky if I get any chance to go back and "clean up" stuff that I write.

    I am curious why an INT representation of DATETIME data is better. I tried that in some earlier work, and it just kept getting harder to work with; I was always converting from one format to another for various reports or queries. I would like to know why DATETIME (at 8 bytes, or SMALLDATETIME at 4) is any worse than INT (at 4 bytes)? If you are doing anything with it (WHERE, functions, ORDER BY, etc.), I would think that the query engine should handle it just about the same.

    In reference to adding additional fields to the table, keep in mind that this is a DW environment, and space can become an issue. Adding a couple of extra fields (averaging 4 bytes each) to every table, when each table already has a couple of hundred fields and millions of records, has an impact on space. To give you an idea, I hold 7 days' worth of data (hourly statistics), and the last backup was > 80 GB.

    However, all of this is off topic: default values. Is the idea of using NULL as a DEFAULT value wrong? I know a number of people seem to avoid NULL like the plague, but it accurately represents (I think) what the field (dtTo) holds: since the end of the element state's time duration is unknown, NULL seems to fit.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • Nice article, Andy.

    We have a LastModifiedBy column on most of the tables where inserts and updates come from the GUIs... it captures the login of the person making the entry. If it's NULL, the machine name is captured along with a bit of other info. Of course, all of this is done using defaults. Now, every once in a while, we have to do a "data cleanup"... it's not really a cleanup... it's just that someone decided that something needs to be done to one or more rows because of a "business mistake". We keep track of these things through a ticketing system, and that's where the default override comes in... for those items updated or inserted due to a ticket, we use both the ticket # and the developer's "handle" for the UserID. Same thing with the CreatedBy where we have it... if the insert was due to a ticket, we use the ticket information to override the default.

    On some very critical tables where we must absolutely know the user/machine name of the person doing the insert/update, we have triggers that will reject all overrides to both unless the user name follows the particular ticket marking pattern that we've established.
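
    A hypothetical sketch of such a trigger (the table, column, and ticket-marking pattern are all invented for illustration):

    CREATE TRIGGER trg_CriticalTable_ProtectAudit
    ON dbo.CriticalTable
    AFTER INSERT, UPDATE
    AS
    BEGIN
        -- reject any override of LastModifiedBy that lacks the ticket mark
        IF EXISTS (SELECT 1
                   FROM inserted
                   WHERE LastModifiedBy <> SUSER_SNAME()
                     AND LastModifiedBy NOT LIKE 'TKT-[0-9]%')
        BEGIN
            RAISERROR('Overrides to LastModifiedBy require a ticket number.', 16, 1);
            ROLLBACK TRANSACTION;
        END;
    END;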

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I surely hope to God that you are not using IDENTITY or other non-relational auto-increments in tables.

    Heh... Joe, you should write an article for SQLServerCentral and explain why you think that way. It would probably be one of the most read, most hotly contested articles ever to appear on this site. Just think of all the fun you'd have... 😉

    Seriously... I'm not pokin' fun at you or daring you to do anything... I think it would be a great article, and lots of folks would jump in on the discussion. You'd also have the chance to "cite" a couple of your books...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
