Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««2324252627»»»

Just curious, what are your SQL pet peeves ? Expand / Collapse
Author
Message
Posted Sunday, June 15, 2014 2:14 PM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Today @ 8:48 AM
Points: 300, Visits: 3,364
Koen Verbeeck (6/14/2014)
Typically you break a datetime into a date part and a time part and store it in two dimensions.
This allows you to solve analytical questions on the date part (with attributes stored in the date dimensions), such as influences of weekdays, holidays, seasonality and so on. In the time dimension, you can store attributes such as opening hours, part of the day (morning, midday, afternoon, evening, night) and so on. So the seperation is done for a reason

If you need the actual datetime for some reason (and you don't want to join two times), you can indeed store it along in the fact table as a degenerate dimension.


Plus about a million


I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
Post #1580957
Posted Sunday, June 15, 2014 2:23 PM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Today @ 8:48 AM
Points: 300, Visits: 3,364
Koen Verbeeck (6/14/2014)
andrew gothard (6/13/2014)

" store the dates as integers, like 20140612, and then join to a 'Date' table with this 'key' to get a real date field" - this still is a strong design smell, but if you're using a DateDim (or calendar table) in a DataWarehouse (or other) environment, ish. I think storing the integer format date, rather than an identity for the join is plain stupid. But a link into a DateDim / Calendar table is a lot less hassle and a bloody sight easier if you want to know working days, financial weeks, months, years.


What is the difference between having an integer key that has the value 12575 or the value 20140513?
The first one is just a sequence, while the second one is a sequence as well but with some gaps once in a while.
The second one helps debugging (no extra join needed when you quickly want to check something) and also helps with partitioning if necessary.

Why is the second one plain stupid? I'd like to be enlightened


Badly phrased, sorry. In a column that should be a date, it's plain stupid (if you do that in your Order Taken timestamp column in your SOP system, you're an imbecile and your keyboard should be confiscated) - in a DateDim, ok-ish, but personally I'd prefer an identity. If it looks like a date, someone's going to do something stupid with it at some point.


I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
Post #1580958
Posted Monday, June 16, 2014 7:32 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 7:46 AM
Points: 281, Visits: 1,066
Evil Kraig F (6/13/2014)
Offhand, even in my warehouses, I never really saw a significant value in storing the date as a four byte int instead of the 8 byte datetime. Yes, I have used many a calendar table, and just keyed it off the datetime field. With a constraint to make sure the value equaled itself without time no 'bad' data would become included in either the fact or the dimension, and it allowed us to skip, completely, moving to the dimension table for date range based information that didn't require the subdata and allowed for more effective indexing against the fact.

I'm sorry, but I can't agree with the idea of surrogate keying a date. There's too much value to bringing the dimension in later in the execution plan to bother, especially if you're running ROLAP cubes.

I think that they set it up to be used for cubes. To have one 'Date' table and every place you would have had a date in a table it is a 'date ID' to point ot this 'Date' table. I was looking at one table and it has at least 4 'date ID' fields, that means 4 joins just to get the real date values. This just seems to be creating a lot of extra joins for each table. Like I said before I think it may be fast now for the amount of data they have loaded, only about 4 or 5 years of our data. But I would think this is only going to get slower the more data that is loaded.
Post #1581120
Posted Monday, June 16, 2014 7:44 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 7:46 AM
Points: 281, Visits: 1,066
andrew gothard (6/15/2014)
[quote]Koen Verbeeck (6/14/2014)


Badly phrased, sorry. In a column that should be a date, it's plain stupid (if you do that in your Order Taken timestamp column in your SOP system, you're an imbecile and your keyboard should be confiscated) - in a DateDim, ok-ish, but personally I'd prefer an identity. If it looks like a date, someone's going to do something stupid with it at some point.


That's what I 'm afraid someone outside of IS is going to look at that 'date ID' field and say, 'Well that's the date.' and try and do "something stupid with it".
Post #1581127
Posted Monday, June 16, 2014 9:23 AM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Thursday, September 18, 2014 12:39 AM
Points: 3,105, Visits: 11,494
Koen Verbeeck (6/14/2014)
Michael Valentine Jones (6/13/2014)

Something no one has mentioned is DATETIME in the fact table for situations where it may be important. I have seen people break date and time into different columns. That just makes for horrible coding for queries when you need to select data for a DATETIME range. Example: from 10:00 am on the second day of the month through 6:00 pm yesterday. I believe Kimball called a dimensional column like this with no dimension table a "degenerate" dimension. Of course, you could also have Date and time of day dimensions in addition to the "degenerate" datetime dimension.


Typically you break a datetime into a date part and a time part and store it in two dimensions.
This allows you to solve analytical questions on the date part (with attributes stored in the date dimensions), such as influences of weekdays, holidays, seasonality and so on. In the time dimension, you can store attributes such as opening hours, part of the day (morning, midday, afternoon, evening, night) and so on. So the seperation is done for a reason

If you need the actual datetime for some reason (and you don't want to join two times), you can indeed store it along in the fact table as a degenerate dimension.


Yes, of course you can have the Date and Time dimensions, which is why I mentioned them in my post, and didn't imply that you would not want them.

The point I was making was that sometimes you want the datetime together, particularly for selection of datetime ranges. Notice the complexity of the second query without the datetime dimension when only the Date and Time dimensions are available.

-- With degenerate datetime dimension 
select
*
from
SalesFact sf
where
SaleDatetime >= '2014-06-02 10:00:00' and
SaleDatetime < '2014-06-16 18:00:00'


-- Without degenerate datetime dimension 
select
*
from
SalesFact sf
inner join
DateDim dd
on dd.Date = sf.Date
inner join
TimeDim td
on td.Time = sf.Time
where
dd.Date between '2014-06-02' and '2014-06-16'
and
case
when dd.Date = '2014-06-02' and td.Time < '10:00:00'
then then 0
when dd.Date = '2014-06-16' and td.Time >= '18:00:00'
then then 0
else 1 end = 1






Post #1581218
Posted Monday, June 16, 2014 10:44 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 8:23 AM
Points: 12,954, Visits: 10,724
Michael Valentine Jones (6/16/2014)
Koen Verbeeck (6/14/2014)
Michael Valentine Jones (6/13/2014)

Something no one has mentioned is DATETIME in the fact table for situations where it may be important. I have seen people break date and time into different columns. That just makes for horrible coding for queries when you need to select data for a DATETIME range. Example: from 10:00 am on the second day of the month through 6:00 pm yesterday. I believe Kimball called a dimensional column like this with no dimension table a "degenerate" dimension. Of course, you could also have Date and time of day dimensions in addition to the "degenerate" datetime dimension.


Typically you break a datetime into a date part and a time part and store it in two dimensions.
This allows you to solve analytical questions on the date part (with attributes stored in the date dimensions), such as influences of weekdays, holidays, seasonality and so on. In the time dimension, you can store attributes such as opening hours, part of the day (morning, midday, afternoon, evening, night) and so on. So the seperation is done for a reason

If you need the actual datetime for some reason (and you don't want to join two times), you can indeed store it along in the fact table as a degenerate dimension.


Yes, of course you can have the Date and Time dimensions, which is why I mentioned them in my post, and didn't imply that you would not want them.

The point I was making was that sometimes you want the datetime together, particularly for selection of datetime ranges. Notice the complexity of the second query without the datetime dimension when only the Date and Time dimensions are available.

-- With degenerate datetime dimension 
select
*
from
SalesFact sf
where
SaleDatetime >= '2014-06-02 10:00:00' and
SaleDatetime < '2014-06-16 18:00:00'


-- Without degenerate datetime dimension 
select
*
from
SalesFact sf
inner join
DateDim dd
on dd.Date = sf.Date
inner join
TimeDim td
on td.Time = sf.Time
where
dd.Date between '2014-06-02' and '2014-06-16'
and
case
when dd.Date = '2014-06-02' and td.Time < '10:00:00'
then then 0
when dd.Date = '2014-06-16' and td.Time >= '18:00:00'
then then 0
else 1 end = 1



You are 100% correct.
However, and this may be colored by my experience, I believe such queries are rather uncommon in data warehouses where the typical query is more analytical in nature. Correct me if I'm wrong




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1581270
Posted Monday, June 16, 2014 5:11 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Thursday, September 18, 2014 12:39 AM
Points: 3,105, Visits: 11,494
Koen Verbeeck (6/16/2014)
Michael Valentine Jones (6/16/2014)
Koen Verbeeck (6/14/2014)
Michael Valentine Jones (6/13/2014)

Something no one has mentioned is DATETIME in the fact table for situations where it may be important. I have seen people break date and time into different columns. That just makes for horrible coding for queries when you need to select data for a DATETIME range. Example: from 10:00 am on the second day of the month through 6:00 pm yesterday. I believe Kimball called a dimensional column like this with no dimension table a "degenerate" dimension. Of course, you could also have Date and time of day dimensions in addition to the "degenerate" datetime dimension.


Typically you break a datetime into a date part and a time part and store it in two dimensions.
This allows you to solve analytical questions on the date part (with attributes stored in the date dimensions), such as influences of weekdays, holidays, seasonality and so on. In the time dimension, you can store attributes such as opening hours, part of the day (morning, midday, afternoon, evening, night) and so on. So the seperation is done for a reason

If you need the actual datetime for some reason (and you don't want to join two times), you can indeed store it along in the fact table as a degenerate dimension.


Yes, of course you can have the Date and Time dimensions, which is why I mentioned them in my post, and didn't imply that you would not want them.

The point I was making was that sometimes you want the datetime together, particularly for selection of datetime ranges. Notice the complexity of the second query without the datetime dimension when only the Date and Time dimensions are available.

-- With degenerate datetime dimension 
select
*
from
SalesFact sf
where
SaleDatetime >= '2014-06-02 10:00:00' and
SaleDatetime < '2014-06-16 18:00:00'


-- Without degenerate datetime dimension 
select
*
from
SalesFact sf
inner join
DateDim dd
on dd.Date = sf.Date
inner join
TimeDim td
on td.Time = sf.Time
where
dd.Date between '2014-06-02' and '2014-06-16'
and
case
when dd.Date = '2014-06-02' and td.Time < '10:00:00'
then then 0
when dd.Date = '2014-06-16' and td.Time >= '18:00:00'
then then 0
else 1 end = 1



You are 100% correct.
However, and this may be colored by my experience, I believe such queries are rather uncommon in data warehouses where the typical query is more analytical in nature. Correct me if I'm wrong


As always, it depends on the application.

If your warehouse shades closer to an operational data store where you may be looking for particular items or exceptions, then you would be more likely to need the datetime dimension.


Post #1581992
Posted Wednesday, June 18, 2014 5:06 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Today @ 8:48 AM
Points: 300, Visits: 3,364
below86 (6/16/2014)
andrew gothard (6/15/2014)
[quote]Koen Verbeeck (6/14/2014)


Badly phrased, sorry. In a column that should be a date, it's plain stupid (if you do that in your Order Taken timestamp column in your SOP system, you're an imbecile and your keyboard should be confiscated) - in a DateDim, ok-ish, but personally I'd prefer an identity. If it looks like a date, someone's going to do something stupid with it at some point.


That's what I 'm afraid someone outside of IS is going to look at that 'date ID' field and say, 'Well that's the date.' and try and do "something stupid with it".


Outside IS?


I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
Post #1582817
Posted Wednesday, June 18, 2014 7:19 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 7:46 AM
Points: 281, Visits: 1,066
andrew gothard (6/18/2014)
below86 (6/16/2014)
andrew gothard (6/15/2014)
[quote]Koen Verbeeck (6/14/2014)


Badly phrased, sorry. In a column that should be a date, it's plain stupid (if you do that in your Order Taken timestamp column in your SOP system, you're an imbecile and your keyboard should be confiscated) - in a DateDim, ok-ish, but personally I'd prefer an identity. If it looks like a date, someone's going to do something stupid with it at some point.


That's what I 'm afraid someone outside of IS is going to look at that 'date ID' field and say, 'Well that's the date.' and try and do "something stupid with it".


Outside IS?


Yes, we have certain departments that have a few people that know just enough SQL to write some queries, most of the ones I've seen would be perfect examples for this post, on what not to do. We currently don't have enough programmers(developers) to fill all the request from everyone, so some areas have found people in their departments that can run queries to answer some questions. The amount of manual work some of these people go thru is amazing. Spending days to put together reports that if we had the time we could automate and have it built in minutes. But what can you do, only so many hours in a day.
Post #1582907
Posted Wednesday, June 18, 2014 7:29 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 11:16 AM
Points: 35,216, Visits: 31,671
below86 (6/18/2014)
andrew gothard (6/18/2014)
below86 (6/16/2014)
andrew gothard (6/15/2014)
[quote]Koen Verbeeck (6/14/2014)


Badly phrased, sorry. In a column that should be a date, it's plain stupid (if you do that in your Order Taken timestamp column in your SOP system, you're an imbecile and your keyboard should be confiscated) - in a DateDim, ok-ish, but personally I'd prefer an identity. If it looks like a date, someone's going to do something stupid with it at some point.


That's what I 'm afraid someone outside of IS is going to look at that 'date ID' field and say, 'Well that's the date.' and try and do "something stupid with it".


Outside IS?


Yes, we have certain departments that have a few people that know just enough SQL to write some queries, most of the ones I've seen would be perfect examples for this post, on what not to do. We currently don't have enough programmers(developers) to fill all the request from everyone, so some areas have found people in their departments that can run queries to answer some questions. The amount of manual work some of these people go thru is amazing. Spending days to put together reports that if we had the time we could automate and have it built in minutes. But what can you do, only so many hours in a day.


I once worked for a company with a similar problem. The solution was pretty simple... I spent some time with the users to find out what kind of reports they were typically interested in and then held some Lunch'n'Learn classes that taught them all about Date functions, Crosstabs, how to write the right kind of dynamic SQL for the CrossTabs, how to do pre-aggregation for wicked performance, and how to do a little Divide'n'Conquer querying for even better performance. The classes each had a handout associated with it that became a part of a "book" in Word complete with a table of contents and a bit of an index and a glossary. The ROI for the time I spent was huge and the users got very smart about their repointing queries very quickly. It was absolutely amazing at how "hungry" and appreciative they were for knowledge.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1582916
« Prev Topic | Next Topic »

Add to briefcase «««2324252627»»»

Permissions Expand / Collapse