Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


A Stumbling Count


A Stumbling Count

Author
Message
gijsvanekert
gijsvanekert
Grasshopper
Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)

Group: General Forum Members
Points: 15 Visits: 62
Comments posted to this topic are about the item A Stumbling Count
SQLZ
SQLZ
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1054 Visits: 940
Interesting approach although I'm not sure I would have done it in the same way.

It looks like you're describing fact and dimension tables so you're looking at large fact tables. 12 seconds for a 300,000 row table (albeit on a test system) isn't great. Imagine a 300 million row table (or billion rows) and then try and run that CTE (joined twice) against the fact table.

Firstly, the number of production days between two dates can be calculated without the use of a CTE:

[code]
select count(*) from dbo.TstCalendar where Date between @delivery_day and @ship_date and ProdDay ='Y'
[code]

Secondly, in this instance it would make more sense to store the delivery_time as a fact within the fact table. It's additive so it fits in there quite nicely and can allow the business to ask questions like (what was our average delivery time for orders in Jan, what was our lowest, highest, etc...). Also, by storing it in the table you remove the need to perform an unnecessary join between a very large fact table and the calendar table.

This does of course mean that you need to perform this calculation during the ETL load process but it's more efficient to do so with a day's (assuming you run the load once a day worth of data than against a large fact table.

On another note, you should consider having a seperate order and shipment fact tables. A fact table should never really contain any nulls - the presence of nulls is often a clue that something is not right with the design. In this instance, it's fair to say that a ship_date is not really a fact related to orders.

Karl
source control for SQL Server
gijsvanekert
gijsvanekert
Grasshopper
Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)

Group: General Forum Members
Points: 15 Visits: 62
Karl,

Thanks for your reply. Well, actually, I think I should have mentioned that I am not describing a full blown data warehouse implementation. Building and maintaining a data warehouse is too expensive for the small and medium sized companies I work for. There are numerous companies like these and they also have their reporting needs and their problems to be solved.
I think you are right when you say that it makes a lot of sense to have delivery times stored in a data warehouse. That is, if you have one... The same is true for your remarks about null values for ship_date and a separate fact table for shipment facts. But discussing those issues would lead us to talking about details of an order management data mart and, although potentially very interesting, that is not what I had in mind at this point.

I have a question:
Your code
[code]
select count(*) from dbo.TstCalendar where Date between @delivery_day and @ship_date and ProdDay ='Y'
[code]
gives the correct number of production days between two dates, but how would you implement this code, when you want to report on, let's say, 50,000 records? (given a poor man's data mart consisting of a denormalized invoice-orders table with order_date and ship_date on the same row)

Gijs
SQLZ
SQLZ
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1054 Visits: 940
Gijs,

I assumed that it was a data warehouse because of your description of the role-play dimension but I can understand how sometimes a full-blown data warehouse is too much to consider.

With respect to calculating the delivery time using my code I would probably do something like amend your function dbo.WorkDayDiff to perform the select count(*) from the Calendar table. The only problem with this is that you'll end up with a nested loop join, which might not perform efficiently. That said, even with 10 years worth of data in this table you're still only talking of a table with 3600+ rows so it shouldn't be too bad.

However, I would still give consideration to including this field in the actual table so that you wouldn't need to perform the calculation when trying to query many thousands of rows. Instead you'd only need to perform the calculation whenever you insert a record into the table.

Karl
source control for SQL Server
sknox
sknox
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2320 Visits: 2787
SQLZ (5/31/2011)
However, I would still give consideration to including this field in the actual table so that you wouldn't need to perform the calculation when trying to query many thousands of rows. Instead you'd only need to perform the calculation whenever you insert a record into the table.


Horses for courses. This is a good principle, but some environments simply won't allow it -- whether it's because the database is owned by a third-party app that won't allow changes to tables, or because it's a hybrid transactional/analytical database and a delay of 12 seconds in running a report is preferable to a delay of .1 second when adding/updating a transaction.

If it weren't for non-standard requirements, we'd all be out of a job. :-P
thisisfutile
thisisfutile
SSC Veteran
SSC Veteran (287 reputation)SSC Veteran (287 reputation)SSC Veteran (287 reputation)SSC Veteran (287 reputation)SSC Veteran (287 reputation)SSC Veteran (287 reputation)SSC Veteran (287 reputation)SSC Veteran (287 reputation)

Group: General Forum Members
Points: 287 Visits: 980
SQLZ (5/31/2011)

If it weren't for non-standard requirements, we'd all be out of a job. :-P


I'll drink to that! Gotta love job security.
Antares686
Antares686
SSCrazy Eights
SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)

Group: Moderators
Points: 9478 Visits: 780
You know I have done this myself but later came to wonder why people want to store all information for dual logic scenarios when they just need the smaller set. In the case of a production work date versus off dates typically the off dates are the smaller set. I would suggest just have a date table for non-production dates only which would hold the values for weekends and other non-production dates with the date column clustered indexed. These means storing around 110 dates a year for dates you don't work (could be saturdays, sundays, holidays or other days that are non-production in your environment) which is insignificant. So your query can be simply something like this

select 
TOD.Ordernr
,TOD.Orderline
,datediff(dd, TOD.OrderDate, TOD.ShipDate) - count(NPD.DateColumn) as diff
from
dbo.TestOrderDetails TOD
inner join
dbo.NonProductionDates NPD
ON
NPD.DateColumn BETWEEN TOD.OrderDate AND TOD.ShipDate
GROUP BY
TOD.Ordernr
,TOD.Orderline
,TOD.OrderDate
,TOD.ShipDate



Of course you need to manage logic of nulls and if day is inclusive (20 to 27 is 8 days) or exclusive (20 to 27 is 7 days). But it keeps logic very simple.



sloppypoet
sloppypoet
SSC-Enthusiastic
SSC-Enthusiastic (100 reputation)SSC-Enthusiastic (100 reputation)SSC-Enthusiastic (100 reputation)SSC-Enthusiastic (100 reputation)SSC-Enthusiastic (100 reputation)SSC-Enthusiastic (100 reputation)SSC-Enthusiastic (100 reputation)SSC-Enthusiastic (100 reputation)

Group: General Forum Members
Points: 100 Visits: 156
Interesting use of the date dimension table to calculate analytics. Expanding the model a little to add accounting weeks (in a 4,4,5 or similar rotation), quarters, years, etc. and following a similarly flagged (y,n or 0,1) field design provides a dynamic transition into data warehousing for the small to medium businesses that do not have the budget for a full blown data warehouse application. Then, you simply pair the flattened warehouse structures with SSRS and voila....basic management reporting 101. Nice post!
izblank
izblank
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 10
Gijs,

in my company, we had to solve a similar problem - given the order date and number of production days, calculate the ship date. We did come up with a Calendar table, with the addition of a couple of (admittedly, denormalized) columns "prodDayNumber" and "nextProdDayNumber" . For production days, their values are equal, fro non-production days, "prodDayNumber" is NULL, "nextProdDayNumber" equals "prodDayNumber" of the next production day. Whenever we add more rows to the Calendar table (which happens once a year at least, or when we add more facilities), we populate those columns as well. After that, the query, similar to your last one, runs really fast.
izblank
izblank
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 10
To Antares686:

The Calendar table is not very big, so you're not saving much space by storing exceptions. Besides, having a full-blown Calendar table gives you much more than just counting date difference. For example, try to solve a reverse problem - given the order date and number of production days, calculate shipping date. Or, given an employee's time sheet, calculate compensation, keeping in mind that different days can have different pay rates. Possibilities are endless.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search