• I apologize in advance if this set of questions takes the topic too far off track.  We can easily move the conversation to a different thread if appropriate.

    Excellent article.  It is very clear that SQL Server 2005 will greatly simplify traversing hierarchies and networks, at least from a syntax perspective.

    Most of my work uses multiple orthogonal hierarchies and networks as a way to aggregate and analyze large volumes of data.  For example, consider an organization hierarchy that is related to, but not dependent on, a geographical network.  Using SQL Server 2005, what happens to performance as we add this additional real-world complexity?

    In data warehouse terminology, the organization tree is a slowly changing dimension.  The geographic tree and the time tree are static.  Data is aggregated and analyzed along the organization tree, along the geographic tree, and along the "time" tree.

    How might one change the examples to walk two or more trees at the same time? For example, suppose one wanted to stop in towns along the way and buy a bottle of wine.  One would need to store the cost of a bottle of wine in each town, for each day, over the last few months.  The "best" route then would consider not only the physical distance, but also the average cost of a bottle of wine over the last two weeks.  This example requires us to walk the geography tree and the time tree at the same time in order to calculate the "best" route.

    In my real work, there are actually nine dimensions of interest.  Three of the dimensions are slowly changing dimensions, two are static dimensions, and four are flat dimensions.  The dimensions are a mix of hierarchies and networks.  Heterogeneous devices submit data at intervals varying from sub-second to daily.  My current approach is to use recursive algorithms to populate what Kimball refers to as bridge tables, and then associate the aggregate values with nodes in the bridge.  Once collected, the data is stored in the warehouse and available for ad hoc queries for two years.  Yes, there are multiple terabytes of data online for ad hoc query and analysis.
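    For reference, the bridge-table population step I describe above looks something like the following sketch (OrgNodes and OrgBridge are hypothetical names; the real tables are more involved).  It enumerates every ancestor/descendant pair in the hierarchy, including each node paired with itself at depth 0:

```sql
-- Hypothetical adjacency list: OrgNodes(NodeId, ParentId)
-- Target Kimball-style bridge:  OrgBridge(AncestorId, DescendantId, Depth)
WITH Bridge AS (
    -- anchor: every node is its own descendant at depth 0
    SELECT NodeId AS AncestorId, NodeId AS DescendantId, 0 AS Depth
    FROM OrgNodes
    UNION ALL
    -- recursive step: walk down one level of the hierarchy
    SELECT b.AncestorId, n.NodeId, b.Depth + 1
    FROM Bridge b
    JOIN OrgNodes n ON n.ParentId = b.DescendantId
)
INSERT INTO OrgBridge (AncestorId, DescendantId, Depth)
SELECT AncestorId, DescendantId, Depth
FROM Bridge;
```

    Today this is done with procedural recursion outside the database; a 2005 CTE would at least move it into a single statement.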

    Given this problem space, how might one modify the examples to efficiently aggregate the data "on demand" rather than storing redundant data for each node in the hierarchies and networks?
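    In other words, could the stored bridge be replaced by something like this sketch, computed at query time (again with hypothetical OrgNodes and FactTable names), and still perform acceptably against terabytes of fact data?

```sql
-- Hypothetical on-demand rollup: sum facts under any node,
-- instead of storing pre-aggregated values at every level.
DECLARE @RootNode INT;
SET @RootNode = 42;   -- example node of interest

WITH Subtree AS (
    -- anchor: the node we are rolling up to
    SELECT NodeId FROM OrgNodes WHERE NodeId = @RootNode
    UNION ALL
    -- recursive step: all descendants of that node
    SELECT n.NodeId
    FROM OrgNodes n
    JOIN Subtree s ON n.ParentId = s.NodeId
)
SELECT SUM(f.Measure) AS Total
FROM FactTable f
JOIN Subtree s ON f.NodeId = s.NodeId;
```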

    Thanks in advance for any thoughts.

    Wayne