Considerations for Azure for Business Intelligence
Posted Monday, March 17, 2014 5:29 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, November 19, 2014 10:10 AM
Points: 79, Visits: 463
I've just started a new position as a BI developer and am planning out my strategy for implementing a BI solution. As I see it, this will almost certainly involve a dimensional data warehouse populated by SSIS, with SSRS and SSAS providing reports and cubes, and possibly PowerPivot, QlikView, etc. on top.

The company is now looking to move its databases to the cloud, and a consultant is coming in next week to discuss the options. As I know nothing really about Azure, I'm looking for things I should be aware of that will affect my plans regarding the DW and Azure.

Is this straightforward, or are there issues I should be aware of?

Thanks
Post #1551687
Posted Monday, March 17, 2014 5:39 AM
SSCrazy


Group: General Forum Members
Last Login: Monday, July 21, 2014 2:56 AM
Points: 2,603, Visits: 2,061
What will be the size of the database?

---------------------------------------------------
"Thare are only 10 types of people in the world:
Those who understand binary, and those who don't."
Post #1551693
Posted Monday, March 17, 2014 5:43 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, November 19, 2014 10:10 AM
Points: 79, Visits: 463
Hard to be specific, as our systems are in flux just now, but I would definitely say on the smaller side: 10-20 GB.
Post #1551695
Posted Tuesday, March 18, 2014 7:43 AM
SSCrazy


Group: General Forum Members
Last Login: Friday, November 14, 2014 8:26 AM
Points: 2,894, Visits: 3,278
The future for data warehousing is column-store and in-memory tables. Most organisations with a BI store of under 2TB will not need to bother with complexities such as SSAS because in-memory can do the aggregations on the fly within acceptable response times.

All of the big 3 vendors now have variations of column-store and in-memory tables, and there are a number of enterprise-quality BI niche databases also with these functions.

This adds up to saying that any data warehouse based on a tabular database is no longer fit for purpose. This does not mean that it has suddenly broken. It means that tabular is to steam trucks what column-store is to diesel trucks. Both can carry stuff. But diesel can carry an order of magnitude more than steam for an order of magnitude less overhead cost.

One of the impacts we will see over (say) the next 4 years to 2018 is a drastic simplification of BI stacks. Most of the work we have traditionally done to give end users the performance they need is no longer necessary. In-memory can take the load and give the aggregates on demand. Another impact is likely to be puncturing the big-data bubble, as organisations realise that 2TB is too small to count as big data, and the shift to these techniques is not really needed.

Relating all this to the OP: as SQL 2014 goes to RTM, look out for what Microsoft announces for Azure. I have no inside track to Microsoft's plans, but anyone who has had the time has seen what Hekaton can do within SQL 2014, and can work out what capabilities this will add to Azure. Regarding whether you should host your data in your normal DC or in the cloud, treat both options as equally worthy from a technical viewpoint and look at what is most cost-effective for you.


Original author: SQL Server FineBuild 1-click install and best practice configuration of SQL Server 2014, 2012, 2008 R2, 2008 and 2005. 18 October 2014: now over 31,000 downloads.
Disclaimer: All information provided is a personal opinion that may not match reality.
Concept: "Pizza Apartheid" - the discrimination that separates those who earn enough in one day to buy a pizza if they want one, from those who can not.
Post #1552178
Posted Tuesday, March 18, 2014 9:04 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, November 19, 2014 10:10 AM
Points: 79, Visits: 463
Informative response, thanks.

When you say "column-store and in-memory tables", what exactly do you mean? I'm somewhat new to DW.
Post #1552233
Posted Tuesday, March 18, 2014 9:42 AM
SSCrazy


Group: General Forum Members
Last Login: Friday, November 14, 2014 8:26 AM
Points: 2,894, Visits: 3,278
Column-store, in-memory, and tabular are all about how the data is physically stored. You may also see people talking about shared-nothing and shared-disk, which is also about how data is stored but at a higher level.

Tabular format is where data is physically stored in rows. In a table with 10 columns and 10 rows, you will have 100 data items physically stored. Data compression can reduce the amount of space needed to store this data, but you still have 100 data items stored.
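As a rough illustration (the table name is invented), this is how you would apply ordinary page compression to a tabular table in SQL Server to get the kind of saving described below:

-- Rebuild a rowstore table with page compression applied;
-- the data is still stored row by row, just in less space
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);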

Column format is where the row is broken down into columns, and only unique values for each column are stored. So with the table with 10 columns and 10 rows, if all values are unique you still get 100 data items stored, but if all rows have the same value then you only have 10 data items stored. You also get the same compression capabilities as row format on top of this to save space.
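To make that concrete, in SQL Server 2014 (where clustered columnstore indexes became updatable) converting a table to column format is a single statement; dbo.FactSales is again just an illustrative name:

-- Store the whole table column by column; each column is
-- encoded and compressed independently
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;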

On space savings alone you typically see a 3:1 saving for compressed tabular format data, but typically a 10:1 saving for column format data. This means that if it took 10 separate disk read requests to get your uncompressed tabular data, it would take only 3 read requests to get compressed tabular data and only 1 read request to get compressed column data. In other words, column format typically gets your data faster even before any further clever bits.

However, column format data is (for almost all types of column format DBMSs) automatically indexed on every column as part of the data storage process. This is needed so that the DBMS can quickly reassemble rows for presentation to the user. This means that you no longer have to worry about which columns to index to improve performance - they are all indexed.

For a simple SELECT * FROM table you do not see the benefits of column format storage, but as soon as you add a WHERE clause things start to hum. Add a SUM(column) and a GROUP BY and column format means you no longer get a coffee break between starting your query on a 50 million row table and getting the result.
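For example, a query of this shape (table and column names are invented) only has to touch the SaleDate, Region and SalesAmount column segments rather than whole rows:

-- Aggregate query that benefits from column elimination
-- and batch-mode processing on a columnstore table
SELECT Region, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
WHERE SaleDate >= '20120101'
GROUP BY Region;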

At my place we are still in the early stages of exploiting column format but typically get a simple SUM(column) with a GROUP BY query on a 25 million row table returning in 1 to 2 seconds. More complex stuff takes a bit longer, but our users are happy about waiting 15 seconds for a detailed breakdown of our market penetration month by month over the past 2 years. When this data was in tabular format they had to plan other work to do while the query was running.

In-memory takes things a step further. All the big 3 (SQL, Oracle, DB2) take the same approach, which is also shared by Hana, Redshift, Postgres and many other DBMSs, in that only column format tables are held in memory. Holding a table in memory is different to having a portion of it in the bufferpool, as the access mechanisms are completely different and have a shorter CPU instruction path.
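For what it is worth, here is a minimal sketch of declaring an in-memory table in SQL Server 2014; all names are made up, the database needs a MEMORY_OPTIMIZED_DATA filegroup first, and note that Hekaton tables in 2014 are row-based OLTP structures, so this shows the declaration mechanics rather than the column-format combination described here:

-- One-off setup: a filegroup and container for memory-optimised data
ALTER DATABASE MyDW ADD FILEGROUP mem_fg CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE MyDW ADD FILE (NAME = 'mem_f', FILENAME = 'C:\Data\mem_f')
    TO FILEGROUP mem_fg;

-- The table lives entirely in memory; DURABILITY controls whether
-- its contents survive a restart
CREATE TABLE dbo.SalesStage
(
    SaleID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    Region VARCHAR(20) NOT NULL,
    SalesAmount MONEY NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);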

A clever guy at SAP worked out in the mid noughties that if you make your data transfer size the same as your CPU cache-line size then transfer between memory and cache can be done in a single CPU instruction. It also takes fewer CPU instructions to query the data if everything needed for a given compare is held in the CPU cache. This led to the Hana database, and the other vendors have taken this idea on for their own products. (Don't ask me about patents and licensing; if anything was being ripped off we would know.)

The end result is that if you combine column-format with in-memory you typically get a response time between 1 and 2 orders of magnitude faster than you get from tabular format. The impact of this is that aggregation can almost always be done at query time while producing results fast enough to avoid the need to pre-aggregate into a cube.


Post #1552276
Posted Wednesday, March 19, 2014 5:28 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, November 19, 2014 10:10 AM
Points: 79, Visits: 463
This is very interesting.

I'm doing some reading on this now; I hope you don't mind answering any further questions it produces. But I wanted to thank you for the info so far.
Post #1552561
Posted Monday, June 30, 2014 6:35 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Thursday, October 30, 2014 3:37 AM
Points: 112, Visits: 89
Thanks EdVassie for that response. Very interesting stuff.
Post #1587609