Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

A Quick Look at the MPP World

By David Poole,

Introduction and overview of the MPP market place.

Over the past few months I have been looking at data warehouse appliances such as Neteeza and Teradata. Teradata and Neteeza have been around for a long time and are used by some of the largest commercial companies in the world. Green Plum is a relatively new kid on the block but shows incredible promise and as they have recently been acquired by EMC2 they look like an interesting alternative. In fact the Gartner report, Magic Quadrant for Data Warehouse Database Management Systems, is available on the Green Plum web site.

Dataupia have an interesting proposition in that their product parallelises customer existing platforms whether it is SQL Server, Oracle, DB2, etc. DATAllegro was acquired by Microsoft which resulted in Project Madison which we now know as SQL2008R2 Parallel Processing Edition.

So the Massive Parallel Processing market is looking very interesting at the moment.

Microsoft's acquisition of DATAllegro and the subsequent release of SQL2008R2 Parallel Processing Edition is potentially a very significant step in the data warehousing world. The main thrust of development for SQL2008 was the BI stack and the SQL2008R2 MPP has to be viewed as a weighty foundation stone underpinning that stack.

What are Massive Parallel Processing appliances?

In a nutshell the end user sees a single database server but under the hood it is many database servers operating in tandem (parallel) to deliver blistering performance and massive scalability. Products like Kognitio WX2 and Neteeza are delivered as a populated rack of blade servers.

The different appliances use a variety of methods to achieve the same ends.

  • Some offer shared-nothing architecture where each "node" within the appliance is unaware of the other nodes participating in the appliance.
  • Some offer nodes that are fully aware of each other.
  • Others offer a system where there are master nodes that parcel out the work to other worker nodes.

In fact, when I first read about soft Numa (non uniform memory access) within SQL2008 my first reaction was that MPP appliances do this at the macro level. The appliance is not simply a bunch of servers co-operating with each other. They also manage their own storage so in some respects they act a bit like a SAN.

What are MPP appliances used for?

It would be a mistake to regard these devices as big boys toys for those organisations with more money than sense. I think in the next 5 years we can expect these devices to filter down into medium sized enterprises.

In a nutshell these appliance are aimed at heavy duty analytical work rather than fast production of stock reports. They obviously can support such stock reports but that would be like buying a Ferrari and only driving 5 miles to the shops.

Speed

Throughout my career a common bug-bear has been the user who fires off an horrendous query against the database and while the DB server sinks to its knees that user goes off on their lunch break. When they come back from said lunch break they take one look at the results and cannot remember why they wrote the query in the first place!

These people have been the bane of a DBA's existence. As a DBA I have been in that position but now realise that it was me that was at fault and not the user writing the query.

What that user was trying to do, and what MPP boxes make it possible to do, is to fire off a train of thought query

  1. Query => I wonder what is going on in product line 'x'
  2. Answer leads to further query =>Ah so that mean that activity 'y' must be happening
  3. Answer leads to further query =>Not quite but this unexpected behaviour reveals a number of data hot spots
  4. Answer leads to further query =>Are their enough customers in those hots spots to represent a business opportunity?
  5. Answer leads to significant business insight, increase in profits, bonuses and job security all round!

The speed at which the MPP boxes bring back the answer to these queries allows these question and answers to be chained together. In a traditional database the end user might not get beyond stage 2.

Another point to bear in mind is that these train of thought queries are one-off queries. They will probably not be run again which means the following

  • The queries will never be optimised because they are run only once
  • A cached execution plan is next to useless because they are run only once

 

Size

We have gone from the age of data being generated by human beings and is therefore limited to the speed and capacity that humans generate data to an age where data is generated from machines.

Any serious web site will be analysing its web traffic possibly in real time. At the SQL Bits conference in London one of the presenters was demonstrating the use of Stream Insight to produce cubes of data that are there to reveal the performance of data cubes used by their business. Their business is incredibly time sensitive so if the data cubes are not available millions of pounds can be lost in a few minutes.

This sort of telemetry represents a massive jump in the volumes of data so systems that scale to the petabyte scale rapidly become necessary.

 

Analytics Integration

Teradata, Neteeza & HP Neoview have the facility to accept pass-thru queries from SAS Analytics. This means that SAS can recognise that these platforms support its core functionality and therefore leverage the intrinsic power of the MPP platform.

SAS can throw some rather nasty queries at SQL Server so formal support for SAS pass-thru queries in the Microsoft stack is something I would very much like to see.

Green Plum supports the R open source analytics language.

What are the criteria for choosing an MPP appliance?

Requirements, requirements, requirements

You really need to be clear on what business problem you are trying to solve.

  • What data volumes do you have?
  • Is this data volume made up of well structured, high quality data or a mash of disparate data imported from legacy systems?
  • What is the projected growth of the data?
  • How up-to-date does the data need to be?
  • What query response times are needed and/or desired?

In short, know your business, know your data. Once you have a clear idea of this you have a firm basis to get meaningful information out of the vendors.

 

Integration

Investing in an MPP platform is a major investment for the business in terms of people, processes and technology.

My personal opinion is that, while such devices will eventually lead to investing in new reporting, analytics and ETL tools they should initially be capable of integrating with what the organisation already has.

MPP platforms do require a change of mindset and it is just as possible to implement then badly as it is with any other piece of technology. Although most MPP vendors are very good with regard to support if something doesn't work you do not want to be caught between two vendors arguing that each other's technology is the cause.

Strategic Partnerships

This could come under the banner of integration however it is worth asking if the MPP Vendor has any strategic partnerships with toolset or analytics vendors. If you do find that you need to replace part of your BI stack then knowing that the vendor works well with a known range of tools is of major importance.

Physical footprint

These devices are in reality more than one physical server. One vendor highlighted their biggest client as having a 192 blade appliance supporting 40PB of data! Questions have to be asked as to the physical requirements for the chosen MPP.

  • Will your chosen appliance fit in your data centre?
  • What are its power requirements and can your data centre meet them?
  • What are its cooling requirements and can your cooling systems meet them?
  • What other connectivity is required for the appliance?

 

Security

The whole purpose of a data warehouse is to allow access to data but there are going to be pools of data that are sensitive and need protecting. There are a number of security questions that need to be asked.

  • Does the appliance support encryption?
  • Does the appliance integrate with LDAP, Active Directory etc or does it have its own security scheme?
  • Does it have object level security?
  • Does it support role based security?
  • Does it expose users passwords in any way? WARNING, some of them do expose users passwords to sys admins.

 

Auditing

There are a variety of system auditing tasks that I would expect an appliance to perform

  • Query usage?
  • Security access?
  • System performance?
  • Dependency tracking?

 

Licensing

The different vendors follow different licensing models and you need to be clear on precisely what you are buying into. For some the appliance includes software and hardware and ANY change to the hardware requires an upgrade license. Others are more casual and allow upgrades to RAM and disk without licensing consequence. Others regard only the production environment for licensing purposes.

One other thing you might consider asking: does the vendor have a model that supports data warehousing as a service?

 

Programmability

Sooner or later you are going to want to do more involved that running straight queries against the MPP. A few things you might want to know:

  • Does it have the equivalent of stored procedures? Not all of them do.
  • Does it have a scripting language?
  • Does it support any embedded language such as Java?
  • Does it have an API that allows extensibility?
  • Do you have the skills in house to use the API, scripting or embedded language?
  • Does it have any form of scheduler or something comparable to SQLSERVERAGENT?

 

Recoverability

It is never a nice experience to have to go through disaster recovery on a multi terabyte system.

  • Does the appliance support a full backup strategy
    • Differential backups
    • Point in time recovery
    • Object level backups
  • How does the system react to hardware failures?
    • RAM
    • Disk
    • An entire blade going down?
  • Does it support hot swappable hardware?
  • Can the system operate in the event of hardware failure albeit in a reduced mode?
  • Does it have the equivalent of rollback/undo logs?

 

Support

  • Does the vendor offer a support contract or SLA agreement?
  • Is there a user group?
  • Are there user forums?
  • Does the vendor supply spares kit?

 

Scalability

Most appliances will operate into the high terabyte range and some well into the petabyte range. The question is not "can they scale" but the nature of how they should scale.

  • What is the recommended ratio of RAM to disk?
  • Is there a case for adding a lot of slower RAM rather than a smaller amount of ultra-high speed RAM?
  • When is it more prudent to add blades rather than upgrade other blades?
  • How much disk should each blade address?
  • Are there bottlenecks in the appliance design? For example, work may be performed across many blades but if they are managed by a single blade then that single blade is the bottleneck.
  • What user concurrency can they support? Some solutions do not major on massive user concurrency.

One point I should like to make is that while in OLTP systems growth estimates tend to be in the 2x or 3x over a planning cycle data warehouse growth tends to be in the 10x or higher growth range.

 

Training

Teradata has a full certification scheme. Do not underestimate the training required for these appliances.

Given the cost of MPP appliances it is worth negotiating training as part of the price. Also consider who needs training

  • DBAs?
  • Infrastructure specialists?
  • Analytics users?
  • Disaster recovery specialists?

 

Affordability

The purchase price is just the start although Dataupia have made "the price you see is the price you pay" part of their selling proposition.

When talking to vendors don't make any assumptions about what the price includes. Always be explicit in what is expected.

  • What are the ongoing licensing costs?
  • What are the maintenance/support costs?
  • Does it have any proprietary hardware requirements?
  • Are there additional modules or tools to be bought?

Some vendors will be remarkably flexible on the purchase price but ongoing costs (total cost of ownership) is high.

 

Flexibility

Some MPP appliances are designed to be used a specific way.

  • Is the appliance suitable for near real-time data warehousing?
  • Can it run on commodity hardware or does it require specialist kit?
  • Can it run on heterogeneous hardware or does it require all hardware to be configured the same?
  • Can it handle online data loading or is loading and querying to be treated as two separate operations that must not coincide?

Flexibility can also include the way in which the appliance is used

  • If licensed on processor power is it possible to rent temporary license extensions? For example 11 months of the year the appliance is quite capable of coping with however many blades are in the appliance but for 2 weeks of the year it really needs another couple of blades.
  • How does the vendor treat the installation of any client tools? For example, does uninstalling a tool from one machine and installing it on another require any additional service and support?

 

Design considerations

One thing that is very important in the design of a database to be put onto an MPP is the distribution of the data. Most appliances have some form of hashing algorithm to distribute the data so that the maximum number of nodes can be brought to bear on a given query.

One thing to watch out for is where you have data that has hotspots in it. For example, if you have data for the past 12 months you do not want the most recent month to be confined to a single blade otherwise that single blade will be attempting to do the lion's share of the work!

My experience has been that new technologies require a new mindset in order to get the best out of them. The vendor should be able to advise on any key considerations that are required in order to get the best out of their appliance.

In conclusion

Before Microsoft bundled DTS in with SQL Server ETL tools were a specialist niche with a niche price to match. Potentially SQL2008R2 Parallel Processing edition could have the same effect although it has to be said that there is already healthy competition in the MPP market place.

From what I have seen so far each of these appliances have their strengths and weaknesses. I have seen an appliance perform at "I still don't believe it" speeds when configured and used properly and in other cases far worse than SQL Server 2000!

The next 12-24 months should prove to be very interesting for those of us interested in the world of MPP platforms!

Total article views: 4418 | Views in the last 30 days: 1
 
Related Articles
BLOG

Parallel Data Warehouse (PDW) Tip: Useful Appliance Queries

This post contains the following queries that support development and operational tasks within a Mic...

ARTICLE

Vendor Value

Software maintenance is often required when purchasing software packages. But do the vendors deliver...

ARTICLE

Appliances - Just Add Data

This Friday Steve Jones asks about your feelings on data appliances.

BLOG

New KB Article on Remote Mirroring Support Requirements

Microsoft has published a new Knowledge Base article on the requirements to receive support for wha...

BLOG

HP Announcement: PDW V2 Appliance Features

HP Announcement: PDW V2 Appliance Features New features for the HP’s Enterprise Data Warehouse (ED...

Tags
data warehousing    
mpp    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones