Comments posted to this topic are about the item Why PolyBase matters
Thanks for the thoughtful article.
I have one question: in your case, does PolyBase generate MapReduce jobs to process the data, or does it still move the data into SQL and process it there?
In other words, do you need to transfer your whole dataset again when you use PolyBase, or is the query translated into MapReduce so that only the result set is returned to you?
Thanks and regards!
Glad you enjoyed the article. Part 2, which goes into more technical detail, will be available tomorrow.
To answer your question: an HDI region (whether on-premises in the appliance or in Azure) should be seen as additional hardware and software, and it is a separate engine from the PDW. Data movement between the HDI and PDW regions is a fully parallelized process controlled by the Data Movement Service (DMS), but PolyBase is still responsible for predicate pushdown, which uses MapReduce to reduce the amount of data moved between the two regions based on the query predicates. The final result set produced by the MapReduce job is then moved by DMS into temporary tables in the PDW.
It is important to note, though, that this only happens when the amount of data that would land in the PDW region exceeds 1 GB per distribution, and only if you have enabled the functionality by providing the resource manager location in the external data source configuration.
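To make those two conditions concrete, here is a minimal sketch in Python. It is only an illustration of the rule as described in this reply, not PolyBase's actual optimizer logic; the function name, the parameter names, and the choice of 2**30 bytes for "1GB" are all assumptions.

```python
from typing import Optional

# Assumption: "1GB" is treated as 2**30 bytes; the reply does not say which definition applies.
PUSHDOWN_THRESHOLD_BYTES = 1024 ** 3


def would_push_down(estimated_bytes_per_distribution: int,
                    resource_manager_location: Optional[str]) -> bool:
    """Hypothetical helper mirroring the two conditions described above."""
    # Condition 1: pushdown is only available when a resource manager
    # location has been supplied in the external data source configuration.
    if not resource_manager_location:
        return False
    # Condition 2: the data that would land in the PDW region must exceed
    # the threshold per distribution.
    return estimated_bytes_per_distribution > PUSHDOWN_THRESHOLD_BYTES
```

So a query expected to land 2 GB per distribution would be a pushdown candidate only if the resource manager location is set; without it, the data is simply moved across by DMS.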
I want to credit a friend of mine and APS guru, James Rowland-Jones, for providing the specific details and exact context for this answer.
I'll also refer you to a very good article by another APS guru, James Serra: PolyBase Explained, where he explains exactly how the current implementation of PolyBase works and what its limitations are.
After all, what IS PolyBase?
Very interested in follow-up articles, Alain. Keep them coming!
One question: with the creation of the Azure data warehouse, doesn't that make the DR set-up you had before obsolete?