Let me re-edit this after going back and re-reading the article. The big issue for me with this solution is that you're reading everything from the data store into the SQL database. The data store is billed based on reads, writes, and, I believe, bandwidth too. If you're using Data Lake Analytics, you also get the extra bill for processing the data when it's read for Data Factory to copy. This is a huge problem for those using large datasets, because you're being billed for reading data that has already been processed. For reference, I know the solution will invalidate data that already exists, but it still has to read that data in order to invalidate it.
In my example, data is split by YYYY/MM/DD for every data source in the data store. Data Factory therefore needs to base its reads on the correct YYYY/MM/DD path in the store to prevent unnecessary reads, and with them unnecessary charges that could kill your pocket over time. Can this solution address that in any way? Otherwise, it will work great for small data or dimension/match tables, but not so much for the large datasets that are likely the reason most people use Data Lake Store in the first place.
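To make the concern concrete, here's a minimal sketch of the kind of scoping I mean: build the exact YYYY/MM/DD partition path for the slice being loaded, so the copy only touches that folder instead of scanning the whole store. The `partition_path` helper and the `mydata` base folder are hypothetical names for illustration; in Data Factory itself this would be done with a parameterized folder path on the dataset, but the path logic is the same.

```python
from datetime import date, timedelta

def partition_path(base: str, d: date) -> str:
    # Build the YYYY/MM/DD folder path for a single day's partition,
    # so a copy activity reads only that slice of the store.
    return f"{base}/{d.year:04d}/{d.month:02d}/{d.day:02d}"

# e.g. an incremental daily load copies only yesterday's partition
yesterday = date.today() - timedelta(days=1)
print(partition_path("mydata", yesterday))
```

The point is that the read cost scales with one day's partition rather than the full history, which is what keeps the per-read billing under control on large datasets.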