• Eric Mamet (4/26/2015)


    Thanks

    It's very useful 🙂

    On the NoSQL side... Why use NoSQL?

    Is it because of the type of data you hold, or because of volume?

    I seem to remember that "Big Data" is characterised by the three "V"s:

    - Velocity

    - Volume

    - Variety

    In our case, we don't yet have (IMHO) any of those three Vs, so I find SQL Server quite adequate for us.

    Do you think there could be other reasons to use NoSQL if we are already a Microsoft shop?

    Cheers

    Eric

    I'm really starting to dislike the term "big data" simply because it's so vague. But some of the main reasons people are leaning towards NoSQL are that it's open-source, it's schema-less, and it runs on clusters of commodity hardware.

    Yes, it does handle high volumes, high velocity and diverse data well.

    But for us, I think the main reason is the schema-less design, which makes it very flexible to just add data as we go, while still having an enterprise-ready feel and scaling across commodity hardware.
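
    As a rough illustration of what "schema-less" means here, this is a minimal Python sketch, not any particular NoSQL product. The `collection` list, the `insert` helper, and the field names are all made up for the example.

```python
import json

# Hypothetical in-memory "collection": each document is just a dict,
# and no two documents have to share the same fields.
collection = []

def insert(doc):
    # Round-trip through JSON to mimic how a document store serializes records.
    collection.append(json.loads(json.dumps(doc)))

insert({"customer": "A100", "amount": 25.0})
# A new attribute shows up later; nothing has to be altered or migrated first.
insert({"customer": "B200", "amount": 40.0, "channel": "web"})

# The flip side: every reader has to cope with fields that may be missing.
channels = [doc.get("channel", "unknown") for doc in collection]
```

    Adding the `channel` field costs nothing on the write side, but the burden moves to whoever reads the data.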

    SQL Server does require some work to get that ETL up and running. It takes effort to structure the data and, sometimes, to find the relations in it. NoSQL is a bit more flexible in that regard, but it isn't everything we need; it's not as if we would wake up one day and find our RDBMS gone.

    That's because, at the end of the day, our data eventually needs structure. It eventually needs that schema. After the data scientists have explored and analyzed the data in NoSQL, it's going to be shifted towards a relational model, because we still need that structure for the end user. We need some type of strict governance over the data to bring it up to a standard for reporting.
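
    To make that hand-off concrete, here is a small sketch under stated assumptions: SQLite (via Python's standard `sqlite3` module) stands in for the RDBMS, and the `sales` table, its columns, and the sample documents are hypothetical.

```python
import sqlite3

# Hypothetical documents as they might come out of the exploration phase.
documents = [
    {"customer": "A100", "amount": 25.0},
    {"customer": "B200", "amount": 40.0, "channel": "web"},
]

conn = sqlite3.connect(":memory:")
# The relational side states the rules up front: required columns,
# types, and a default for the optional attribute.
conn.execute(
    """
    CREATE TABLE sales (
        customer TEXT NOT NULL,
        amount   REAL NOT NULL CHECK (amount >= 0),
        channel  TEXT NOT NULL DEFAULT 'unknown'
    )
    """
)

for doc in documents:
    # Missing optional fields are filled in here, once, during the load.
    conn.execute(
        "INSERT INTO sales (customer, amount, channel) VALUES (?, ?, ?)",
        (doc["customer"], doc["amount"], doc.get("channel", "unknown")),
    )

rows = conn.execute(
    "SELECT customer, amount, channel FROM sales ORDER BY customer"
).fetchall()
```

    Once the data is loaded, every report sees the same three columns, and a row that breaks the rules (say, a missing customer) is rejected by the database instead of quietly reaching the end user.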

    Otherwise, our data will remain chaotic even though it's clean. It's very easy to shift and change the data in NoSQL, so the end user would have to be flexible and change with the data too. Unfortunately, our end users are not very tech-savvy. Having to shift with the data would require them to understand each shift (i.e., what data changed, what data is new).

    So, having NoSQL as the data discovery or centralized data platform is good. Then having an RDBMS bring structure to the madness and standardize the data for the future is good too.