A Tool for the Job
Posted Tuesday, November 20, 2012 9:03 AM
SSC-Addicted

I've always accepted that the tasks we want our digital machines to do will vary in the methodologies best used to implement them. I'm also hopeful that SQL folks come to the same conclusion at some point in their careers, as it seems obvious to me, but I'm far from certain that this will be the case.
Post #1386986
Posted Tuesday, November 20, 2012 10:03 AM
Old Hand

Ultimately any task where you care more about high speed and low cost than you do about consistency is a great candidate for NoSQL. However, I have to comment on this one.

Users/groups and ACLs: ('To some degree, LDAP was the original NoSQL database').

There are vulnerabilities in AD that resulted from this philosophy. For example: http://social.technet.microsoft.com/Forums/en-US/exchangesvrclientslegacy/thread/3da53460-ef76-4f01-94c9-f7b96fdaf99d

I also find the high-frequency trading bit to be really scary. Given the number of news stories about these applications going rogue I'm not sure I'd use it as a poster child for NoSQL. Not that these issues are related to NoSQL.

-DW
Post #1387023
Posted Tuesday, November 20, 2012 5:06 PM


SSC-Addicted

Darren Wallace (11/20/2012)
Ultimately any task where you care more about high speed and low cost than you do about consistency is a great candidate for NoSQL.


That pretty much sums it up, but I'll throw in my 2c anyway.

Recommendations & Time-series/forecasting: These are data-mining tasks where you can certainly use an RDBMS as a data store. Both of these tasks can be done using built-in algorithms in SSAS projects with data sourced straight from SQL Server or, if you like, a cube. Other data-mining packages like SAS, SPSS, Statistica, R and even tools like ggobi or weka expect their input data to be structured, at the very least as delimited flat files. If you are feeding unstructured data into these tools you will first need to do a lot of structuring! Pre-preparation of data is 99% of the work in data-mining: binning, grouping, binarising... these are all essentially structuring techniques that are compulsory for some algorithms. You can do that from NoSQL, sure, but you can't get away from the need to first structure the data. What can you really do with unstructured data? Store it and view it; that's about all. Interpreting the data on the fly still counts as structuring.
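
To make "binning" and "binarising" concrete, here is a minimal Python sketch of both applied to a handful of loosely shaped records. The field names (`age`, `channel`), the bin edges and the category list are all made up for illustration; they are not from any of the tools mentioned above.

```python
from bisect import bisect_right

def bin_value(value, edges):
    """Return the index of the bin that `value` falls into.

    `edges` are ascending boundaries, so edges=[18, 65] gives
    bin 0 (< 18), bin 1 (18 to 64) and bin 2 (>= 65).
    """
    return bisect_right(edges, value)

def binarise(value, categories):
    """One-hot encode `value` against a fixed, ordered list of categories."""
    return [1 if value == c else 0 for c in categories]

# Hypothetical raw records, as they might come out of a document store.
raw = [
    {"age": 17, "channel": "web"},
    {"age": 42, "channel": "store"},
    {"age": 70, "channel": "web"},
]

AGE_EDGES = [18, 65]                  # assumed bin boundaries
CHANNELS = ["web", "store", "phone"]  # assumed category list

# Each record becomes a fixed-width numeric row: [age_bin, web, store, phone].
rows = [
    [bin_value(r["age"], AGE_EDGES)] + binarise(r["channel"], CHANNELS)
    for r in raw
]

for row in rows:
    print(row)
```

Whatever store the records came from, the output is a rectangular table, which is the point: the structuring step does not go away.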

Admittedly, Amazon is the king of the "recommendation" and they use their own NoSQL datastore, but when they were getting started distributed RDBMSs were fairly primitive, so it's no wonder they didn't use something off the shelf. When you are a big company that specialises in big data and derives immense value directly from the clever handling of that data, then you need a proprietary competitive advantage and can justify the cost.

Their EC2 service now lets you spin up big clusters of SQL Server (and other RDBMSs), and I wonder why they would do that if NoSQL were the only way? Kimball's whitepaper on big data suggests that when you are "drinking from the firehose" (I love that visceral imagery) NoSQL makes sense as an initial data store. With Microsoft investigating Hadoop, I can see that an extended Kimball process of capture to NoSQL --> ETL (with structuring) to RDBMS --> BI/Reporting/Analytics makes sense for most companies. If you want to skip the middle bit then you'll no doubt be using a team of Java gurus and rolling your own analytics and reporting layer, and that's where it's going to cost a lot. I know it sounds counter-intuitive to say the shorter process will cost more, but I think Microsoft will shortly provide easy tools to do ETL from NoSQL to RDBMS as a staging process. They might even plug BI tools directly into NoSQL, but I find that less likely; the analogy is doing BI directly from OLTP, which is problematic and has been well covered. Again it comes back to latency, how real-time you need to be, and how much that's all worth to you.
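
As a rough sketch of that capture --> ETL (with structuring) --> RDBMS --> reporting flow, the Python below uses a list of JSON strings to stand in for the NoSQL capture layer and an in-memory SQLite database to stand in for the RDBMS. The event shape, the `events` table and the function names are assumptions made up for the example, not anything from Kimball's paper or a Microsoft tool.

```python
import json
import sqlite3

# Stage 1, capture: raw events kept as schemaless text, standing in for
# whatever a NoSQL store would hold. The last entry is deliberately bad.
captured = [
    '{"user": "alice", "event": "view", "item": {"sku": "A1", "price": 9.5}}',
    '{"user": "bob", "event": "buy", "item": {"sku": "B2"}}',
    'not valid json',
]

def structure(doc_text):
    """Stage 2, ETL: turn one raw document into a fixed-shape row, or None."""
    try:
        doc = json.loads(doc_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict) or "user" not in doc or "event" not in doc:
        return None
    item = doc.get("item")
    if not isinstance(item, dict):
        item = {}
    # Missing optional attributes become NULLs rather than missing columns.
    return (doc["user"], doc["event"], item.get("sku"), item.get("price"))

def load(conn, rows):
    """Stage 3, load: write the structured rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "user_name TEXT NOT NULL, event TEXT NOT NULL, sku TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
rows = [r for r in map(structure, captured) if r is not None]
load(conn, rows)

# Stage 4, reporting: plain SQL over the structured copy.
counts = conn.execute(
    "SELECT event, COUNT(*) FROM events GROUP BY event ORDER BY event"
).fetchall()
print(counts)
```

The `structure` function is where the cost of skipping the middle bit shows up: every report that reads the raw store directly has to repeat that logic for itself.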

LDAP: LDAP queries are not fun. Having tried to query AD via a linked server using both SQL and LDAP syntax I can tell you that it is like pulling teeth. I'd use some existing package for that any day. That's not quite the same thing as storing ACL data in an RDBMS though.

The media repository argument is interesting. The way blobs are handled in RDBMSs seems to flip-flop between storing them in tables and storing pointers to file system objects, and there doesn't seem to be any agreement about the best way to do it. I suppose something closer to the filesystem naturally makes sense, but it's all about how you index and search for them. I'm interested in what develops there.
Post #1387194
Posted Wednesday, November 21, 2012 1:29 AM
SSCrazy

Darren Wallace (11/20/2012)
Ultimately any task where you care more about high speed and low cost than you do about consistency is a great candidate for NoSQL.


Anyone have a view on data quality with regard to NoSQL? Does it help/hinder?


LinkedIn Profile
Newbie on www.simple-talk.com
Post #1387271
Posted Wednesday, November 21, 2012 2:59 AM


Mr or Mrs. 500

@David.Poole
I reckon that it varies enormously with the product, and with a single product over time. In some cases, the data consistency problems haven't been exactly subtle. Did you see ...
http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
http://www.infoq.com/news/2011/11/MongoDB-Criticism
... but things have been quieter for a year so these problems seem to have been sorted out.



Best wishes,

Phil Factor
Simple Talk
Post #1387311
Posted Wednesday, November 21, 2012 7:38 AM


SSCommitted

David.Poole (11/21/2012)
Darren Wallace (11/20/2012)
Ultimately any task where you care more about high speed and low cost than you do about consistency is a great candidate for NoSQL.


Anyone have a view on data quality with regard to NoSQL? Does it help/hinder?

I believe that the importance of data quality (i.e. dropped, uncommitted, or orphaned records) depends on the application. That's why non-relational databases are a more natural choice for some organizations than for others.

Let's assume that Google, Facebook, or Twitter had an intermittent transactional consistency issue such that 1 listing out of the top 100 was randomly excluded each time a user hit the website. Unless a QA engineer were systematically analyzing the results looking for such specific irregularities... no one would notice. Even if some users did notice, it wouldn't be a show stopper. It would be like "Oh, yeah, we know about that bug and some people are working on it", but it's not as if external auditors or a regulatory agency is going to shut the business down until the problem is fixed.

In the banking industry, accounts have to balance, and information presented to clients is not subjective at all. Likewise, in the healthcare, government, or scientific industries, the data drives critical decisions and thus matters in a critical way.

For an e-commerce company like Amazon or eBay, those listings presented to users in the web browser are potential sales. I guess the same applies to Google and Facebook, but to a lesser extent. However, from their perspective it's the paid ad links, not the aggregated content, that matter most.
Post #1387431
Posted Wednesday, November 21, 2012 11:54 AM
SSCrazy

My concern regarding data quality is largely that it is absent.

The solution compiles, tests run, the users like the UI, job done. The reporting/BI aspect is someone else's problem.

Only it's NOT! For data to be an asset rather than a liability you have to build sufficient domain/referential integrity in as a foundation stone. It matters not whether it is NoSQL, flat files or RDBMS. If a NoSQL document needs to be a certain structure then how is that enforced? If attributes within the documents have rules defining legitimate values, how are those enforced?

I worry that some of the development community are jumping on the NoSQL bandwagon simply because they are trying to bypass the disciplines that DBAs insist on, without understanding why those disciplines are so important.
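
One possible answer, when the store itself enforces nothing, is for the application to check every document before it is written. The Python below is a minimal sketch of that idea; the `ORDER_RULES` table, its field names and its allowed values are hypothetical, and a real system might use a schema language or the store's own validation features instead.

```python
# Hypothetical rules for an "order" document: each field maps to an
# expected type and a predicate defining its legitimate values.
ORDER_RULES = {
    "order_id": (int, lambda v: v > 0),
    "status": (str, lambda v: v in {"new", "paid", "shipped"}),
    "quantity": (int, lambda v: 1 <= v <= 1000),
}

def validate(doc, rules):
    """Return a list of rule violations; an empty list means the document passes."""
    errors = []
    for field, (expected_type, is_legitimate) in rules.items():
        if field not in doc:
            errors.append(f"{field}: missing")
            continue
        value = doc[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
        elif not is_legitimate(value):
            errors.append(f"{field}: value {value!r} not allowed")
    return errors

good = {"order_id": 7, "status": "paid", "quantity": 2}
bad = {"order_id": -1, "status": "lost"}

print(validate(good, ORDER_RULES))
print(validate(bad, ORDER_RULES))
```

This covers structure and domain rules only. Referential integrity (does the customer this order points at actually exist?) needs a lookup against other documents, and that is exactly the part that tends to get skipped.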

If you've been to any Big Data conferences you will come away thinking "but that's been my day job for 'x' years! Big Data is just a marketing term!".
Yes and no. The reason something so old has only just been given a marketing term is that non-IT and non-data professionals are now sitting up and taking notice of data and recognising its intrinsic worth. Up until now that audience has paid lip service to topics surrounding "data as an asset" without really believing it.

Now that they are realising they could make serious money out of data, the light-bulbs have gone on, only for them to find that the ancient curse of SHISHO is still as potent today as it was to our ancestors.

I may be data Santa Claus but what I can deliver on the stale mince pies of data quality and astringent brandy of technical debt is a rather nasty smell.


LinkedIn Profile
Newbie on www.simple-talk.com
Post #1387561