• Greg_Della-Croce (6/6/2013)


    Thanks for clearing that up for me. Since Hadoop is a great scanning tool when you have more than a TB of data, are there tools that can reach in and do discovery data mining on the results? I am not familiar with Pig and the other tools you talked about.

    My interest is in taking linguistic works, loading them into a database of some sort, and doing discovery mining on them. The corpus is large enough for Hadoop to be a candidate, but the discovery tools are the piece I am missing right now.

    If you are mining text, then Splunk might be a good alternative. I haven't used it myself yet, but I have a colleague who is testing it on corporate network logs that total 500 GB to 1 TB per day, and performance has been very good. I think there is a free download available for testing.