It's the Engineers

  • A good craftsman knows their craft and their tools, and will understand when they need to acquire new tools and/or new skills.  I think a lot of people strive to be craftsmen; it's just that there are some really noisy people who make it look like the norm is a goldfish attention span and a magpie attraction to shiny things.
    I think this comes back to Steve's earlier missive on expert beginners.  The noisy ones never seem to learn the art of the possible with any particular tool and may even distract some of the quieter ones who would happily dive deeper.

    I think many of us using SQL Server have been told it won't scale.  When we've been shown the use case and the data, we've been flummoxed at how anyone has managed... to... get... anything... to... run... that... slow.  Especially as the data volumes in question could easily be hidden under a thin slice of sod all.

    There are obviously use cases where an alternative approach is needed.  The last time I checked, Yahoo had a 24,000-node Hadoop cluster crunching data for their web search facility.  That isn't a SQL Server shaped problem.
    There are cases where I think Microsoft have dropped the ball with SQL Server.  I think there are huge opportunities to improve the full-text search engine.  Until it is improved, ElasticSearch is a tolerable alternative.
    iTunes uses one of the largest Cassandra clusters.  Again, not a SQL Server shaped problem.

    How many SQL Server performance problems in a legitimate use case are caused by an inappropriate use of SQL Server?  For example, an OLTP system hammered by trying to manage logging and session state in SQL Server.

    Thomas Kejser did a presentation at SQLBits illustrating where all the Big Data was coming from.  It wasn't from OLTP.  I seem to remember him saying that if every man, woman and child on the planet tweeted like a teenager, Twitter would only account for slightly more than a PB.
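    That claim is easy to sanity-check with a back-of-envelope calculation.  A minimal sketch — the population, posting rate, and tweet size below are my own assumptions, not figures from the talk:

```python
# Rough sanity check of the "slightly more than a PB" claim.
# All three inputs are assumptions for illustration only.
population      = 7.5e9   # assumed world population
tweets_per_day  = 5       # assumed per-person "teenager" posting rate
bytes_per_tweet = 140     # assumed raw text size, ignoring metadata/indexes

bytes_per_year = population * tweets_per_day * bytes_per_tweet * 365
petabytes = bytes_per_year / 1e15
print(f"{petabytes:.1f} PB of raw text per year")  # prints "1.9 PB of raw text per year"
```

    With those assumptions the raw tweet text lands in the low single digits of petabytes per year — tiny by "Big Data" standards, which is the point being made.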

  • "Data mining"

    What does this term say to you about the industry?

    What would you say about a mechanic who does "parts mining" in a huge bag into which all arriving parts are dumped?

    What would you say about a cashier who does "change mining" in one big box containing all the coins?

    That would be stupid, right?

    Anyone in their right mind would do shelving and fetching, rather than dropping and mining.

    And still - it's mainstream, common practice in "data science".

    To me it's a clear indicator that there is very little science in that "science", if any.

    _____________
    Code for TallyGenerator

  • Sergiy - Tuesday, January 9, 2018 7:40 PM


    Hrrm. What's the definition of data mining?

    "Data mining is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems"

    Now, what's a generic definition of science?

    "The intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment."

    Some things to highlight when meshing both definitions together: discovering patterns, study of structure, study of behavior, study of the physical world, through observation and experiment.

    When you talk about parts mining and change mining, they really don't make sense in that context. It's sort of like you're trying to be one of those infomercials where someone purposely fails with a product to prove that another product is far superior. Of course, it doesn't make sense for a cashier to mine for change under normal circumstances. But if a scientist is trying to understand a data set of, say, 1 billion records, one might decide to deploy a method to analyze all of those records and data points in hopes of better understanding the structure, discovering patterns, and so forth for experimental purposes. Especially if those data points represent 1 billion coins stored across many different systems.

    Regardless, I think you're putting too much emphasis on terminology. It's semantics. It really comes down to the person, their methodology, their purpose, and their findings.

  • I understand both Xsevensinzx's and Sergiy's points, and I think both are valid.

    In whatever discipline you are involved, you should be seeking to identify and learn your tools, and professionals will have to cope with understanding both their subject and the increased amounts of data available to them. The professionals who are able to understand both their discipline and their data will rise in their professions; those who can't may be replaced by data scientists, although I would think that even data scientists may concentrate on particular domains.

  • Xsevensinzx,

    "discovering patterns in large data sets" - what?

    You have stored your data for long enough that it managed to reach the level of "large data sets", and only after that decided to discover some patterns in it?

    Is that what you call a database???

    Sorry, it's rather a data dump.

    There is no science in a dump.

    Sadly, that's all those "document DBs" are about: dump whatever comes from "upstairs" into a big ditch, and let somebody else "discover patterns".

    And it's kinda mainstream now.

    If there were even a minimal level of data science involved, you'd be able to build a data dictionary, you'd only need to read much smaller inbound data sets, and you'd sort different parts of the data into different bins, having the patterns revealed as the data arrived, with no need for long and expensive digging through the dump.
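    The "shelving and fetching" approach above can be sketched in a few lines: classify each record against a data dictionary as it arrives, rather than dumping everything into one heap to mine later.  The field names and dictionary rules here are hypothetical:

```python
from collections import defaultdict

# Hypothetical data dictionary: a rule that recognises each known record shape.
DATA_DICTIONARY = {
    "order":   lambda r: "order_id" in r and "amount" in r,
    "session": lambda r: "session_id" in r and "user_agent" in r,
}

bins = defaultdict(list)   # the "shelves": one bin per known shape
unclassified = []          # anything the dictionary doesn't recognise yet

def shelve(record):
    """Sort a record into its bin on arrival, instead of dump-then-mine."""
    for name, matches in DATA_DICTIONARY.items():
        if matches(record):
            bins[name].append(record)
            return name
    unclassified.append(record)
    return "unclassified"

shelve({"order_id": 1, "amount": 9.99})
shelve({"session_id": "abc", "user_agent": "Mozilla/5.0"})
shelve({"mystery": True})   # goes to a small "to be investigated" pile, not the dump
```

    The patterns are then visible as the data arrives — they're the bin counts — and the small unclassified pile is the only thing left to dig through.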

    But that would be "too academic", wouldn't it?

    And nobody has time for that academic stuff in real life projects.

    Any PM will tell you that.

    Ironically, "academic stuff" is where the science is.

    And it's been out of scope for most projects in the IT industry for a while now.

    Cheap and easily accessible hardware destroys the IT industry the same way cheap and easily accessible food destroys human bodies in the first world, and the same way cheap and easily accessible Asian labour destroyed technological progress in manufacturing.

    _____________
    Code for TallyGenerator

  • Sergiy - Wednesday, January 10, 2018 5:46 AM


    Sorry, no.

    It's very easy to assume that data being dumped is already well defined. In my example, I have billions of records across a wide set of fields. Some fields are even arrays of data that are always changing and are left for the analysis to break out on their own. While it's true there are a lot of relationships and fields already defined, the purpose of data mining is to find the patterns a specific analysis did not think to include. Why? Because when an analyst goes to analyze the data, they are often focused on answering specific questions. They often overlook certain patterns because they did not think to factor them in. Discovery here is not telling the DBA they did their job poorly because they did not define some relationship; it's the opposite: using algorithms that tell you, "Hey, maybe you should also include weather temps and humidity levels with these products for this size and color, and include age and sex for these other products when looking at those sizes and colors."

    You know, things that make you go, "Oh, I didn't think to include that in my analysis!" because you were so blinded by human bias.
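    That kind of discovery can be illustrated with a crude screening pass: rank every candidate feature by how strongly it tracks the outcome, including ones the analyst never planned to look at.  The data set and feature names below are made up purely for illustration:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: the analyst planned to study price, but the screen
# also picks up features they didn't think to include (weather-related).
features = {
    "price":       [10, 12, 11, 13,  9, 14],
    "temperature": [30, 28, 31, 25, 33, 24],
    "humidity":    [40, 42, 39, 45, 38, 46],
}
sales = [100, 80, 105, 60, 115, 55]

# Rank every feature by absolute correlation with the outcome.
ranked = sorted(features, key=lambda f: abs(pearson(features[f], sales)),
                reverse=True)
```

    Real data mining uses far more sophisticated methods (and has to be wary of spurious correlation at scale), but the principle is the same: the algorithm surfaces candidates the human didn't think to test.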

