Data Has a Dollar Value

It seems that every year companies adopt new ways of analyzing information. In this era of Big Data, with the challenge of real-time BI analysis of (often streaming) data sets, companies search for ways to handle the load. A few years back we had MapReduce methods to process data, and lately machine learning (and deep learning) has grown in popularity as a way to gain insights from the massive data sets we have.

The problem is that when we try to analyze data, we often find we don't have enough of it. While some parts of our organizations face a surplus of data, others trying to provide an analysis might face a shortage, at least of some types of data. This may be especially true when business people want to engage in a new type of business or a new way of working with customers.

The last couple of years have given rise to a number of companies that gather and sell labeled data, or even generate synthetic data that can be used to build and train models for analysis. As we let machines learn to solve some problems on their own, we need to provide them with lots of data, and supplying that data has become big business. I have heard of companies paying six or seven figures a year for data sets for their data scientists.

In some sense, as noted in this keynote, data is the most valuable part of these systems. Staff matters, and certainly the software and models are important, but the data is key. Good data, with lots of features, can produce a better-trained system than poor data. Many of us who work in traditional software know this as well. If we develop against poor data sets with a limited range of values that don't match the skew and selectivity we'll see in the live system, we often build lower-quality software with more bugs.

I think our data is more valuable than we realize, and far too many developers don't take advantage of the data our organization does have to build features and properly test them. Admittedly, too few of us test things well, but we often can't without a good set of data. I've been disappointed with random generators, though they are useful in that their random values, including NULLs, can uncover unexpected issues that will creep into systems. I really wish we had better subsetting tools that would help us use a portion of our production data. Redgate is working on tooling, but I'd have thought this was a problem we'd be better at solving by now, between software people and database staff.
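
The gap between random generators and real data is easiest to see with skew. The Python sketch below is a hypothetical illustration, not Redgate's tooling or any real product. It assumes an in-memory stand-in for a skewed production Orders table (every table, column, and function name here is made up), then compares a simple random subset of it against a uniform random generator that does produce NULLs. The subset should keep the busiest customer's share of rows close to production's, while the generated data spreads rows evenly. A code path or query plan that only misbehaves for the heavy customer would therefore never be exercised by the generated data.

```python
import random
from collections import Counter

# HYPOTHETICAL stand-in for a production Orders table: customer 1 places
# about half of all orders (heavy skew), and about 5% of discounts are NULL.
def make_production(n_rows, rng):
    rows = []
    for order_id in range(1, n_rows + 1):
        customer_id = 1 if rng.random() < 0.5 else rng.randint(2, 1000)
        discount = None if rng.random() < 0.05 else round(rng.uniform(0, 0.3), 2)
        rows.append({"order_id": order_id, "customer_id": customer_id, "discount": discount})
    return rows

# A typical random generator: same NULL rate, but every customer is equally
# likely, so none of production's skew survives.
def make_uniform(n_rows, rng):
    rows = []
    for order_id in range(1, n_rows + 1):
        discount = None if rng.random() < 0.05 else round(rng.uniform(0, 0.3), 2)
        rows.append({"order_id": order_id, "customer_id": rng.randint(1, 1000), "discount": discount})
    return rows

# Simple random subset of production rows (foreign keys are NOT handled here).
def subset(rows, fraction, rng):
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)

# Fraction of rows a "customer_id = ?" predicate returns for the busiest
# customer: a rough measure of worst-case selectivity.
def top_customer_share(rows):
    counts = Counter(r["customer_id"] for r in rows)
    return counts.most_common(1)[0][1] / len(rows)

# Fraction of rows whose discount is NULL (None).
def null_rate(rows):
    return sum(r["discount"] is None for r in rows) / len(rows)

rng = random.Random(42)
prod = make_production(50_000, rng)
sample = subset(prod, 0.10, rng)        # 5,000 real rows
fake = make_uniform(len(sample), rng)   # 5,000 generated rows

for name, rows in [("production", prod), ("subset", sample), ("generated", fake)]:
    print(f"{name:>10}: top customer share {top_customer_share(rows):.3f}, "
          f"NULL discount rate {null_rate(rows):.3f}")
```

A real subsetting tool has more work to do than this: it has to follow foreign keys so that sampled orders bring their customers, order lines, and other related rows with them, and it may need to mask sensitive values before the data leaves production.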

I've had a nice career working with data, and I'm glad that recognition of the value of data has continued to grow through the years. Now I'd like to see us actually start to emphasize producing and using more useful data sets when we build software, whether by traditional means or with machine learning techniques. My guess is we'll get more useful, higher-quality software if we do.

Contract or Perm

by Steve Jones

SQLServerCentral.com

Editorial

If you are accepting a DBA position, does it make sense to work as a contractor or permanent employee?

2007-11-21

Mini-Me

by Steve Jones

SQLServerCentral.com

Editorial

Will the next version of Windows be a "Mini-Me" version of Vista? It's too early to tell, but apparently there's a mini-kernel version of Windows 7, the release after Vista, that fits into 25MB on disk. That's a touch lower than the 4GB that Vista takes up. Granted it's not a full […]

2007-10-25

An Hour in Time

by Steve Jones

SQLServerCentral.com

Editorial

Daylight Saving Time ends a little later this year: it's November 4th, after having been in October for all of my life. In case you don't remember which way we move the clocks, here's a saying: Spring forward, fall back.

2007-10-17

Software is Like Building a House

by Steve Jones

SQLServerCentral.com

Editorial

One of the classic analogies in software is that building it is like building a house. You have a foundation, multiple teams, lots of contractors who each specialize in something, and so on. The relevance of this analogy is debated over and over. I won't go into whether it's correct, but I wanted to comment on it.