SQLServerCentral is supported by Redgate
Common Data Challenges
Steve Jones
SSC Guru

Group: Administrators
Points: 682202 Visits: 21588
Comments posted to this topic are about the item Common Data Challenges

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
DinoRS
SSCommitted

Group: General Forum Members
Points: 1781 Visits: 823
I haven't been in the business as long as you, Steve - roughly a third of your time - and I only very recently started to care about data outside performance and DR. But from supporting developers previously, I think some of the challenges remain the same: back then, today, and potentially tomorrow.

First and foremost, that would be data management. Remember all those functions, procedures, and expressions we had to write back then to get Excel-exported dates back into something useful? We still write them, and we probably still will in the future. Another problem I'd name is data sourcing: do I really need 20 exported CSV files to get all the data I want to process, or have requirements changed enough that I could get just 2 or 3 large CSV files with all the necessary data? After all, we are processing more data today than 10 years ago.
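Those Excel-date cleanups usually come down to the same arithmetic. As a minimal sketch (in Python rather than T-SQL, and assuming the Windows Excel 1900 date system), converting an exported serial number back into a real date looks like this:

```python
from datetime import date, timedelta

# Excel (Windows) stores dates as serial numbers counted from an epoch of
# 1899-12-30; that offset absorbs Excel's phantom 1900-02-29 leap day, so
# the conversion is correct for any serial >= 61 (dates after 1900-02-28).
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel date serial number to a real date."""
    return EXCEL_EPOCH + timedelta(days=serial)

# A CSV export often delivers dates as bare numbers like these:
print(excel_serial_to_date(44197))  # 2021-01-01
print(excel_serial_to_date(25569))  # 1970-01-01, the Unix epoch
```

The same offset arithmetic works in T-SQL or any other language; only the epoch constant and the leap-day quirk need remembering.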

And with more data to process, I think we will see more shifts towards different approaches to data and data processing. Is the data a snapshot? Then it's most likely your plain old ETL process for decades to come. Is the data a continuous stream? We certainly want to remember the interesting things from our data streams, so we'll keep those, but we still want to process the stream continuously; for that we will see much more use of things like Hadoop and machine learning. So yes, we do and will see a lot of new challenges waiting for us. They might look a little different: we may not be looking so much at index optimization anymore, but rather at whether our ML algorithms enable the business to make decisions reliably to our advantage.
Dave Poole
SSC Guru

Group: General Forum Members
Points: 68563 Visits: 4110
I started my career in 1988. I've found that a lot of the problems with data are the result of training gaps. Each generation suffers the same education gaps and so is doomed to keep making the same mistakes.

The problems I see are not isolated to IT. I do not believe that mankind has adapted to the internet - its cultural implications, behaviours, ways of working, etc. - and that exacerbates the recurrence of old mistakes. We have the capability to drive at a million miles an hour, but the chassis is older than you'd believe and no one has upgraded the brakes!

LinkedIn Profile
www.simple-talk.com
Michael Lysons
SSCertifiable

Group: General Forum Members
Points: 6040 Visits: 1659
Application and database design that allows users to enter bad data. The users are (generally) not to blame; they will take the path of least resistance, and we end up with bad data that needs addressing.

I work in the NHS, and often some requirement will arise that the hospital's Patient Administration System (PAS) can't properly handle, so a workaround is required. For example, I work at hospital A, and hospital B decides to use our spare capacity to do some of their clinical work - our PAS has to record this activity, usually by storing some identifying data in a data item not designed for that purpose. Which ultimately means the data warehouse receives data for a different hospital, which then needs to be stripped out of all operational datasets etc. But, we have to ensure that hospital B can see that data (this can mean various things from direct to indirect access) so they can receive payment for it.

These are regular challenges and at a high level they haven't changed much over the years, although technology changes have occurred, e.g. HL7 interfacing (and interfacing in general) is a much bigger part of the work than it was 10 years ago.
Steve Jones
SSC Guru

Group: Administrators
Points: 682202 Visits: 21588
David.Poole - Thursday, March 7, 2019 4:11 AM
I started my career in 1988. I've found that a lot of the problems with data are the result of training gaps. Each generation suffers the same education gaps and so is doomed to keep making the same mistakes.

The problems I see are not isolated to IT. I do not believe that mankind has adapted to the internet - its cultural implications, behaviours, ways of working, etc. - and that exacerbates the recurrence of old mistakes. We have the capability to drive at a million miles an hour, but the chassis is older than you'd believe and no one has upgraded the brakes!


Agree. Lots of cultural issues, and lots of us trying to adapt to the newer way of being connected to data.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Steve Jones
SSC Guru

Group: Administrators
Points: 682202 Visits: 21588
Michael Lysons - Thursday, March 7, 2019 4:37 AM
Application and database design that allows users to enter bad data. The users are (generally) not to blame; they will take the path of least resistance, and we end up with bad data that needs addressing.

While I agree, I also know that many business processes aren't as tightly defined as we would like, usually because we didn't account for the chaos of the world when we built the system. As a result, I've moved more towards having optional fields and update capabilities that let users clean up data and move it around later. The "every field is x" or "we need all this data" thinking was a trend in the 80s/90s, and it didn't work out well: too many problems came from systems trying to force users to change their work rather than systems adapting to users.

I'd argue that we need better app design that creates flexibility to meet the problems of the world.

Of course, this means still constant data challenges for us to deal with.


Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
ZZartin
One Orange Chip

Group: General Forum Members
Points: 29484 Visits: 18336
One super annoying trend I'm seeing more of is people wanting to use their real-time and/or EAV interfaces for bulk data transfer. I'm sorry, but when 90% of your fields are defined in EAV, or your real-time interface is some bloated JSON/XML, and you want to transfer hundreds of thousands or more records every day through it in a narrow window, you're not going to have a good time.
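To see why the transfer bloats, consider a hypothetical EAV feed: every attribute of a record travels as its own (entity, attribute, value) triple, so a bulk transfer of N records moves roughly N times the attribute count in rows, and the receiver still has to pivot the triples back into records. A minimal Python sketch with invented sample data:

```python
from collections import defaultdict

# Hypothetical EAV-style feed: one (entity, attribute, value) triple per row.
# Two flat 3-column records become six rows on the wire.
eav_rows = [
    (1, "name", "widget"), (1, "price", "9.99"), (1, "qty", "4"),
    (2, "name", "gadget"), (2, "price", "19.99"), (2, "qty", "2"),
]

def pivot_eav(rows):
    """Reassemble flat records from EAV triples - the receiver's extra work."""
    records = defaultdict(dict)
    for entity_id, attribute, value in rows:
        records[entity_id][attribute] = value
    return dict(records)

print(pivot_eav(eav_rows)[1])  # {'name': 'widget', 'price': '9.99', 'qty': '4'}
```

In a narrow load window, that row multiplication plus the pivot step is exactly where the bulk transfer falls over.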
Jeff Moden
SSC Guru

Group: General Forum Members
Points: 975208 Visits: 49307
Nothing has changed with the data itself when it comes to problems.

What has changed is the frequency of those problems and the desperate hacks people try to hammer together, either themselves or by implementing other people's hacks in the form of 3rd-party software, shrink-wrapped or not.

The frequency has increased simply because data is more prevalent than it ever was, due to the growth in the use of computers and the notions people have about what data is important.

The hacks have increased because of the waves of people who never used data before and have entered the field because it's both a prevalent field and a lucrative one. Ironically, it's like a bad drug habit: the more people do it, the more they need to do it, because a lot of the people who have problems importing, analyzing, using, and storing data are also the same ones generating data for others.

If you don't think so, just look at the questions/problems posed on these and other forums, database related or not.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

When you put the right degree of spin on it, the number 318 is also a glyph that describes the nature of a DBA's job. ;-)

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Rod
SSC-Dedicated

Group: General Forum Members
Points: 32469 Visits: 2887
I'm involved in a rewrite of an old MS Access application into a WPF app (at least at first; there may be more). I've discussed this in these forums before. The Access app is actually a front end to a SQL Server database. (Long before I got here it was all within Access, but someone in the past took the time to migrate the data to a SQL Server database, while leaving the Access front end in place.) At the moment, though, I'm on the sidelines as two BAs are busy analyzing the data with a view towards replacing it with another database. The article you referenced, Steve, points to another article titled "How an Agile Approach Can Help Solve Your Data Problems", which contrasts a waterfall approach to database/data design with an agile approach. If that article is correct, we are very much following a waterfall approach - although they've been doing waterfall for longer than I've been working.

Anyway, I don't want to defend the old database; I don't feel anything for or against it. I just wonder why they're even bothering to rearchitect it.

Kindest Regards, Rod
Connect with me on LinkedIn.
Dave Poole
SSC Guru

Group: General Forum Members
Points: 68563 Visits: 4110
ZZartin - Thursday, March 7, 2019 7:42 AM
One super annoying trend I'm seeing more of is people wanting to use their real-time and/or EAV interfaces for bulk data transfer. I'm sorry, but when 90% of your fields are defined in EAV, or your real-time interface is some bloated JSON/XML, and you want to transfer hundreds of thousands or more records every day through it in a narrow window, you're not going to have a good time.

I feel your pain. I haven't done anything with JSON in SQL Server, but I know that JSON Path, unlike XPath, does not have a getParent() equivalent, which means that bulk ingestion of JSON containing arrays produces two unconnectable recordsets.
The approach we've taken to address it is to import a file containing many JSON documents of the same document type and loop over each document, triggering multiple extractions - in effect reinventing RBAR.
I am currently experimenting with ways of bulk-ingesting the non-array part of the document, which is very fast, and then submitting only those documents containing arrays to the RBAR process.
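One general-purpose workaround for the missing getParent() is to copy the parent's key into each child row as the array is shredded, so header and line recordsets stay joinable. A minimal Python sketch with a hypothetical order document (not the T-SQL or YAJL tooling discussed above):

```python
import json

# Hypothetical order document: a scalar "header" part plus a line-item array.
doc = json.loads("""
{"order_id": 42, "customer": "ACME",
 "lines": [{"sku": "A1", "qty": 3}, {"sku": "B2", "qty": 1}]}
""")

# Shred into two connectable recordsets by stamping the parent's key onto
# each child row as it is extracted - standing in for getParent().
header = {"order_id": doc["order_id"], "customer": doc["customer"]}
lines = [{"order_id": doc["order_id"], **line} for line in doc["lines"]]

print(header)    # {'order_id': 42, 'customer': 'ACME'}
print(lines[0])  # {'order_id': 42, 'sku': 'A1', 'qty': 3}
```

The same trick applies when bulk-loading: extract the non-array part in one fast pass, then shred only the array-bearing documents, carrying the key down each time.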

I've had some luck with a library called YAJL, but no son of mine will ever be called JASON.


LinkedIn Profile
www.simple-talk.com