Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Confused about how to design a new process (third week into new role) Expand / Collapse
Author
Message
Posted Wednesday, August 8, 2012 9:55 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Thursday, June 26, 2014 4:08 AM
Points: 708, Visits: 3,286
Good afternoon,

I'm not sure if I'm posting this in the right section but I could really do with some advice on how to progress this project. Any suggestions/ideas would be greatly appreciated!

I have been tasked with automating the following process:

Existing setup:

Our data services department offer various services such as:

1) Batch duplicate processing
2) Data cleansing, address standardisation / validation.
3) Matching routines (i.e. Individual, Business, Family & Household)

...etc

The way the current process works is like this:

* Company ABC will provide us with a mailing list and a job spec (e.g. cleanse names, validate addresses and merge Household level)
* A network folder is created for the job which contains the source data files, documentation and also a SQL folder
* A new database is created for the customer (if one doesn't already exist) which will store the staging and output tables.

The SQL folder for the job will contain a dozen scripts to perform various data cleansing/matching routines, for example:

1) Data Hygiene script
2) Hygiene Quality Check script
3) merge script
4) Post merge script
5) Final Hygiene script
6) Output script
7) Hygiene Quality Check output script

There are 7 data processing staff members using 4 SQL server instances with each instance holding over 30 databases. The database for a job could be on any of these 4 SQL servers! The 4 SQL Servers are 2005 but we also have SQL Server 2008 instances used for other projects.

What I have found so far is that the scripts in each folder are all kind of similar but each one is specific to the job it was used for. So for example a merge script in one folder will contain various update statements to perform further data cleansing for that job while another have less etc...

The same applies to the other scripts. All the folders contain their own versions of these scripts and they're all similar but not the same.

What I am tasked with is automate the following process:

1) Data processing staff imports the raw file into SQL (this bit I don't have to deal with as we already have a data loader program implemented which does the job well.
2) They gather the required scripts from various folders and manually run each one (adding further updates as they go along)
3) Another staff member performs the Quality Check after each step
4) If another job comes from the same customer then the same database can be used and the same scripts get reused (again they get modified if further data cleansing is required)

I'm sure you guys get the picture by now!

So what we have is 4 SQL Servers each with 30+ databases in each one (some customers will have multiple databases whilst others will have just one database for recurring jobs)

40+ SQL folders and each folder will have its own version of the merge, data hygiene scripts etc...


I am now in my third week in this job and I can't make head no tail of this mess! I've spent hours going through scripts, then I started looking at the merge process last week and now I'm back to the beginning.

Now I am beginning to stress big time! I'm really keen to impress but I've not seen anything like this and I'm beginning to doubt myself. I've spent the last 4 years working in a Service Desk as a SQL Support Analyst so I have strong problem solving skills but I'm struggling to design a process to automate this mess.

Basically the ultimate goal is to automate the process so a source table goes through:

1) Data Cleansing.
2) Data Standardisation
3) Data Merge

Nothing is committed at this stage but a rich report should be provided to the data processing staff member so they get counts of:

1) Count of record with missing title/forename/surname etc.
2) Count of cleansed forenames/surnames/addresses etc...
3) Count of individual records/Business records or Family & Household.
4) Sample of matches and probably matches based on a match score

..etc ...etc...

Once the user is happy with the result they can then commit the result which would ultimately merge it with the Single Customer View database for that customer.

Thanks for taking the time to read this and if anyone can offer help/advice I would be eternally grateful!

Regards.


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1341988
Posted Wednesday, August 8, 2012 10:49 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 6:13 PM
Points: 6,175, Visits: 7,254
It sounds, personally, like there's actually two primary stages to this process. Standard and Custom. Right now they're rolled into one.

What I would recommend you do is create a standard 'cleansing' protocol. This would be a primary load location (or a series of scripts deployed to the database that you've just built via Model) that would take care of all standard cleanup components.

Then, customs would be ran after the standard, hopefully keeping what's standard from now on and what's custom per client/data component separate.

Also, you might think about giving your users DDLAdmin on the databases so they can deploy their scripts as procs so you can more easily keep track of what belongs to whom. My guess is this is the way it's easiest for business because they've never been shown an alternative.

However, really, that's a big deal you're dealing with over there and it sounds like that's their primary source of revenue, at least for the department you're supporting. I'd tread lightly and first learn their process before you go and start making major alterations.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1342026
Posted Wednesday, August 8, 2012 11:16 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
I agree with Craig.

Break the process down into smaller pieces first. Conquer each piece, one at a time.

In this kind of confusing situation, don't try to get a grasp on all of it all at once. Pick one piece, get a solid grip on that, then work out to other pieces from there.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1342050
Posted Wednesday, August 8, 2012 2:52 PM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Thursday, June 26, 2014 4:08 AM
Points: 708, Visits: 3,286
This would be a primary load location (or a series of scripts deployed to the database that you've just built via Model) that would take care of all standard cleanup components.


Firstly let me say thank you for your time reading my long post. I really appreciate your opinion.

You are right and I really like your suggestion.


Also, you might think about giving your users DDLAdmin on the databases so they can deploy their scripts as procs so you can more easily keep track of what belongs to whom. My guess is this is the way it's easiest for business because they've never been shown an alternative.


They already have and they create many staging tables and do all sorts of inserts and deletions. I've seen this in a lot of scripts.


However, really, that's a big deal you're dealing with over there and it sounds like that's their primary source of revenue, at least for the department you're supporting. I'd tread lightly and first learn their process before you go and start making major alterations.


You're spot on once again. I had a meeting with the Director of Software development and she made a point about the data services team being afraid of change. This explains why the the Director of Data Services is so keen to know what it is that I'm doing! She's asked for two updates this week already!

So I have some decisions to make.....

a) Do I start with small changes like enhance existing functions/sprocs?

For example yesterday I rewrote two data cleansing functions. The original version took 27 seconds to run on a data set of 250k record. My new version took 5 seconds! But this is a small cog in a large data hygiene/merge machine!

b) Do as you suggest and start by creating a generic data cleanse script?

I think I bit off more than I could chew taking on this role!


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1342236
Posted Wednesday, August 8, 2012 3:14 PM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Thursday, June 26, 2014 4:08 AM
Points: 708, Visits: 3,286
agree with Craig.

Break the process down into smaller pieces first. Conquer each piece, one at a time.


Once again thanks for your input G2 your time is much appreciated.

Please see my reply to Craig. What do you think?


In this kind of confusing situation, don't try to get a grasp on all of it all at once. Pick one piece, get a solid grip on that, then work out to other pieces from there.


Yes that makes sense. But the Director of Software Development also made a point about me not losing site of the big picture. What she meant was that I should design an over all system that works which doesn't have to be perfect. I don't think she realizes how complex this project is or am I the one making a big deal out of this project?!
I was nosing around the other day looking for some SQL scripts when I stumbled across folders containing SQL scripts and crude documentation of various failed attempts at automating the data hygiene process. Weird.... I've no idea why these were not used. Maybe these guys are scared of change!


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1342258
Posted Wednesday, August 8, 2012 3:15 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 6:13 PM
Points: 6,175, Visits: 7,254
Abu Dina (8/8/2012)

You're spot on once again. I had a meeting with the Director of Software development and she made a point about the data services team being afraid of change. This explains why the the Director of Data Services is so keen to know what it is that I'm doing! She's asked for two updates this week already!

That's not horribly uncommon anyway, particularly for a new employee when there's no direct peers (it doesn't sound like you have any, anyways).


a) Do I start with small changes like enhance existing functions/sprocs?
Absolutely! Ask to sit with one person through their day too, so you can see all of what they're doing end to end. What you want to do is understand the business workflow from the eyes of the current users. Find their pain points. Whatever you can do to remove those will give you more 'bank' for later when you start introducing things they're not comfortable with.

b) Do as you suggest and start by creating a generic data cleanse script?
Perhaps. This will require more buy-in from the business users (Data Services) but could be a long-term goal to make sure new 'common' changes are included from now on (as you'll update the model scripts) but customs can stay custom without affecting any other clients. You'll definately want to find some way of having a common build script that everyone's working from so that fixes you make today don't get lost tomorrow when someone copies from a different folder.

You're going to spend a lot of time here just playing catchup with the business, and understanding what the intention behind all of the seeming traditions are. If you can stay true to the intention without making anyone feel silly for not 'thinking of that before', particularly in an entrenched business process, you'll be ready to start making strides into making that into a well oiled machine.

Just be careful. Too much, too fast, and you'll end up with 'pushback' because of people being afraid of losing their jobs if you automate too much. This is something you'll want to speak with directly to the director about, and I recommend it be a face to face. What you're looking for, in the immediate sense, is ways to make them more productive without taking away their job entirely. That starts with the 'pain points' from earlier.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1342260
Posted Wednesday, August 8, 2012 3:41 PM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Thursday, June 26, 2014 4:08 AM
Points: 708, Visits: 3,286
Evil Kraig F (8/8/2012)
Abu Dina (8/8/2012)

You're spot on once again. I had a meeting with the Director of Software development and she made a point about the data services team being afraid of change. This explains why the the Director of Data Services is so keen to know what it is that I'm doing! She's asked for two updates this week already!

That's not horribly uncommon anyway, particularly for a new employee when there's no direct peers (it doesn't sound like you have any, anyways).


Yes, I'm working on my own on this project


Absolutely! Ask to sit with one person through their day too, so you can see all of what they're doing end to end. What you want to do is understand the business workflow from the eyes of the current users. Find their pain points. Whatever you can do to remove those will give you more 'bank' for later when you start introducing things they're not comfortable with.


I spent two weeks with the data services team (in fact I asked for the second week) so I can get a good understanding of the process. Maybe my original post sounds confused but I do have developed a decent knowledge of some of the processes but I'm no genius to be honest. Maybe I'm' just slow but there is still a lot for me to learn. You're making me feel better already. It sounds like I have made some right decisions but I'm just panicking at what I consider lack of progress. I can tell you straightaway that one of the problem they have are slow running queries. Some merge scripts take hours to run. I've already posted this n a different thread and another SSC member identified some bad coding in there so there's already things that I can do to existing scripts to make them run faster etc..


b) Do as you suggest and start by creating a generic data cleanse script?
Perhaps. This will require more buy-in from the business users (Data Services) but could be a long-term goal to make sure new 'common' changes are included from now on (as you'll update the model scripts) but customs can stay custom without affecting any other clients. You'll definately want to find some way of having a common build script that everyone's working from so that fixes you make today don't get lost tomorrow when someone copies from a different folder.

You're going to spend a lot of time here just playing catchup with the business, and understanding what the intention behind all of the seeming traditions are. If you can stay true to the intention without making anyone feel silly for not 'thinking of that before', particularly in an entrenched business process, you'll be ready to start making strides into making that into a well oiled machine.

Just be careful. Too much, too fast, and you'll end up with 'pushback' because of people being afraid of losing their jobs if you automate too much. This is something you'll want to speak with directly to the director about, and I recommend it be a face to face. What you're looking for, in the immediate sense, is ways to make them more productive without taking away their job entirely. That starts with the 'pain points' from earlier.


Wise words.. wow, I have so much to learn. You know at first I wasn't sure if I should post my problem on this site but I'm so glad I did.

Thank you.


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1342275
Posted Wednesday, August 8, 2012 3:54 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 5:09 PM
Points: 23,089, Visits: 31,635
I can relate but in a different way. One of the bloodest projects I had was automating an invoicing process at a former employer. I was very familiar with system and it still killed me while trying to automate the process. We are talking a COBOL application (spagetti COBOL to be precise). I thought it would take 3 months, took 6 months before it went into production.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1342287
Posted Wednesday, August 8, 2012 3:58 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 6:13 PM
Points: 6,175, Visits: 7,254
Lynn Pettis (8/8/2012)
I can relate but in a different way. One of the bloodest projects I had was automating an invoicing process at a former employer. I was very familiar with system and it still killed me while trying to automate the process. We are talking a COBOL application (spagetti COBOL to be precise). I thought it would take 3 months, took 6 months before it went into production.


I have no idea what you're talking about Lynn... My current contract was a 3 month 'write up some simplistic SSIS tasks to move data around'.

1.5 years later we deployed a warehouse that actually delivered the required specifications. THEN the users started testing... *facepalm*

Basically we're sharing a horror story or two so you realize they're out there. Business looks at problems very differently than we do and expectations can run wildly offcenter to the reality of the requirements. The trick is presenting them effectively to the personalities involved.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1342291
Posted Thursday, August 9, 2012 6:51 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
On the point of "don't lose sight of the big picture", I'll just say that this is one of the biggest mistakes people make, assuming they can't do anything with the details till the big picture has been fully "grocked". This leads to more decision-deadlock than any other factor in management.

Especially when you consider that "the big picture" is frequently the equivalent of a bunch of random paint thrown through a jet engine blast onto a canvas, and called "art": http://www.youtube.com/watch?v=SFG1LOy0ehk


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1342600
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse