SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Self-Healing ETL


Self-Healing ETL

Author
Message
Steve Jones
Steve Jones
SSC Guru
SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)

Group: Administrators
Points: 602364 Visits: 21101
Comments posted to this topic are about the item Self-Healing ETL

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Dave Poole
Dave Poole
SSC Guru
SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)

Group: General Forum Members
Points: 60706 Visits: 3986
Back in the days of DTS it was possible to write a DTS package that altered itself dynamically at run time. We tended to use this for things that are now standard components in SSIS but still, the ability to adjust to situations at run-time is a great boon.

I have to ingest JSON documents into our data warehouse and one of the challenges is detecting when a new key:value has appeared in the files supplied. Technically everything I need to do is possible the bit that is sad is that I am building a technical solution to cope with a lack of communication.

Another useful facility is the ability to detect column orders in files and even have a thesaurus for column names. 3rd parties supplying data may be supplying Excel spreadsheets as a stop-gap until a formal solution is put in place. The thesaurus facility allows for CustomerNumber, CustomerNo, Customer_Number etc. The column detector allows for the CustomerNumber field (or any other for that matter) appearing in the 1st column on one supply and in some other column on subsequent supplies.

LinkedIn Profile
www.simple-talk.com
Dalkeith
Dalkeith
SSCrazy
SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2872 Visits: 1384
I think this would be an excellent use of AI.

I suspect a lot in here have worked up their own tools to speed up the tedious task of matching things like columns or normalizing up or down to assist in writing necessary updates and appends. I am not completely familiar with the products out there available for this mainly because it is an infrequent task for me and it seemed as easy to write my own functions as buy an actual product.

For me large scale ETL is still a somewhat rare request and any AI data manipulation automation is being directed at daily tasks at present.
xsevensinzx
xsevensinzx
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21186 Visits: 5792
This is why I switched primarily to Python for all of my ETL so we can do things like this on top of easily doing dynamic pivots and headers. At this stage though, been moving away from headers all together though.
pbelter
pbelter
Forum Newbie
Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)

Group: General Forum Members
Points: 1 Visits: 8
I am not sure if this applies to the subject matter, but I have found through experience of moving data across set boundaries (Customer Systems to Supplier Systems) can be accomplished by using existing tools such as regular expressions. In my case I have to decipher information provided by the customer to convert it to data that the supplier systems can identify with. In most cases it requires working with the engineers who have to use the data for existing processes. Terminology is the biggest hurdle to overcome in these situations. However if an Normality can be identified in the format of the data when presented in a string, regular expression's can be used to identify potential gaps in the deciphering process and be presented to the engineers for investigation and resolution. Sometimes issues can be resolved on the code development side, other times collaboration with the Customer Systems Engineers is needed. These situations are usually experienced when Process Changes occur (Either on the Customers Systems or the Suppliers Systems) where the Normality of the formatted data has been effected. Not every data system is designed the same, obviously, however the intended purpose of the data must exist.
For example - The intended Purpose of the data is to define an assembly process of an axle. The data must express a part, a quantity, a description. So if my description contains "WHEEL" then we can say a Normality is every axle gets a wheel. So on and so on. Is this Self Healing? Most likely not, but to prevent the need to self heal, to me is the more logical choice.
Dave Poole
Dave Poole
SSC Guru
SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)

Group: General Forum Members
Points: 60706 Visits: 3986
xsevensinzx - Thursday, October 11, 2018 5:37 AM
This is why I switched primarily to Python for all of my ETL so we can do things like this on top of easily doing dynamic pivots and headers. At this stage though, been moving away from headers all together though.

What do you use to co-ordinate the transformations? Apache Airflow or similar?


LinkedIn Profile
www.simple-talk.com
Matt Miller (4)
Matt Miller (4)
SSC Guru
SSC Guru (111K reputation)SSC Guru (111K reputation)SSC Guru (111K reputation)SSC Guru (111K reputation)SSC Guru (111K reputation)SSC Guru (111K reputation)SSC Guru (111K reputation)SSC Guru (111K reputation)

Group: General Forum Members
Points: 111227 Visits: 21706
Dalkeith - Thursday, October 11, 2018 1:43 AM
I think this would be an excellent use of AI.

I suspect a lot in here have worked up their own tools to speed up the tedious task of matching things like columns or normalizing up or down to assist in writing necessary updates and appends. I am not completely familiar with the products out there available for this mainly because it is an infrequent task for me and it seemed as easy to write my own functions as buy an actual product.

For me large scale ETL is still a somewhat rare request and any AI data manipulation automation is being directed at daily tasks at present.

It's really not AI at all - it's about introducing reflection and introspection techniques into your ETL. Programming languages have used this/made this available for some time now, and while it does take some care to know how to match things up (i.e. find the existing elements and make sure that they are still "the same") vs knowing how to detect new elements and deciding what to do with those new items, this is an old concept in a new setting.

As Steve mentioned - while it is possible, the first question is whether you can really afford to make some guesses "on the fly": what is the fallout of making the wrong choice/a bad matchup? what is the performance hit? For example - I might do this on demographic/background info (little to no risk), but I would probably lean towards "failing" the job if this were financial or regulatory data.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
xsevensinzx
xsevensinzx
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21186 Visits: 5792
David.Poole - Friday, October 12, 2018 1:17 AM
xsevensinzx - Thursday, October 11, 2018 5:37 AM
This is why I switched primarily to Python for all of my ETL so we can do things like this on top of easily doing dynamic pivots and headers. At this stage though, been moving away from headers all together though.

What do you use to co-ordinate the transformations? Apache Airflow or similar?

What I built is basically Airflow from scratch. This is because Airflow was not really built with support for Azure at the time.

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search