Brush Up on Your ETL Skills

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 714668

    Comments posted to this topic are about the item Brush Up on Your ETL Skills

  • xsevensinzx

    One Orange Chip

    Points: 25531

    IP address, cookies, gender, zip codes, even religious views, to name a few.

    By one widely cited estimate, about 87% of the US population could be uniquely identified from just three data points (date of birth, gender, and five-digit zip code) using publicly available census data. That is a sobering statistic, and it shows why robust pseudonymization is needed, particularly in light of large-scale breaches such as the Equifax incident.
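A minimal sketch of how the re-identification risk described above could be measured on your own data, assuming a table that holds date of birth, gender, and zip code. The records and the `unique_fraction` helper are hypothetical, made up for illustration only.

```python
from collections import Counter

# Hypothetical toy records: (date_of_birth, gender, zip_code). Not real data.
records = [
    ("1980-01-15", "F", "30301"),
    ("1980-01-15", "F", "30301"),
    ("1975-06-02", "M", "30301"),
    ("1990-11-23", "F", "98101"),
    ("1962-03-09", "M", "10001"),
]


def unique_fraction(rows):
    """Return the share of rows whose quasi-identifier combination occurs exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


# Three of the five toy rows have a combination nobody else shares.
print(unique_fraction(records))
```

A high fraction suggests the three columns together act as an identifier, even though none of them is one on its own.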

    In my world, it's a lot of ETL. Luckily, as I'm working mostly with data scientists, both the machine learning and the ETL can co-exist in Python. Even before then, I was eager to use Python over T-SQL and SSIS, simply because you can set up distributed processing in Python very easily, with ETL pipelines where each data stream is processed in a shared-nothing environment while still working holistically together. The only issue is that some of these scripting languages are not the fastest tool in the box compared to other options, but they generally work out in the end because they can scale horizontally where others can only scale up.
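As a rough illustration of the shared-nothing idea above, here is a minimal single-machine sketch using Python's standard `multiprocessing` module. The `partition`, `transform_partition`, and `run_pipeline` names and the sample rows are assumptions for the example, not the poster's actual pipeline; a real distributed setup would spread the partitions across machines.

```python
from multiprocessing import Pool


def partition(rows, n_parts):
    """Split rows into n_parts independent chunks (round-robin)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def transform_partition(rows):
    """Clean one chunk using only its own data; no state is shared between workers."""
    return [
        {"id": row["id"], "email": row["email"].strip().lower()}
        for row in rows
        if row["email"].strip()  # drop rows with a blank email
    ]


def run_pipeline(rows, n_workers=2):
    """Fan the chunks out to worker processes, then merge the results."""
    chunks = partition(rows, n_workers)
    with Pool(processes=n_workers) as pool:
        results = pool.map(transform_partition, chunks)
    merged = [row for chunk in results for row in chunk]
    return sorted(merged, key=lambda row: row["id"])


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [
        {"id": 1, "email": " Alice@Example.com "},
        {"id": 2, "email": ""},
        {"id": 3, "email": "BOB@example.com"},
    ]
    print(run_pipeline(sample))
```

Because each worker only sees its own chunk, adding workers (or machines) does not require any coordination beyond the final merge.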

  • David.Poole

    SSC Guru

    Points: 75083

    Under GDPR, Article 17 "Right of Erasure" may require you to have a mechanism to delete all forum posts and private messages for a particular user.
    I don't think "Right of Erasure" would cover articles written by someone, in the unlikely event of an author requesting erasure.
    "Right of Erasure" does not trump the legal requirement to keep financial records for the legally mandated time period.
    If Redgate haven't done so already, it would be wise to get advice from legal.
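A minimal sketch of what such an erasure mechanism might look like, using an in-memory SQLite database as a stand-in. The schema (`forum_posts`, `private_messages`, `invoices`) and the function names are hypothetical, and which tables are actually in scope is a legal question rather than a technical one.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway database with a hypothetical forum schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
        CREATE TABLE private_messages (id INTEGER PRIMARY KEY, sender_id INTEGER, body TEXT);
        CREATE TABLE invoices (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
        INSERT INTO forum_posts (user_id, body) VALUES (1, 'hello'), (2, 'hi');
        INSERT INTO private_messages (sender_id, body) VALUES (1, 'psst');
        INSERT INTO invoices (user_id, amount) VALUES (1, 9.99);
    """)
    return conn


def erase_user(conn, user_id):
    """Delete a user's posts and sent messages in one transaction.

    Invoices are deliberately left alone, on the assumption that they
    fall under a legal retention requirement.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM forum_posts WHERE user_id = ?", (user_id,))
        conn.execute("DELETE FROM private_messages WHERE sender_id = ?", (user_id,))


def count(conn, table):
    """Row count for one of the fixed demo tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = build_demo_db()
erase_user(conn, 1)
print(count(conn, "forum_posts"), count(conn, "private_messages"), count(conn, "invoices"))
```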

    A subject access request for subscribers would cover anything in their profile, but as the site provides the mechanism to see this, it is effectively self-service. If there is nothing beyond what people can self-serve, then it may be as simple as having an explicit GDPR page that states how a requester can retrieve their own data.

    Article 20 "Right to data portability" is an interesting one. It doesn't limit its scope, but I think historically it reinforces consumers' rights to swap energy suppliers, broadband/mobile providers and now banking providers. In the context of SQLServerCentral it could be a mechanism for a subscriber to download their own profile.

    Another interesting wrinkle is what you do when not all your data is in SQL Server. Does something like Presto (the engine AWS Athena is built on) provide an answer to this and, serendipitously, to a general business problem?

    As general advice to people facing GDPR, I would say take a good hard look at any company file shares, email inboxes, Dropbox/OneDrive type accounts, workstations, SharePoint etc. In the SQL Server world we have a structured data store with a defined retention strategy, purge, archive and backup. On company file shares and mailboxes there is God knows what, God knows where and in God knows what format.
    If your HR department takes a scan of your passport when you first join the company, then they need to have defined processes in place to purge those images when they are no longer in use. Unless you have some form of auditing software such as http://www.groundlabs.com, which has the capability to perform OCR on images, it is going to be very hard to identify what your exposure and risk are.

  • chrisn-585491

    SSCoach

    Points: 15846

    You say "ETL specialist", I say "Data Janitor".  😀

  • below86

    SSChampion

    Points: 11212

    We use SSIS for all of our ETL. I would prefer that SSIS be used more for EL and not (T)ransform. That means setting up SSIS to extract and load the data into a 'work' table, then using SQL to transform it. In everything I've done so far in my career, I haven't found any 'Transform' that I couldn't do in SQL.
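The extract-and-load-first pattern above can be sketched roughly as follows. This uses Python's built-in `sqlite3` as a stand-in for SSIS and SQL Server, so the SQL dialect differs from T-SQL; the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw source rows, loaded as-is (everything is text, nothing cleaned).
raw_rows = [
    ("  alice ", "2018-02-21", "10.50"),
    ("BOB", "2018-02-22", "n/a"),
]


def extract_and_load(conn, rows):
    """EL step: land the raw data in a 'work' table without transforming it."""
    conn.execute("CREATE TABLE work_orders (customer TEXT, order_date TEXT, amount TEXT)")
    conn.executemany("INSERT INTO work_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """T step: do all cleaning in SQL, reading from the work table."""
    conn.execute("""
        CREATE TABLE orders AS
        SELECT LOWER(TRIM(customer)) AS customer,
               order_date,
               CAST(amount AS REAL) AS amount
        FROM work_orders
        WHERE amount GLOB '[0-9]*'  -- keep only rows whose amount starts with a digit
    """)


conn = sqlite3.connect(":memory:")
extract_and_load(conn, raw_rows)
transform(conn)
print(conn.execute("SELECT customer, order_date, amount FROM orders").fetchall())
```

Keeping the load step dumb means a bad row never blocks the load, and the cleaning rules live in one place where they can be rerun against the work table.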

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 714668

    David.Poole - Wednesday, February 21, 2018 1:48 AM

    Under GDPR, Article 17 "Right of Erasure" may require you to have a mechanism to delete all forum posts and private messages for a particular user.
    I don't think "Right of Erasure" would cover articles written by someone, in the unlikely event of an author requesting erasure.
    "Right of Erasure" does not trump the legal requirement to keep financial records for the legally mandated time period.
    If Redgate haven't done so already, it would be wise to get advice from legal.

    Article 20 "Right to data portability" is an interesting one. It doesn't limit its scope, but I think historically it reinforces consumers' rights to swap energy suppliers, broadband/mobile providers and now banking providers. In the context of SQLServerCentral it could be a mechanism for a subscriber to download their own profile.

    Maybe. Our business is providing answers to people. It's possible an entity could get a right of erasure, but I doubt it for the things we share. We'd want a legal decision to the contrary.

    For Article 20, the profile is a good example where we might need to provide that for someone, though I doubt we'd get a request. We keep fairly little information here that isn't public.

  • ManicStar

    SSCoach

    Points: 17992

    chrisn-585491 - Wednesday, February 21, 2018 6:08 AM

    You say "ETL specialist", I say "Data Janitor".  😀

    Yep...

  • mjh 45389

    SSCertifiable

    Points: 5695

    "Right of Erasure" is interesting (maybe not)! One of my best friends died suddenly and very, very unexpectedly a few years ago. However, he lived on online for a long time, as his family struggled to get him removed from social media like Friends Reunited and LinkedIn. They were less bothered about professional forums similar to this one...
