I am bringing data from Google Campaign Manager into Google Cloud for processing in BigQuery. I know you are a Microsoft developer, but your input will still be valuable.
These are the tables: https://developers.google.com/doubleclick-advertisers/dtv2/reference/match-tables#ads. They must be data modeled (matched) so the tables can be joined to retrieve data (https://developers.google.com/doubleclick-advertisers/dtv2/reference/file-format).
(i) Data comes in zipped files (text/CSV).
(ii) I must retrieve the relevant files from the data dump and combine them into one table. For example, "Activity" data is broken down by day, so there is more than one "activity" file.
I need to select only the files containing "activity" to create an Activity table in BigQuery. Have you ever used a wildcard function in your ETL? See the attachment for what I am referring to.
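To make the wildcard idea concrete, here is a minimal sketch of the kind of selection I mean, using Python's standard `fnmatch` module. The file names are hypothetical illustrations, not the real DT naming pattern:

```python
# Sketch: pick out only the daily "activity" files from a data-dump listing.
# File names below are made up for illustration; real DT file names follow
# the pattern in the DT v2 file-format reference.
import fnmatch

files = [
    "dcm_account123_activity_20240101_001.csv.gz",
    "dcm_account123_impression_20240101_001.csv.gz",
    "dcm_account123_click_20240101_001.csv.gz",
    "dcm_account123_activity_20240102_001.csv.gz",
]

# "*activity*" matches any file name containing the word "activity".
activity_files = fnmatch.filter(files, "*activity*")
print(activity_files)
```

As I understand it, BigQuery load jobs can also accept a single `*` wildcard directly in a Cloud Storage URI, which would let one load job pick up all matching activity files at once.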
(iii) This means that all the tables must be retrieved from the data dump.
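Extending the same idea to all tables, the grouping step could be sketched like this. The table keywords and file names are assumptions for illustration; each resulting list would feed a load into its own BigQuery table:

```python
# Sketch: bucket a full data-dump listing into one file list per target table.
# TABLE_KEYWORDS and the sample names are hypothetical, not the real DT schema.
from collections import defaultdict

TABLE_KEYWORDS = ["activity", "impression", "click", "rich_media", "match_table"]

def group_by_table(filenames):
    """Return {table_keyword: [matching file names]}."""
    groups = defaultdict(list)
    for name in filenames:
        for keyword in TABLE_KEYWORDS:
            if keyword in name:
                groups[keyword].append(name)
                break  # each file belongs to exactly one table
    return dict(groups)

files = [
    "dcm_account123_activity_20240101.csv.gz",
    "dcm_account123_impression_20240101_00.csv.gz",
    "dcm_account123_impression_20240101_01.csv.gz",
    "dcm_account123_match_table_ads_20240101.csv.gz",
]
print(group_by_table(files))
```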
Also, what are your thoughts on ETL pipelines that retrieve data every day? What can I do to enhance such data flows and also reduce cost in the Cloud? Please see this note from my contact:
"I'd like to let you know that Impression, Click and Rich Media files are generated hourly, while the Activity files are generated daily. That means we will see 24 Impression, Click and Rich Media files and one Activity file generated per day in the DT bucket. Upon checking with my resources, I see that custom fields such as creative field name, creative field number, etc. are available in the Match tables that are generated daily in the DT bucket. Note that these Match tables are lookup tables for the different types of IDs you find in the DT files.
You can refer to this article (https://developers.google.com/doubleclick-advertisers/dtv2/reference/match-tables) for more information. That said, please feel free to write back to this email thread if you have any other queries and I'll be happy to assist."
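Since the Match tables are lookups for the IDs in the DT files, the enrichment step is just a join. In BigQuery this would be a LEFT JOIN between, say, the Impression table and a match table on the ID column; the toy dictionary lookup below only illustrates the shape of that join (column names and values are assumptions, not the real schema):

```python
# Sketch: enrich DT rows with names from a Match table (a lookup by ID).
# match_table_ads stands in for the daily "ads" match table.
match_table_ads = {
    1001: "Spring Sale Banner",
    1002: "Summer Promo Video",
}

impressions = [
    {"ad_id": 1001, "impressions": 5},
    {"ad_id": 1002, "impressions": 3},
]

# Equivalent of: SELECT i.*, m.ad_name FROM impressions i LEFT JOIN ads m ON ...
enriched = [
    {**row, "ad_name": match_table_ads.get(row["ad_id"], "UNKNOWN")}
    for row in impressions
]
print(enriched)
```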
With this picture in mind, can you give me an idea of an optimised and relatively easy data flow?