How to store a central repository of data

  • I am fairly new to database design and haven't been able to find a lot on this question.

    I work for a marketing and advertising firm. We don't have a "transaction" database since all of our data comes from our clients. I'm responsible to load it all into a database so we can remove duplicates, clean the data, and run analytics.

    We normally create a separate database for each client for security reasons (decision made by IT security). But most of these databases has some things in common, mostly address data.

    I want a central repository of things like zip code data (zip codes linked to county, regions, etc.), country codes, state codes, etc. Usually, this repository would be used just for the initial insert and cleanup. I'd have to enter it into the client's database the way the client wants to see it.

    Any advice, thoughts, or references on how this is best implemented? Are there problems associated with linking data like this between databases? Is there an easier way to keep a single zip code table but have it be accessible to several address tables?

  • sbaer (5/6/2014)


    I am fairly new to database design and haven't been able to find a lot on this question.

    I work for a marketing and advertising firm. We don't have a "transaction" database since all of our data comes from our clients. I'm responsible to load it all into a database so we can remove duplicates, clean the data, and run analytics.

    We normally create a separate database for each client for security reasons (decision made by IT security). But most of these databases has some things in common, mostly address data.

    Have a talk with the security guys and find out exactly what the requirements are, what you must comply to etc. A mutual understanding on this is imperative!

    I want a central repository of things like zip code data (zip codes linked to county, regions, etc.), country codes, state codes, etc. Usually, this repository would be used just for the initial insert and cleanup. I'd have to enter it into the client's database the way the client wants to see it.

    If possible, establish a single data collection point. From there it is easy to distribute the client's data to the appropriate destinations. In the central repository, maintain all the the referential data that is NOT client specific.

    Any advice, thoughts, or references on how this is best implemented? Are there problems associated with linking data like this between databases? Is there an easier way to keep a single zip code table but have it be accessible to several address tables?

    All reporting repositories should be client specific!

    Hopes this helps, feel free to inquire further.

    😎

  • If possible, establish a single data collection point. From there it is easy to distribute the client's data to the appropriate destinations. In the central repository, maintain all the the referential data that is NOT client specific.

    Would this be a correct understanding of the above:

    All of my "central repository" is stored in a single database called "Staging". As data is received, it flows through tables in Staging before getting to the final destination of the customer specific database. While in Staging, I can link to any of the central repository necessary, transforming data fields as necessary for the specifics of each customer. Once all of the cleanup and look up is complete, the data can be transferred to the customer specific database with the values required by the customer.

    It would mean Staging would be a very large database with several schemas, stored procedures, and possibly hundreds of tables. But the data should reside there only temporarily as it is transformed so the physical size of the database would never be very large. I think the security team would buy off on this considering the data is removed to a more "secure" location once it is cleaned.

    If I've missed anything, let me know. Otherwise, this seems like a logical and simple solution.

  • sbaer (5/6/2014)


    If possible, establish a single data collection point. From there it is easy to distribute the client's data to the appropriate destinations. In the central repository, maintain all the the referential data that is NOT client specific.

    Would this be a correct understanding of the above:

    All of my "central repository" is stored in a single database called "Staging". As data is received, it flows through tables in Staging before getting to the final destination of the customer specific database. While in Staging, I can link to any of the central repository necessary, transforming data fields as necessary for the specifics of each customer. Once all of the cleanup and look up is complete, the data can be transferred to the customer specific database with the values required by the customer.

    It would mean Staging would be a very large database with several schemas, stored procedures, and possibly hundreds of tables. But the data should reside there only temporarily as it is transformed so the physical size of the database would never be very large. I think the security team would buy off on this considering the data is removed to a more "secure" location once it is cleaned.

    If I've missed anything, let me know. Otherwise, this seems like a logical and simple solution.

    The staging would not be too large, the data would be removed from one import to another. It will need an amble breathing space but there should be no overall growth.

    Several hundreds or even thousands of tables and procedures are no problem, establish a good logical separation via client_schema_object naming convention.

    Otherwise, I think you got it!

    😎

  • sbaer (5/6/2014)


    I am fairly new to database design and haven't been able to find a lot on this question.

    I work for a marketing and advertising firm. We don't have a "transaction" database since all of our data comes from our clients. I'm responsible to load it all into a database so we can remove duplicates, clean the data, and run analytics.

    We normally create a separate database for each client for security reasons (decision made by IT security). But most of these databases has some things in common, mostly address data.

    I want a central repository of things like zip code data (zip codes linked to county, regions, etc.), country codes, state codes, etc. Usually, this repository would be used just for the initial insert and cleanup. I'd have to enter it into the client's database the way the client wants to see it.

    Any advice, thoughts, or references on how this is best implemented? Are there problems associated with linking data like this between databases? Is there an easier way to keep a single zip code table but have it be accessible to several address tables?

    I had the same problem and after long searching I found this software[/url].

    It saved me a lot of time...

  • Thanks for the reply. I actually have a purchased zip code database. Just having the data doesn't mean you have it in the best place. What I needed was to establish a process where that data could be used by several databases, run against several different kinds of data coming in, and have it standardized so I didn't have to load it into 10 different places just to have access to it.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply