Help on database design - Data Aggregator

  • I have a data aggregator application that should be able to import data with different structures (mostly CRM data) such as Customer, Invoice, Campaigns, etc. We can't know the fields or structures in advance.

    The application should cater for data normalization, such as duplicate detection, and it should provide data mining and dashboarding on the available data.

    I want to know what the best architecture for the database would be.

    Should I update the database to create tables / append new fields whenever a new structure or field is encountered, or use the XML data type to store the data? Any other suggestions?

    Thanks

  • yasserbatsh (10/10/2013)


    I have a data aggregator application that should be able to import data with different structures (mostly CRM data) such as Customer, Invoice, Campaigns, etc. We can't know the fields or structures in advance.

    The application should cater for data normalization, such as duplicate detection, and it should provide data mining and dashboarding on the available data.

    I want to know what the best architecture for the database would be.

    Should I update the database to create tables / append new fields whenever a new structure or field is encountered, or use the XML data type to store the data? Any other suggestions?

    Thanks

    A normalized design and accommodating data of unknown structure are mutually exclusive. If the plan is to leverage SQL Server or another relational database, then the ETL process should map (conform) fields from these various data sources to properly normalized tables and columns. Technically, you could create one big table and store each record in a column of type XML, or you could implement sparse columns, but there are better solutions.
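    To make that concrete, here is a minimal sketch of the "one big table with an XML column" approach; the table and column names are made up purely for illustration, and the same idea applies to sparse columns.

    CREATE TABLE dbo.ImportedRecord (
        ImportedRecordID INT IDENTITY(1,1) PRIMARY KEY,
        SourceSystem     VARCHAR(50) NOT NULL,  -- which feed the row came from
        EntityType       VARCHAR(50) NOT NULL,  -- 'Customer', 'Invoice', 'Campaign', ...
        LoadedOn         DATETIME    NOT NULL DEFAULT (GETDATE()),
        Payload          XML         NOT NULL   -- the raw record, structure unknown in advance
    );

    -- Individual fields can still be pulled out with XQuery, assuming the
    -- payload happens to carry an element such as <Email>:
    SELECT ImportedRecordID,
           Payload.value('(/record/Email)[1]', 'VARCHAR(255)') AS Email
    FROM   dbo.ImportedRecord
    WHERE  EntityType = 'Customer';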

    If what you need is a solution for "dumping" unstructured data into a database and then fetching it using a key, then I'd recommend that you look into MongoDB or some other document-centric database solution.

    Regardless of the database platform, even the requirement of duplicate detection cannot be met properly unless you know which "fields" compose a unique key. Perhaps you could use a hash key across the entire set of fields.
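    As a rough illustration of the hash-key idea (the staging table and its columns are hypothetical), you could hash the cleaned-up fields together and look for repeated hash values:

    SELECT RowHash, COUNT(*) AS Occurrences
    FROM (
        SELECT HASHBYTES('SHA1',
                   CONCAT(UPPER(LTRIM(RTRIM(FirstName))), '|',
                          UPPER(LTRIM(RTRIM(LastName))),  '|',
                          UPPER(LTRIM(RTRIM(Email))))) AS RowHash
        FROM dbo.CustomerStage
    ) AS hashed
    GROUP BY RowHash
    HAVING COUNT(*) > 1;   -- candidate duplicates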

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thank you Eric for your reply.

    I will indeed need to leverage SQL Server for my solution.

    What do you think is a better solution to tackle such a requirement, other than XML data types or sparse columns?

    Thanks.

  • yasserbatsh (10/10/2013)


    Thank you Eric for your reply.

    I will indeed need to leverage SQL Server for my solution.

    What do you think is a better solution to tackle such a requirement, other than XML data types or sparse columns?

    Thanks.

    I have a data aggregator application that should be able to import data with different structures (mostly CRM data) such as Customer, Invoice, Campaigns, etc. We can't know the fields or structures in advance.

    When you say "data aggregator", I'm thinking something like a data mart, a database that aggregates data from multiple sources and possibly different structures. To leverage the strengths of a relational database like SQL Server, I would reccomend creating a fixed number of conformed and normalized tables. For exmple, there should be one (and only one) Customer table, Invoice table, Campaign table, etc.

    Your ETL process in the SQL Server environment would most likely be SSIS. It should have one field mapping (Data Flow task or package) for each data source, and the destination would be the appropriate conformed table. So you may have multiple data sources, each with a different Customer structure and attributes, but you decide up front which attributes (columns) you want to import into your tables.

    For example, when you query the Invoice table, the only thing differentiating records originating from 'Client A' versus 'Client B' would be something like a Client_ID column. The other columns, like Invoice_ID, Product_ID, Quantity, etc., should be the same, even if the originating data from each client is structured very differently.
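    A conformed Invoice table along those lines might look something like the sketch below; the column names are only an illustration, not a prescription.

    CREATE TABLE dbo.Invoice (
        Client_ID    INT           NOT NULL,  -- identifies the originating client/source
        Invoice_ID   INT           NOT NULL,  -- invoice key within that client's data
        Product_ID   INT           NOT NULL,
        Quantity     INT           NOT NULL,
        Invoice_Date DATE          NOT NULL,
        Amount       DECIMAL(18,2) NOT NULL,
        CONSTRAINT PK_Invoice PRIMARY KEY (Client_ID, Invoice_ID, Product_ID)
    );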

    If each client has their own product coding scheme, but any code can map to an industry-standard SKU, then you would also ideally perform mapping in the ETL process to conform client-specific product codes into SKU codes.
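    One simple way to do that mapping (again, purely illustrative table and column names) is a lookup table that the ETL joins against while loading:

    CREATE TABLE dbo.ProductCodeMap (
        Client_ID  INT         NOT NULL,
        ClientCode VARCHAR(50) NOT NULL,   -- the client's own product code
        SKU        VARCHAR(50) NOT NULL,   -- the industry-standard SKU it maps to
        CONSTRAINT PK_ProductCodeMap PRIMARY KEY (Client_ID, ClientCode)
    );

    -- During the load, translate each client code to the standard SKU:
    SELECT s.Invoice_ID, s.Client_ID, m.SKU, s.Quantity
    FROM   dbo.InvoiceStage AS s
    JOIN   dbo.ProductCodeMap AS m
           ON  m.Client_ID  = s.Client_ID
           AND m.ClientCode = s.ClientProductCode;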

    A document-centric database like MongoDB would allow you to persist and index the source data as-is. However, performing operations like reporting across multiple clients, running ad hoc queries, or integrating with other databases would be much more limited than what you could do with a SQL Server database where data for all clients is conformed and normalized into relational tables.

    It really depends on the purpose of this database and the use-case requirements.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • When you say "data aggregator", I'm thinking something like a data mart, a database that aggregates data from multiple sources and possibly different structures.To leverage the strengths of a relational database like SQL Server, I would reccomend creating a fixed number of conformed and normalized tables. For exmple, there should be one (and only one) Customer table, Invoice table, Campaign table, etc.

    what if the user wants to import a new data structure that is not normalized yet in database ...say the data is for assets... is it best to treat such cases by creating the new structure programatically into database or store new structures as xml data types? other better ways?

    i have seen similar case in MS dynamic CRM. they have an import wizard that lets you import new structures and define the data types of fields and foreign keys as well but i could not find how they treat this at the database level.. any clues?

    Thanks.

  • yasserbatsh (10/10/2013)


    When you say "data aggregator", I'm thinking something like a data mart, a database that aggregates data from multiple sources and possibly different structures.To leverage the strengths of a relational database like SQL Server, I would reccomend creating a fixed number of conformed and normalized tables. For exmple, there should be one (and only one) Customer table, Invoice table, Campaign table, etc.

    what if the user wants to import a new data structure that is not normalized yet in database ...say the data is for assets... is it best to treat such cases by creating the new structure programatically into database or store new structures as xml data types? other better ways?

    i have seen similar case in MS dynamic CRM. they have an import wizard that lets you import new structures and define the data types of fields and foreign keys as well but i could not find how they treat this at the database level.. any clues?

    thanks.

    My experience is in the healthcare, banking, and government sectors. When there is a need to import data, a business analyst gathers requirements about the exact format of the source input file and documents which attributes the organization requires to be imported into the database. This scenario, where non-IT end users import data from generic sources (anything from invoices to restaurant menus) into a database, has no such defined context.

    If basically all you need to do is ingest and archive this data somewhere, and you're not sure how the organization will actually use it, then perhaps MongoDB would be a better choice. You could use MongoDB as a staging database and then have an ETL process that feeds specific data items into SQL Server as the organization requires them.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thank you, Eric, for your help and time.

    I'll do further research on MongoDB and SSIS packages.

    Have a nice day.
