• yasserbatsh (10/10/2013)


    I have a data aggregator application which should be able to import data with different structures (mostly CRM data), such as customers, invoices, campaigns, etc. We can't know the fields or structures in advance.

    The application should cater for data normalization, such as duplicate detection, and it should provide data mining and dashboarding on the available data.

    I want to know what the best architecture is for the database.

    Should I update the database to create tables / append new fields whenever a new structure or field is encountered, or use the XML data type to store the data? Any other suggestions?

    Thanks

    A normalized design and accommodating data of unknown structure are mutually exclusive. If the plan is to leverage SQL Server or another relational database, then the ETL process should map (conform) fields from these various data sources to properly normalized tables and columns. Technically, you could create one big table and store each record in a column of type XML, or you could implement sparse columns, but there are better solutions.
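    To make the XML-column option concrete, here is a minimal sketch of what such a table might look like (the table, column, and element names are hypothetical, not from the original post):

        CREATE TABLE dbo.ImportedRecord (
            ImportedRecordID int IDENTITY(1,1) PRIMARY KEY,
            SourceSystem     varchar(50) NOT NULL,  -- which CRM feed the row came from
            EntityType       varchar(50) NOT NULL,  -- e.g. 'Customer', 'Invoice', 'Campaign'
            LoadedOn         datetime2   NOT NULL DEFAULT SYSUTCDATETIME(),
            RecordData       xml         NOT NULL   -- the raw record; structure not known in advance
        );

        -- Shredding a known field back out of the XML at query time
        SELECT ImportedRecordID,
               RecordData.value('(/record/email)[1]', 'varchar(255)') AS Email
        FROM   dbo.ImportedRecord
        WHERE  EntityType = 'Customer';

    The trade-off is that every query touching the payload has to shred the XML at runtime, which is why conforming the data into normalized tables during ETL is usually the better option.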

    If what you need is a solution for "dumping" unstructured data into a database and then fetching it using a key, then I'd recommend that you look into MongoDB or some other document-centric database solution.

    Regardless of the database platform, even the duplicate-detection requirement cannot be met properly unless you know which "fields" compose a unique key. Perhaps you could use a hash key computed across the entire set of fields.
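    A minimal sketch of that hash-based approach, assuming the hypothetical ImportedRecord table above, might look like this:

        -- Flag exact duplicates by hashing the full record content.
        -- Note: before SQL Server 2016, HASHBYTES input is limited to 8000 bytes.
        WITH Hashed AS (
            SELECT ImportedRecordID,
                   HASHBYTES('SHA2_256', CONVERT(nvarchar(max), RecordData)) AS RecordHash
            FROM   dbo.ImportedRecord
        )
        SELECT RecordHash, COUNT(*) AS Occurrences
        FROM   Hashed
        GROUP BY RecordHash
        HAVING COUNT(*) > 1;

    Keep in mind this only catches byte-for-byte duplicates; near-duplicates (spelling variations, reordered fields) still require fuzzy matching against a known business key.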

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho