Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Help on database design - Data Aggregator


Help on database design - Data Aggregator

Author
Message
yasserbatsh
yasserbatsh
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 22
I have a data aggregator application which should be able to import data with
different structures (mostly CRM data) such as Customer,invoice, campaigns, etc..
we can't know the fields or structures in advance.
The application should cater for data normalization like duplicates detection, and it should provide data mining and dash boarding on the available data.
I want to see what is the best architecture to use for the database.
Should i update the database to create tables / append new fields whenever a new structure or field is encountered Or use XML data type to store the data? Any other suggestion

Thanks
Eric M Russell
Eric M Russell
SSCarpal Tunnel
SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)

Group: General Forum Members
Points: 4546 Visits: 9485
yasserbatsh (10/10/2013)
I have a data aggregator application which should be able to import data with
different structures (mostly CRM data) such as Customer,invoice, campaigns, etc..
we can't know the fields or structures in advance.
The application should cater for data normalization like duplicates detection, and it should provide data mining and dash boarding on the available data.
I want to see what is the best architecture to use for the database.
Should i update the database to create tables / append new fields whenever a new structure or field is encountered Or use XML data type to store the data? Any other suggestion

Thanks


Normalized design and accomodating data of unknown structure is mutually exclusive. If the plan is to leverage SQL Server or another relational database, then the ETL process should map (conform) fields from these various data sources to properly normalized tables and columns. Technically, you could create one big table and contain each record in a column of type XML, or you could implment sparse columns, but there are better solutions.

If what you need is a solution for "dumping" unstructured data into a database, and then fetching it using a key, then I'd reccomend that you look into MongoDB or some other document centric database solution.

Regardless of the database platform, even the requirement of duplicate detection can not be properly done, unless you know what "fields" compose a unqiue key. Perhaps you could use a hash key across the entire set of fields.


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
yasserbatsh
yasserbatsh
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 22
Thank you Eric for your reply.
i will indeed need to leverage on sql server for my solution.
What do u think is a better solution to tackle such requirement other than xml data types or sparse columns?
Thanks.
Eric M Russell
Eric M Russell
SSCarpal Tunnel
SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)

Group: General Forum Members
Points: 4546 Visits: 9485
yasserbatsh (10/10/2013)
Thank you Eric for your reply.
i will indeed need to leverage on sql server for my solution.
What do u think is a better solution to tackle such requirement other than xml data types or sparse columns?
Thanks.


I have a data aggregator application which should be able to import data with
different structures (mostly CRM data) such as Customer,invoice, campaigns, etc..
we can't know the fields or structures in advance.


When you say "data aggregator", I'm thinking something like a data mart, a database that aggregates data from multiple sources and possibly different structures. To leverage the strengths of a relational database like SQL Server, I would reccomend creating a fixed number of conformed and normalized tables. For exmple, there should be one (and only one) Customer table, Invoice table, Campaign table, etc.

Your ETL process in the SQL Server environment would most likely be SSIS. It should have one field mapping (DataFlow task or package) for each data source, and the destination would be the appropriate conformed table. So, you may have multiple data sources, each with a different Customer structure and attributes, but you decide upfront which attributes (columns) you want to import into your tables.

For example, when you query the Invoice table, the only thing differentiating records originating from 'Customer A' versus 'Customer B' would be something like a Client_ID column. The other columns like Invoice_ID, Product_ID, Quantity, etc. should be the same, even if the originating data from each client is structured very differently.

If each customer has their own product coding scheme, but any code can map to an industry standard SKU, then you would also ideally perform mapping in the ETL process to conform customer specific product codes into SKU codes.

A document centric database like MongoDB would allow you to persist and index the source data as is. However, to performing operations like reporting across multiple customers, running ad-hoc queries, or integrating with other databases would be much more limited than what you could do with a SQL Server database where data for all customers is conformed and normalized into relational tables.

It really depends on the pourpose of this database and the case usage requirements.


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
yasserbatsh
yasserbatsh
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 22
When you say "data aggregator", I'm thinking something like a data mart, a database that aggregates data from multiple sources and possibly different structures.To leverage the strengths of a relational database like SQL Server, I would reccomend creating a fixed number of conformed and normalized tables. For exmple, there should be one (and only one) Customer table, Invoice table, Campaign table, etc.


what if the user wants to import a new data structure that is not normalized yet in database ...say the data is for assets... is it best to treat such cases by creating the new structure programatically into database or store new structures as xml data types? other better ways?

i have seen similar case in MS dynamic CRM. they have an import wizard that lets you import new structures and define the data types of fields and foreign keys as well but i could not find how they treat this at the database level.. any clues?

thanks.
Eric M Russell
Eric M Russell
SSCarpal Tunnel
SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)

Group: General Forum Members
Points: 4546 Visits: 9485
yasserbatsh (10/10/2013)
When you say "data aggregator", I'm thinking something like a data mart, a database that aggregates data from multiple sources and possibly different structures.To leverage the strengths of a relational database like SQL Server, I would reccomend creating a fixed number of conformed and normalized tables. For exmple, there should be one (and only one) Customer table, Invoice table, Campaign table, etc.


what if the user wants to import a new data structure that is not normalized yet in database ...say the data is for assets... is it best to treat such cases by creating the new structure programatically into database or store new structures as xml data types? other better ways?

i have seen similar case in MS dynamic CRM. they have an import wizard that lets you import new structures and define the data types of fields and foreign keys as well but i could not find how they treat this at the database level.. any clues?

thanks.

My experience is in the healthcare, banking, and government sector. When there is a need to import data, then a business analyst will gather requirements about the exact format of the source input file and also document which attributes are required by the organization to import into the database. This scenario about letting non-IT end users import data from generic sources (anything from invoices to restraunt menus) into a database has no context.
If basically all you need to do is injest and archive this data somewhere, and you're not sure how the organization will actually use it, then perhaps MongoDB would be a better choice. You could use MongoDB as a staging database, and then have an ETL process that feeds specific data items into SQL Server as the organization requires it.


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
yasserbatsh
yasserbatsh
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 22
Thank you Eric for your help and time,
i' ll do further reseach on the mongodb and ssis packages.
have a nice day.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search