Help on database design - Data Aggregator
Posted Thursday, October 10, 2013 8:37 AM
Forum Newbie

I have a data aggregator application which should be able to import data with different structures (mostly CRM data) such as customers, invoices, campaigns, etc. We can't know the fields or structures in advance.
The application should cater for data normalization tasks such as duplicate detection, and it should provide data mining and dashboarding on the available data.
I want to know the best architecture to use for the database. Should I update the database to create tables / append new fields whenever a new structure or field is encountered, or use the XML data type to store the data? Any other suggestions?

Thanks
Post #1503627
Posted Thursday, October 10, 2013 8:58 AM


UDP Broadcaster

yasserbatsh (10/10/2013)
I have a data aggregator application which should be able to import data with different structures (mostly CRM data) such as customers, invoices, campaigns, etc. We can't know the fields or structures in advance.
The application should cater for data normalization tasks such as duplicate detection, and it should provide data mining and dashboarding on the available data.
I want to know the best architecture to use for the database. Should I update the database to create tables / append new fields whenever a new structure or field is encountered, or use the XML data type to store the data? Any other suggestions?

Thanks


A normalized design and accommodating data of unknown structure are mutually exclusive. If the plan is to leverage SQL Server or another relational database, then the ETL process should map (conform) fields from these various data sources to properly normalized tables and columns. Technically, you could create one big table and contain each record in a column of type XML, or you could implement sparse columns, but there are better solutions.
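
For illustration, here's a rough sketch of those two options in T-SQL (all table and column names are made up):

CREATE TABLE dbo.RawImport (
    ImportID     int IDENTITY(1,1) PRIMARY KEY,
    SourceSystem varchar(50) NOT NULL,
    LoadedAt     datetime NOT NULL DEFAULT GETDATE(),
    RecordData   xml NULL  -- the entire source record, structure unknown
);

-- Sparse-column alternative: one wide table where optional,
-- mostly-NULL attributes cost almost no storage when absent.
CREATE TABLE dbo.CustomerWide (
    CustomerID   int IDENTITY(1,1) PRIMARY KEY,
    CustomerName varchar(100) NOT NULL,
    LoyaltyTier  varchar(20) SPARSE NULL,
    FaxNumber    varchar(30) SPARSE NULL
);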

If what you need is a solution for "dumping" unstructured data into a database and then fetching it using a key, then I'd recommend that you look into MongoDB or some other document-centric database solution.

Regardless of the database platform, even the requirement of duplicate detection cannot be done properly unless you know which "fields" compose a unique key. Perhaps you could use a hash key across the entire set of fields.
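
A minimal sketch of the hash-key idea, assuming the incoming records have already been flattened into a staging table (the table and column names are hypothetical):

-- Rows sharing the same hash across all imported fields are
-- candidate duplicates; NULLs are normalized to empty strings.
SELECT RecordHash, COUNT(*) AS DupeCount
FROM (
    SELECT HASHBYTES('SHA1',
               ISNULL(FirstName, '') + '|' +
               ISNULL(LastName,  '') + '|' +
               ISNULL(Email,     '')) AS RecordHash
    FROM dbo.StagedCustomer
) AS h
GROUP BY RecordHash
HAVING COUNT(*) > 1;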



"Winter Is Coming" - April 6, 2014
Post #1503636
Posted Thursday, October 10, 2013 12:21 PM
Forum Newbie

Thank you Eric for your reply.
I will indeed need to leverage SQL Server for my solution.
What do you think is a better solution to tackle such a requirement, other than XML data types or sparse columns?
Thanks.
Post #1503731
Posted Thursday, October 10, 2013 12:53 PM


UDP Broadcaster

yasserbatsh (10/10/2013)
Thank you Eric for your reply.
I will indeed need to leverage SQL Server for my solution.
What do you think is a better solution to tackle such a requirement, other than XML data types or sparse columns?
Thanks.


I have a data aggregator application which should be able to import data with different structures (mostly CRM data) such as customers, invoices, campaigns, etc. We can't know the fields or structures in advance.


When you say "data aggregator", I'm thinking of something like a data mart: a database that aggregates data from multiple sources and possibly different structures. To leverage the strengths of a relational database like SQL Server, I would recommend creating a fixed number of conformed and normalized tables. For example, there should be one (and only one) Customer table, Invoice table, Campaign table, etc.
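
As a sketch of what a couple of those conformed tables might look like (the column choices are just an assumption for illustration):

-- One conformed table per entity, no matter how many sources feed it.
CREATE TABLE dbo.Customer (
    Customer_ID   int IDENTITY(1,1) PRIMARY KEY,
    Client_ID     int NOT NULL,          -- which source/client supplied the row
    Source_Key    varchar(50) NOT NULL,  -- the customer's key in the source system
    Customer_Name varchar(100) NOT NULL
);

CREATE TABLE dbo.Invoice (
    Invoice_ID   int IDENTITY(1,1) PRIMARY KEY,
    Client_ID    int NOT NULL,
    Customer_ID  int NOT NULL REFERENCES dbo.Customer (Customer_ID),
    Product_ID   int NOT NULL,
    Quantity     int NOT NULL,
    Invoice_Date date NOT NULL
);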

Your ETL process in the SQL Server environment would most likely be SSIS. It should have one field mapping (Data Flow task or package) for each data source, and the destination would be the appropriate conformed table. So you may have multiple data sources, each with a different Customer structure and attributes, but you decide up front which attributes (columns) you want to import into your tables.
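
One common pattern is to land each source in its own staging table and have the package finish with an upsert into the conformed table; a sketch, with a hypothetical staging table name:

MERGE dbo.Customer AS tgt
USING dbo.Staging_ClientA_Customer AS src
    ON  tgt.Client_ID  = src.Client_ID
    AND tgt.Source_Key = src.Source_Key
WHEN MATCHED THEN
    UPDATE SET tgt.Customer_Name = src.Customer_Name
WHEN NOT MATCHED THEN
    INSERT (Client_ID, Source_Key, Customer_Name)
    VALUES (src.Client_ID, src.Source_Key, src.Customer_Name);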

For example, when you query the Invoice table, the only thing differentiating records originating from 'Customer A' versus 'Customer B' would be something like a Client_ID column. The other columns, like Invoice_ID, Product_ID, Quantity, etc., should be the same, even if the originating data from each client is structured very differently.
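
That design makes cross-client reporting trivial; a minimal example:

-- All invoices live in one conformed table, so comparing
-- clients is a plain GROUP BY.
SELECT Client_ID,
       COUNT(*)      AS InvoiceCount,
       SUM(Quantity) AS TotalQuantity
FROM dbo.Invoice
GROUP BY Client_ID;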

If each customer has their own product coding scheme, but any code can map to an industry-standard SKU, then you would also ideally perform mapping in the ETL process to conform customer-specific product codes into SKU codes.
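
That mapping could be as simple as a cross-reference table maintained per client and joined during the load (names here are made up):

CREATE TABLE dbo.ProductCodeMap (
    Client_ID   int NOT NULL,
    Client_Code varchar(50) NOT NULL,
    SKU         varchar(50) NOT NULL,
    PRIMARY KEY (Client_ID, Client_Code)
);

-- During ETL, translate each client-specific code to the standard SKU.
SELECT s.Source_Key, m.SKU, s.Quantity
FROM dbo.Staging_ClientA_Invoice AS s
JOIN dbo.ProductCodeMap AS m
    ON  m.Client_ID   = s.Client_ID
    AND m.Client_Code = s.Product_Code;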

A document-centric database like MongoDB would allow you to persist and index the source data as-is. However, performing operations like reporting across multiple customers, running ad-hoc queries, or integrating with other databases would be much more limited than what you could do with a SQL Server database where data for all customers is conformed and normalized into relational tables.

It really depends on the purpose of this database and the usage requirements.



"Winter Is Coming" - April 6, 2014
Post #1503749
Posted Thursday, October 10, 2013 1:55 PM
Forum Newbie

When you say "data aggregator", I'm thinking of something like a data mart: a database that aggregates data from multiple sources and possibly different structures. To leverage the strengths of a relational database like SQL Server, I would recommend creating a fixed number of conformed and normalized tables. For example, there should be one (and only one) Customer table, Invoice table, Campaign table, etc.


What if the user wants to import a new data structure that is not yet normalized in the database, say the data is for assets? Is it best to handle such cases by creating the new structure programmatically in the database, or by storing new structures as XML data types? Are there other, better ways?

I have seen a similar case in MS Dynamics CRM. It has an import wizard that lets you import new structures and define the data types of fields, as well as foreign keys, but I could not find how they handle this at the database level. Any clues?

Thanks.
Post #1503778
Posted Thursday, October 10, 2013 2:16 PM


UDP Broadcaster

yasserbatsh (10/10/2013)
When you say "data aggregator", I'm thinking of something like a data mart: a database that aggregates data from multiple sources and possibly different structures. To leverage the strengths of a relational database like SQL Server, I would recommend creating a fixed number of conformed and normalized tables. For example, there should be one (and only one) Customer table, Invoice table, Campaign table, etc.


What if the user wants to import a new data structure that is not yet normalized in the database, say the data is for assets? Is it best to handle such cases by creating the new structure programmatically in the database, or by storing new structures as XML data types? Are there other, better ways?

I have seen a similar case in MS Dynamics CRM. It has an import wizard that lets you import new structures and define the data types of fields, as well as foreign keys, but I could not find how they handle this at the database level. Any clues?

Thanks.

My experience is in the healthcare, banking, and government sectors. When there is a need to import data, a business analyst will gather requirements about the exact format of the source input file and also document which attributes the organization requires to be imported into the database. This scenario of letting non-IT end users import data from generic sources (anything from invoices to restaurant menus) into a database lacks that context.
If basically all you need to do is ingest and archive this data somewhere, and you're not sure how the organization will actually use it, then perhaps MongoDB would be a better choice. You could use MongoDB as a staging database and then have an ETL process that feeds specific data items into SQL Server as the organization requires them.
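
If you'd rather stay entirely on SQL Server, the same staging idea can be approximated with an XML column plus an ETL step that shreds out only the required items; a sketch with made-up names:

-- Raw records land as XML, structure unknown at design time.
CREATE TABLE dbo.Staging_Raw (
    Staging_ID int IDENTITY(1,1) PRIMARY KEY,
    Client_ID  int NOT NULL,
    RecordData xml NOT NULL
);

-- Later ETL step: extract just the attributes the business requires
-- into the conformed Customer table.
INSERT INTO dbo.Customer (Client_ID, Source_Key, Customer_Name)
SELECT Client_ID,
       RecordData.value('(/record/id)[1]',   'varchar(50)'),
       RecordData.value('(/record/name)[1]', 'varchar(100)')
FROM dbo.Staging_Raw;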



"Winter Is Coming" - April 6, 2014
Post #1503795
Posted Thursday, October 10, 2013 2:33 PM
Forum Newbie

Thank you Eric for your help and time.
I'll do further research on MongoDB and SSIS packages.
Have a nice day.
Post #1503803