November 8, 2010 at 1:59 am
What is data cleansing and data filtering, and how can we do data cleansing and data filtering from a SQL Server 2005 database to a Maximo database?
November 9, 2010 at 6:13 am
...or to put it more succinctly "how long is a piece of string?"
🙂
Sorry I couldn't resist!
I'm no forum expert, but surely you can't expect a decent answer if you post a question like that? Data cleaning/filtering, or basically ETL (extract, transform, load), is not a fixed set of procedures or rules. It depends on the data, it depends on the source database (application), and it depends on the destination database (application).
November 10, 2010 at 4:59 am
What is data cleansing
http://en.wikipedia.org/wiki/Data_cleansing
What is Data filtering
http://dbconvert.com/filtering.php
How can we do data cleansing and data filtering from the SQL Server 2005 database to the Maximo database?
Set up a connection to the SQL Server 2005 database (simple and easy; see connection managers in SSIS) and extract the data. Then process the data the way you want, using the many tools and transformations SSIS provides, to transform the data. Then set up a connection to Maximo; I have seen a posting do this using an MS Access DB, so an OLE DB connection should let you write the data to the tables in Maximo, loading the data.
SSIS is an ETL (Extract, Transform, Load) tool.
How you do these steps depends on the databases, business rules, data, etc. you are working with. Start playing with SSIS, then Google and ask questions.
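If it helps to see the shape of the three steps outside the SSIS designer, here is a minimal T-SQL sketch of the same extract-transform-load idea. The linked server names, table names and columns are made up for illustration, and an SSIS data flow would do this same work; whether a four-part insert works against your Maximo connection depends on the OLE DB provider.
-- Extract: pull raw rows from the source SQL Server 2005 database
-- (assumes a linked server named SRC2005 pointing at the source instance)
SELECT AssetNum, AssetDescription, InstallDate
INTO #StagedAssets
FROM SRC2005.SourceDB.dbo.Assets;
-- Transform: clean up the staged data before it goes anywhere near Maximo
UPDATE #StagedAssets
SET AssetDescription = LTRIM(RTRIM(AssetDescription));
DELETE FROM #StagedAssets
WHERE AssetNum IS NULL;
-- Load: write the cleaned rows to the Maximo table
-- (assumes a linked server named MAXIMO over the OLE DB connection mentioned above)
INSERT INTO MAXIMO.MaximoDB.dbo.ASSET (ASSETNUM, DESCRIPTION, INSTALLDATE)
SELECT AssetNum, AssetDescription, InstallDate
FROM #StagedAssets;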
March 18, 2024 at 2:59 pm
The need to cleanse data comes from accepting dirty data to begin with. It happens, and there is apparently nothing we can do about it.
There are two cases of data loading:
Case 1) An independent table may be the case of loading some Excel, XML or similar data. If you know, or can infer, what makes the data dirty, do your best to follow the advice from the previous posts. In time you will get a better idea of what could be wrong with the data. Mind you, it is impossible to cleanse the data 100% this way; with time and experience you will get better and come closer to 100%.
Case 2) should be easier, apparently, because a properly designed relational database will have some constraints and nonconforming data will simply be rejected at the gate. You don't want to keep trying to load the data and each time get a message about the violation of yet another constraint. Often the incoming data does not conform to the target data types. For each column in the incoming data, check whether the values match the expected type:
-- find values in a column that should be numeric but are not
SELECT individual_column
FROM IncomingRawData
WHERE ISNUMERIC(individual_column) = 0
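Note that ISNUMERIC can return 1 for values such as '$' or '1e5' that will still fail a later conversion. On SQL Server 2012 and later, a TRY_CAST check against the actual target type (sketched below with the same made-up table and a decimal target) is more precise:
-- rows whose value cannot be converted to the intended target type
SELECT individual_column
FROM IncomingRawData
WHERE individual_column IS NOT NULL
  AND TRY_CAST(individual_column AS decimal(18, 2)) IS NULL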
So, to avoid the system telling you that something is wrong, you can find the definitions of all the constraints of interest (those that affect the target table(s)) and simply run a SELECT query for each constraint.
For the constraint CHECK (StartDate <= EndDate), something like this would do:
-- rows that would violate CHECK (StartDate <= EndDate)
SELECT SomeColumns
FROM IncomingRawData
WHERE NOT (StartDate <= EndDate)
Once you find all the dirty data, you need to make a decision: "fix it" or "throw it in the trash". "Fix it" is OK if the errors are obvious. If not, leave it alone and send a nasty letter to the data sender. "Throw it in the trash" may make sense, but it may make loading impossible due to foreign key constraints.
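For the foreign key case, the same pre-load check works: find the incoming rows whose key has no matching parent row. The table and column names below are made up for illustration:
-- incoming rows that would violate the foreign key to the parent table
SELECT r.SomeColumns, r.ParentID
FROM IncomingRawData AS r
WHERE NOT EXISTS (SELECT 1 FROM ParentTable AS p WHERE p.ParentID = r.ParentID)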
The best approach is none of the above. Find out in advance what the incoming data should look like and design a table (column names, data types, data sizes, NULLability (sigh! there should be none, but nothing we can do about that)) with lots of constraints, the more the merrier. Then demand that the sender deliver the data with your design in mind. If possible, see whether the data on their end can live in the same type of database as your target database, so you can pass them the constraints in advance. They should make sure the data is clean (follows the data types, nullability and constraints) before it is sent.
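As a sketch of that idea (made-up names again), the receiving table can carry the rules itself, so anything that violates them never gets in:
-- staging table that enforces the agreed-upon rules at the gate
CREATE TABLE IncomingAssets
(
    AssetNum     varchar(20)   NOT NULL PRIMARY KEY,
    Description  varchar(200)  NOT NULL,
    StartDate    date          NOT NULL,
    EndDate      date          NOT NULL,
    CONSTRAINT CK_IncomingAssets_Dates CHECK (StartDate <= EndDate)
);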
Of course, you could be getting the data from PDF files or screenshots of web pages. If that is the case, the entire operation should be questioned.
You know it already: garbage in, garbage out. A DBA or a developer should not be the only person dealing with dirty incoming data.
That is all, good luck
🙂
Zidar's Theorem: The best code is no code at all...