Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Check and transfer data between two databases in same server using complex conditions Expand / Collapse
Author
Message
Posted Tuesday, November 6, 2012 4:57 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, February 6, 2013 5:10 AM
Points: 4, Visits: 20
Source DB1.TB1 [ ID, EAN, ASIN, category, NAME, CONDITION]

1, null, 20001, ps2, COD2, new
2, 1002, 20002, xbox, HOLA, new
3, 1003, 20003, xbox, Spider Man, used

and DB1.TB2 [ ID, PRICE]

1, 50
2, 51
3, 61


Destination DB2.TB1 [ KID, EAN, category, NAME, CONDITION, PRICE ]

1013, 1001, ps2, COD2, new, 50
1015, 1002, xbox, HOLA, new, 31


Now I need to transfer the data from source DB1 TB1 and TB2 to destination table with some conditions:

ID is same products int DB1, KID is the actual "ID" we refer to in the organization. each item without this id should be updated with KID_max + 1.

The code must check the existing items in DB2.TB1 and update its price if the items exist else add it to this table using new KID .

I wanted to use `join` but I am not sure where shall I put the `on` condition. Since I could not find any entity that would be common in both, names may be same but not sure if they can be used in this `on` condition. Some items in the DB2.TB1 do not have EAN. Some Items on the DB1.TB1 do not have EAN as well but all do have ASIN.

Just some sample codes that I may use to compare the products and update the destination would be enough for me, I think I will be able to modify the codes if I am able to know the path that I should follow to solve this this type of task.

Info: DB2.TB1 consists of around 53k items and source DB1.TB1 consists 18k items.
This task needs to be done every day so I need to consider some performance issue as well.

I am not sure if the full code will be too much to ask or not (being a beginner,it seems too complex) I would like to know some information on how shall I approach to solve this issue.
Post #1381499
Posted Tuesday, November 6, 2012 6:41 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Friday, December 12, 2014 8:24 PM
Points: 1,945, Visits: 3,173
Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. If you know how, follow ISO-11179 data element naming conventions and formatting rules (you did not). Code should be in Standard SQL as much as possible and not local dialect. But you did not even try something.

Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. There is no such thing as a generic, magical “id” in RDBMS; you have the EAN which is an industry standard key. Likewise, Amazon Standard Identification Number (asin) is also a key. Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. If you know how, follow ISO-11179 data element naming conventions and formatting rules (you did not). Code should be in Standard SQL as much as possible and not local dialect. But you did not even try something.

Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. There is no such thing as a generic, magical “id” in RDBMS; you have the EAN which is an industry standard key. Likewise, Amazon Standard Identification Number (asin) is also a key. See http://www.amazon.com/gp/seller/asin-upc-isbn-info.html

CREATE TABLE Merchandise
(ean CHAR(15) NOT NULL UNIQUE,
asin CHAR(10) NOT NULL UNIQUE,
item_name VARCHAR(25) NOT NULL,
shipping_category CHAR(??) NOT NULL,
item_condition CHAR(4) NOT NULL
CHECK (item_condition IN ('used', 'new'),
item_price DECIMAL(12,2) CHECK (item_price >= 0.00));

Two keys is a design problem; how do you ensure that both of them identify the same item?

The price is an attribute of each item; it should not be split out into its own table. Why do you think that a price is a separate entity? I am letting the price be NULL-able, in case you do not have one when you get the item.

>> The shipping price of each item depends upon its KID name. So the products must be categorized according to this as well which are dependent upon the name index such as ps2 or xbox. <<

What KID name? There is no such column in your narrative. You need a shipping category of some kind in the table. That is another attribute of an item.

>> I am not sure if the full code will be too much to ask or not (being a beginner,it seems too complex) I would like to know some information on how shall I approach to solve this issue. <<

Please read an book on RDBMS and data modeling. You do not know enough to ask a good question and a forum is not the right place to get an education.


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1381541
Posted Tuesday, November 6, 2012 7:00 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, February 6, 2013 5:10 AM
Points: 4, Visits: 20
Thank you for your response. @Celko,
I agree that I do not have enough knowledge about the RDMS, I am in the learning stage, have tried myself on searching related topics though I have not mentioned here. I don't think it is possible to learn these topics in short period of time so as I am learning as well as working on some of the things.

Our organization supplied me with a piece of software that downloads the products to re-post in our website, its from amazon. The downloading is totally different, we have been using ID2 for identification, which is not related to both EAN and ASIN, so my little knowledge was thinking to compare EAN,ASIN and the NAMEs of the products so that I can update the tables. Not knowing how! I have basic knowledge by now, at least i understand the codes, and sometimes modify it to suit my needs.
As I told in my post it seems quite complex to me just to begin with so wanted to know some approaches how its done in this kind of situation.
Post #1381552
Posted Wednesday, November 7, 2012 4:14 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, October 28, 2014 6:04 AM
Points: 1,127, Visits: 1,599
adhikarideep (11/6/2012)
Source DB1.TB1 [ID, EAN, ASIN, NAME, CONDITION]

1,1001,20001,ps2,COD2,new
2,1002,20002,xbox,HOLA,new
3,1003,20003,xbox,Spider Man,used



I understand the requirement but you're Sample Data is not correct. The above quoted part shows that the table has 5 columns but the data you have given is 6 columns.
Please provide correct sample data for a working solution for your requirement.


Vinu Vijayan

For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden
Post #1381899
Posted Wednesday, November 7, 2012 5:45 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, February 6, 2013 5:10 AM
Points: 4, Visits: 20
vinu512 (11/7/2012)
[quote]adhikarideep (11/6/2012)

I understand the requirement but you're Sample Data is not correct. The above quoted part shows that the table has 5 columns but the data you have given is 6 columns.
Please provide correct sample data for a working solution for your requirement.


Thanks for your understanding, I have by now updated my question along with the entity the tables. Hope some suggestions will help me figure out how to go ahead. Looking at it my head is spinning, I do not know if a sql level solution will help or application is supposed to do it.
Post #1381925
Posted Wednesday, November 7, 2012 10:42 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, October 28, 2014 6:04 AM
Points: 1,127, Visits: 1,599
I can see that you have edited your sample data, but only the part that I quoted in my last post. From what I can see, there are still issues with your Sample Data. The data in the EAN at the Destination does not match with the data at the Source. I am assuming that its a typo error and going with the values at the source.

You can select the data by joining the tables at the Source ON the Id field in both TB1 and TB2. But, you don't have such a field at the Destination. So....I would say that the design at the Destination is terribly wrong and without making changes to this design it won't be possible to do what you are trying to do.

I would have inserted/updated the data by checking the EAN field in the Destination table. But, as I see....the first row has an EAN value "NULL" and the corresponding row at the Destination has a value which is NOT NULL. Which means that even if we know that the two rows are same but we can't assume that it might be true for all the rest of the data on a larger scale.

So, all you could do here is add an "ID" field to the destination table which would map the existing data at the Destination to the data at the Source. If this is done once then it would be a lot lot easier for you to solve such problems in the future.

Finally, if you think that the NAME field is unique and you want to map the data between the Source and the Destination on the NAME field then you can use the following script:

--Creating Tables

Create table Ex
(ID int,
EAN int,
ASIN BigInt,
category NVarchar(20),
NAME NVarchar(20),
CONDITION NVarchar(20) )

Create table Ex1
(ID int,
PRICE Float)

Create Table Ex2
(KID int,
EAN Int,
category NVarchar(20),
NAME NVarchar(20),
CONDITION NVarchar(20),
PRICE Float )


--Inserting Sample Data

Insert Into Ex
Select 1, null, 20001, 'ps2', 'COD2', 'new'
Union ALL
Select 2, 1002, 20002, 'xbox', 'HOLA', 'new'
Union ALL
Select 3, 1003, 20003, 'xbox', 'Spider Man', 'used'

Insert Into Ex1
Select 1, 50
Union ALL
Select 2, 51
Union ALL
Select 3, 61

Insert Into Ex2
Select 1013, 1001, 'ps2', 'COD2', 'new', 50
Union ALL
Select 1015, 1002, 'xbox', 'HOLA', 'new', 31


--Script for your Requirement

Declare @newKID Int
Select @newKID = MAX(KID) From Ex2
IF Exists(Select Name From Ex Except Select Name From Ex2)
Begin
Insert Into Ex2
Select (@newKID + ROW_NUMBER() Over (Order By ID)) As KID, EAN, category, NAME, CONDITION, PRICE From
(
Select a.Id, a.EAN, a.category, a.NAME, a.CONDITION, b.PRICE
From Ex As a JOIN Ex1 As b ON a.ID = b.ID
) As p
Where NAME IN (Select Name From Ex Except Select Name From Ex2)
End
IF EXISTS(Select Name From Ex Intersect Select Name From Ex2)
Begin
Update Ex2
Set EAN = p.EAN, category = p.category, NAME = p.NAME, CONDITION = p.CONDITION, PRICE = p.PRICE
From
(
Select c.EAN, a.category, a.NAME, a.CONDITION, b.PRICE, c.KID
From Ex As a JOIN Ex1 As b ON a.ID = b.ID
JOIN Ex2 As c On a.NAME = c.NAME
) As p
Where Ex2.EAN = p.EAN
End



You can make the necessary changes in the Script according to your actual data after testing it on the sample data. Here I use Ex and Ex1 as Source tables and Ex2 As the Destination Table.

This would not cause performance issues even while inserting a hud load of data and would work in a matter of seconds......as I have avoide using Cursors for Row by Row insertions.

Hope it helps.


Vinu Vijayan

For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden
Post #1382259
Posted Friday, November 9, 2012 3:47 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, February 6, 2013 5:10 AM
Points: 4, Visits: 20
vinu512 (11/7/2012)


So, all you could do here is add an "ID" field to the destination table which would map the existing data at the Destination to the data at the Source. If this is done once then it would be a lot lot easier for you to solve such problems in the future.



Thanks a lot vinu512, for understanding my situation.
among all options you provided along with some I looked upon, I think adding new "ID" must be a good idea.The names not being unique as we used the same name for the used product and the new one indicating it on other columns .
so, going to first (but difficult kind of option Think), how shall I add the "ID" to this table which is around 53k big.

>><<I saw something called "lookup" and "fuzzy*lookup" these as far as I know are some statistical tools to compare the similarity and provides the confidence interval as well as the level of similarity.
would it be good idea to use it, I wanted to use it seems difficult. would it do what I am looking for to do?

my organization most of the time changes its sources of products, and different programmer are used to get those products to the database, there are same products from different supplier with same name but we try to display the cheapest one. The only way we use to differentiate the products is its kid.
Another problem is that it is not documented, most of the time have to go and see all the consequences of the changes I make.
Post #1382927
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse