Check and transfer data between two databases in same server using complex conditions

  • Source DB1.TB1 [ ID, EAN, ASIN, category, NAME, CONDITION]

    1, null, 20001, ps2, COD2, new

    2, 1002, 20002, xbox, HOLA, new

    3, 1003, 20003, xbox, Spider Man, used

    and DB1.TB2 [ ID, PRICE]

    1, 50

    2, 51

    3, 61

    Destination DB2.TB1 [ KID, EAN, category, NAME, CONDITION, PRICE ]

    1013, 1001, ps2, COD2, new, 50

    1015, 1002, xbox, HOLA, new, 31

    Now I need to transfer the data from source DB1 TB1 and TB2 to destination table with some conditions:

    ID is same products int DB1, KID is the actual "ID" we refer to in the organization. each item without this id should be updated with KID_max + 1.

    The code must check the existing items in DB2.TB1 and update its price if the items exist else add it to this table using new KID .

    I wanted to use `join` but I am not sure where shall I put the `on` condition. Since I could not find any entity that would be common in both, names may be same but not sure if they can be used in this `on` condition. Some items in the DB2.TB1 do not have EAN. Some Items on the DB1.TB1 do not have EAN as well but all do have ASIN.

    Just some sample codes that I may use to compare the products and update the destination would be enough for me, I think I will be able to modify the codes if I am able to know the path that I should follow to solve this this type of task.

    Info: DB2.TB1 consists of around 53k items and source DB1.TB1 consists 18k items.

    This task needs to be done every day so I need to consider some performance issue as well.

    I am not sure if the full code will be too much to ask or not (being a beginner,it seems too complex) I would like to know some information on how shall I approach to solve this issue.

  • Thank you for your response. @celko,

    I agree that I do not have enough knowledge about the RDMS, I am in the learning stage, have tried myself on searching related topics though I have not mentioned here. I don't think it is possible to learn these topics in short period of time so as I am learning as well as working on some of the things.

    Our organization supplied me with a piece of software that downloads the products to re-post in our website, its from amazon. The downloading is totally different, we have been using ID2 for identification, which is not related to both EAN and ASIN, so my little knowledge was thinking to compare EAN,ASIN and the NAMEs of the products so that I can update the tables. Not knowing how! I have basic knowledge by now, at least i understand the codes, and sometimes modify it to suit my needs.

    As I told in my post it seems quite complex to me just to begin with so wanted to know some approaches how its done in this kind of situation.

  • adhikarideep (11/6/2012)


    Source DB1.TB1 [ID, EAN, ASIN, NAME, CONDITION]

    1,1001,20001,ps2,COD2,new

    2,1002,20002,xbox,HOLA,new

    3,1003,20003,xbox,Spider Man,used

    I understand the requirement but you're Sample Data is not correct. The above quoted part shows that the table has 5 columns but the data you have given is 6 columns.

    Please provide correct sample data for a working solution for your requirement.

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • vinu512 (11/7/2012)


    adhikarideep (11/6/2012)


    I understand the requirement but you're Sample Data is not correct. The above quoted part shows that the table has 5 columns but the data you have given is 6 columns.

    Please provide correct sample data for a working solution for your requirement.

    Thanks for your understanding, I have by now updated my question along with the entity the tables. Hope some suggestions will help me figure out how to go ahead. Looking at it my head is spinning, I do not know if a sql level solution will help or application is supposed to do it.

  • I can see that you have edited your sample data, but only the part that I quoted in my last post. From what I can see, there are still issues with your Sample Data. The data in the EAN at the Destination does not match with the data at the Source. I am assuming that its a typo error and going with the values at the source.

    You can select the data by joining the tables at the Source ON the Id field in both TB1 and TB2. But, you don't have such a field at the Destination. So....I would say that the design at the Destination is terribly wrong and without making changes to this design it won't be possible to do what you are trying to do.

    I would have inserted/updated the data by checking the EAN field in the Destination table. But, as I see....the first row has an EAN value "NULL" and the corresponding row at the Destination has a value which is NOT NULL. Which means that even if we know that the two rows are same but we can't assume that it might be true for all the rest of the data on a larger scale.

    So, all you could do here is add an "ID" field to the destination table which would map the existing data at the Destination to the data at the Source. If this is done once then it would be a lot lot easier for you to solve such problems in the future.

    Finally, if you think that the NAME field is unique and you want to map the data between the Source and the Destination on the NAME field then you can use the following script:

    --Creating Tables

    Create table Ex

    (ID int,

    EAN int,

    ASIN BigInt,

    category NVarchar(20),

    NAME NVarchar(20),

    CONDITION NVarchar(20) )

    Create table Ex1

    (ID int,

    PRICE Float)

    Create Table Ex2

    (KID int,

    EAN Int,

    category NVarchar(20),

    NAME NVarchar(20),

    CONDITION NVarchar(20),

    PRICE Float )

    --Inserting Sample Data

    Insert Into Ex

    Select 1, null, 20001, 'ps2', 'COD2', 'new'

    Union ALL

    Select 2, 1002, 20002, 'xbox', 'HOLA', 'new'

    Union ALL

    Select 3, 1003, 20003, 'xbox', 'Spider Man', 'used'

    Insert Into Ex1

    Select 1, 50

    Union ALL

    Select 2, 51

    Union ALL

    Select 3, 61

    Insert Into Ex2

    Select 1013, 1001, 'ps2', 'COD2', 'new', 50

    Union ALL

    Select 1015, 1002, 'xbox', 'HOLA', 'new', 31

    --Script for your Requirement

    Declare @newKID Int

    Select @newKID = MAX(KID) From Ex2

    IF Exists(Select Name From Ex Except Select Name From Ex2)

    Begin

    Insert Into Ex2

    Select (@newKID + ROW_NUMBER() Over (Order By ID)) As KID, EAN, category, NAME, CONDITION, PRICE From

    (

    Select a.Id, a.EAN, a.category, a.NAME, a.CONDITION, b.PRICE

    From Ex As a JOIN Ex1 As b ON a.ID = b.ID

    ) As p

    Where NAME IN (Select Name From Ex Except Select Name From Ex2)

    End

    IF EXISTS(Select Name From Ex Intersect Select Name From Ex2)

    Begin

    Update Ex2

    Set EAN = p.EAN, category = p.category, NAME = p.NAME, CONDITION = p.CONDITION, PRICE = p.PRICE

    From

    (

    Select c.EAN, a.category, a.NAME, a.CONDITION, b.PRICE, c.KID

    From Ex As a JOIN Ex1 As b ON a.ID = b.ID

    JOIN Ex2 As c On a.NAME = c.NAME

    ) As p

    Where Ex2.EAN = p.EAN

    End

    You can make the necessary changes in the Script according to your actual data after testing it on the sample data. Here I use Ex and Ex1 as Source tables and Ex2 As the Destination Table.

    This would not cause performance issues even while inserting a hud load of data and would work in a matter of seconds......as I have avoide using Cursors for Row by Row insertions.

    Hope it helps.

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • vinu512 (11/7/2012)


    So, all you could do here is add an "ID" field to the destination table which would map the existing data at the Destination to the data at the Source. If this is done once then it would be a lot lot easier for you to solve such problems in the future.

    Thanks a lot vinu512, for understanding my situation.

    among all options you provided along with some I looked upon, I think adding new "ID" must be a good idea.The names not being unique as we used the same name for the used product and the new one indicating it on other columns .

    so, going to first (but difficult kind of option Think), how shall I add the "ID" to this table which is around 53k big.

    >><<I saw something called "lookup" and "fuzzy*lookup" these as far as I know are some statistical tools to compare the similarity and provides the confidence interval as well as the level of similarity.

    would it be good idea to use it, I wanted to use it seems difficult. would it do what I am looking for to do?

    my organization most of the time changes its sources of products, and different programmer are used to get those products to the database, there are same products from different supplier with same name but we try to display the cheapest one. The only way we use to differentiate the products is its kid.

    Another problem is that it is not documented, most of the time have to go and see all the consequences of the changes I make.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply