|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Wednesday, February 06, 2013 5:10 AM
Points: 4,
Visits: 20
|
|
Source DB1.TB1 [ ID, EAN, ASIN, category, NAME, CONDITION]
1, null, 20001, ps2, COD2, new 2, 1002, 20002, xbox, HOLA, new 3, 1003, 20003, xbox, Spider Man, used
and DB1.TB2 [ ID, PRICE]
1, 50 2, 51 3, 61
Destination DB2.TB1 [ KID, EAN, category, NAME, CONDITION, PRICE ]
1013, 1001, ps2, COD2, new, 50 1015, 1002, xbox, HOLA, new, 31
Now I need to transfer the data from source DB1 TB1 and TB2 to destination table with some conditions:
ID is same products int DB1, KID is the actual "ID" we refer to in the organization. each item without this id should be updated with KID_max + 1.
The code must check the existing items in DB2.TB1 and update its price if the items exist else add it to this table using new KID .
I wanted to use `join` but I am not sure where shall I put the `on` condition. Since I could not find any entity that would be common in both, names may be same but not sure if they can be used in this `on` condition. Some items in the DB2.TB1 do not have EAN. Some Items on the DB1.TB1 do not have EAN as well but all do have ASIN.
Just some sample codes that I may use to compare the products and update the destination would be enough for me, I think I will be able to modify the codes if I am able to know the path that I should follow to solve this this type of task.
Info: DB2.TB1 consists of around 53k items and source DB1.TB1 consists 18k items. This task needs to be done every day so I need to consider some performance issue as well. I am not sure if the full code will be too much to ask or not (being a beginner,it seems too complex) I would like to know some information on how shall I approach to solve this issue.
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. If you know how, follow ISO-11179 data element naming conventions and formatting rules (you did not). Code should be in Standard SQL as much as possible and not local dialect. But you did not even try something.
Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. There is no such thing as a generic, magical “id” in RDBMS; you have the EAN which is an industry standard key. Likewise, Amazon Standard Identification Number (asin) is also a key. Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. If you know how, follow ISO-11179 data element naming conventions and formatting rules (you did not). Code should be in Standard SQL as much as possible and not local dialect. But you did not even try something.
Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. There is no such thing as a generic, magical “id” in RDBMS; you have the EAN which is an industry standard key. Likewise, Amazon Standard Identification Number (asin) is also a key. See http://www.amazon.com/gp/seller/asin-upc-isbn-info.html
CREATE TABLE Merchandise (ean CHAR(15) NOT NULL UNIQUE, asin CHAR(10) NOT NULL UNIQUE, item_name VARCHAR(25) NOT NULL, shipping_category CHAR(??) NOT NULL, item_condition CHAR(4) NOT NULL CHECK (item_condition IN ('used', 'new'), item_price DECIMAL(12,2) CHECK (item_price >= 0.00));
Two keys is a design problem; how do you ensure that both of them identify the same item?
The price is an attribute of each item; it should not be split out into its own table. Why do you think that a price is a separate entity? I am letting the price be NULL-able, in case you do not have one when you get the item.
>> The shipping price of each item depends upon its KID name. So the products must be categorized according to this as well which are dependent upon the name index such as ps2 or xbox. <<
What KID name? There is no such column in your narrative. You need a shipping category of some kind in the table. That is another attribute of an item.
>> I am not sure if the full code will be too much to ask or not (being a beginner,it seems too complex) I would like to know some information on how shall I approach to solve this issue. <<
Please read an book on RDBMS and data modeling. You do not know enough to ask a good question and a forum is not the right place to get an education.
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Wednesday, February 06, 2013 5:10 AM
Points: 4,
Visits: 20
|
|
Thank you for your response. @Celko, I agree that I do not have enough knowledge about the RDMS, I am in the learning stage, have tried myself on searching related topics though I have not mentioned here. I don't think it is possible to learn these topics in short period of time so as I am learning as well as working on some of the things. Our organization supplied me with a piece of software that downloads the products to re-post in our website, its from amazon. The downloading is totally different, we have been using ID2 for identification, which is not related to both EAN and ASIN, so my little knowledge was thinking to compare EAN,ASIN and the NAMEs of the products so that I can update the tables. Not knowing how! I have basic knowledge by now, at least i understand the codes, and sometimes modify it to suit my needs. As I told in my post it seems quite complex to me just to begin with so wanted to know some approaches how its done in this kind of situation.
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Monday, May 20, 2013 12:37 AM
Points: 1,049,
Visits: 1,439
|
|
adhikarideep (11/6/2012) Source DB1.TB1 [ID, EAN, ASIN, NAME, CONDITION]
1,1001,20001,ps2,COD2,new 2,1002,20002,xbox,HOLA,new 3,1003,20003,xbox,Spider Man,used
I understand the requirement but you're Sample Data is not correct. The above quoted part shows that the table has 5 columns but the data you have given is 6 columns. Please provide correct sample data for a working solution for your requirement.
Vinu Vijayan
For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Wednesday, February 06, 2013 5:10 AM
Points: 4,
Visits: 20
|
|
vinu512 (11/7/2012) [quote]adhikarideep (11/6/2012)
I understand the requirement but you're Sample Data is not correct. The above quoted part shows that the table has 5 columns but the data you have given is 6 columns. Please provide correct sample data for a working solution for your requirement.
Thanks for your understanding, I have by now updated my question along with the entity the tables. Hope some suggestions will help me figure out how to go ahead. Looking at it my head is spinning, I do not know if a sql level solution will help or application is supposed to do it.
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Monday, May 20, 2013 12:37 AM
Points: 1,049,
Visits: 1,439
|
|
I can see that you have edited your sample data, but only the part that I quoted in my last post. From what I can see, there are still issues with your Sample Data. The data in the EAN at the Destination does not match with the data at the Source. I am assuming that its a typo error and going with the values at the source.
You can select the data by joining the tables at the Source ON the Id field in both TB1 and TB2. But, you don't have such a field at the Destination. So....I would say that the design at the Destination is terribly wrong and without making changes to this design it won't be possible to do what you are trying to do.
I would have inserted/updated the data by checking the EAN field in the Destination table. But, as I see....the first row has an EAN value "NULL" and the corresponding row at the Destination has a value which is NOT NULL. Which means that even if we know that the two rows are same but we can't assume that it might be true for all the rest of the data on a larger scale.
So, all you could do here is add an "ID" field to the destination table which would map the existing data at the Destination to the data at the Source. If this is done once then it would be a lot lot easier for you to solve such problems in the future.
Finally, if you think that the NAME field is unique and you want to map the data between the Source and the Destination on the NAME field then you can use the following script:
--Creating Tables
Create table Ex (ID int, EAN int, ASIN BigInt, category NVarchar(20), NAME NVarchar(20), CONDITION NVarchar(20) )
Create table Ex1 (ID int, PRICE Float)
Create Table Ex2 (KID int, EAN Int, category NVarchar(20), NAME NVarchar(20), CONDITION NVarchar(20), PRICE Float )
--Inserting Sample Data Insert Into Ex Select 1, null, 20001, 'ps2', 'COD2', 'new' Union ALL Select 2, 1002, 20002, 'xbox', 'HOLA', 'new' Union ALL Select 3, 1003, 20003, 'xbox', 'Spider Man', 'used'
Insert Into Ex1 Select 1, 50 Union ALL Select 2, 51 Union ALL Select 3, 61 Insert Into Ex2 Select 1013, 1001, 'ps2', 'COD2', 'new', 50 Union ALL Select 1015, 1002, 'xbox', 'HOLA', 'new', 31
--Script for your Requirement
Declare @newKID Int Select @newKID = MAX(KID) From Ex2 IF Exists(Select Name From Ex Except Select Name From Ex2) Begin Insert Into Ex2 Select (@newKID + ROW_NUMBER() Over (Order By ID)) As KID, EAN, category, NAME, CONDITION, PRICE From ( Select a.Id, a.EAN, a.category, a.NAME, a.CONDITION, b.PRICE From Ex As a JOIN Ex1 As b ON a.ID = b.ID ) As p Where NAME IN (Select Name From Ex Except Select Name From Ex2) End IF EXISTS(Select Name From Ex Intersect Select Name From Ex2) Begin Update Ex2 Set EAN = p.EAN, category = p.category, NAME = p.NAME, CONDITION = p.CONDITION, PRICE = p.PRICE From ( Select c.EAN, a.category, a.NAME, a.CONDITION, b.PRICE, c.KID From Ex As a JOIN Ex1 As b ON a.ID = b.ID JOIN Ex2 As c On a.NAME = c.NAME ) As p Where Ex2.EAN = p.EAN End
You can make the necessary changes in the Script according to your actual data after testing it on the sample data. Here I use Ex and Ex1 as Source tables and Ex2 As the Destination Table.
This would not cause performance issues even while inserting a hud load of data and would work in a matter of seconds......as I have avoide using Cursors for Row by Row insertions.
Hope it helps.
Vinu Vijayan
For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Wednesday, February 06, 2013 5:10 AM
Points: 4,
Visits: 20
|
|
vinu512 (11/7/2012)
So, all you could do here is add an "ID" field to the destination table which would map the existing data at the Destination to the data at the Source. If this is done once then it would be a lot lot easier for you to solve such problems in the future.
Thanks a lot vinu512, for understanding my situation. among all options you provided along with some I looked upon, I think adding new "ID" must be a good idea.The names not being unique as we used the same name for the used product and the new one indicating it on other columns . so, going to first (but difficult kind of option Think), how shall I add the "ID" to this table which is around 53k big.
>><<I saw something called "lookup" and "fuzzy*lookup" these as far as I know are some statistical tools to compare the similarity and provides the confidence interval as well as the level of similarity. would it be good idea to use it, I wanted to use it seems difficult. would it do what I am looking for to do? my organization most of the time changes its sources of products, and different programmer are used to get those products to the database, there are same products from different supplier with same name but we try to display the cheapest one. The only way we use to differentiate the products is its kid. Another problem is that it is not documented, most of the time have to go and see all the consequences of the changes I make.
|
|
|
|