Universal Product Codes: a Database Primer

• Comments posted to this topic are about the item Universal Product Codes: a Database Primer

Hakim Ali
www.sqlzen.com

• In the algorithm for calculating the check digit, you have k = X mod 10. If k = 0 all is fine. Otherwise do 10-k. What do you think is the purpose of 10-k? What not just k all the time? What additional advantage does 10-k add to the process of calculating a check digit? I just don't see it. Who invented this?

• I believe the algorithm for calculating the check digit takes the difference between the 'weighted sum' and the nearest multiple of 10 that is equal or higher.

if the sum was 55, the nearest multiple of 10 that is equal or higher is 60

so 60-55 = 5 is the check digit

this can be changed to (10 - (sum mod 10)) mod 10

The reason for the second MOD is that in cases where the weighted sum is a multiple of 10, you would get a check digit of 10, which is too many digits. The second mod will then return this as 0.

• great article, thanks !

How is the uniqueness of the code guaranteed? Is there an open-access database somewhere that I can use to look up what a product is by using its code?

• Thanks for that Ali, a useful article for those of us in the Retail space, which we do not see enough of. No criticism of your article, but I note that in fact the US has fallen behind in the use of Barcodes, after introducing the great concept.

Possibly due to greater diversity outside the US, I see a wider use of Barcode formats. We use EAN as standard, but as well as simple Unique product identifiers, as you well described, I see Price and Weight embedded barcodes, e.g. like you would have for a variable weight item like a Steak. We are also seeing 2D and RFID codes coming in from Suppliers. (I worked for a large Wholesale food supplier, where tracking of meat products through the whole supply chain is heavily regulated).

I would very much like to see more articles dealing with industry practice in this area. I'm sure that we all face teh same challenges, like the leading Zero issues that you mentioned. I hate when the busines bring in UPC data into Excel and send it out, without the leading zeros, so now our Retailers cannot scan perfectly normal products !! Also, as Wholesalers, we see multiple Barcodes for the same physical item, if we source from multiple vendors. We also need to handle the Outer Barcode for the case and the Inner Barcode for the item that the Retailer sells, etc.

There has to be more than just me and you interested in this stuff ?? 🙂

Thanks again for taking the time to do the article. JK

Tks,

JK

• Michael Meierruth (2/28/2013)

In the algorithm for calculating the check digit, you have k = X mod 10. If k = 0 all is fine. Otherwise do 10-k. What do you think is the purpose of 10-k? What not just k all the time? What additional advantage does 10-k add to the process of calculating a check digit? I just don't see it. Who invented this?

Thanks for the question. I'm not sure why the algorithm was designed that way, I only implemented the sql code for it. Perhaps user 'Dohsan' is on to something with his explanation. As for who invented this entire coding system including the algorithm, I believe it was an entity called the Uniform Grocery Product Code Council, formed in the 1970s. This body was later renamed to the Uniform Code Council, and is now known as GS1 US. They are the central clearinghouse and administrators for uniform product codes in the United States. You can look them up here: http://www.gs1us.org/

Hakim Ali
www.sqlzen.com

• Dohsan (2/28/2013)

I believe the algorithm for calculating the check digit takes the difference between the 'weighted sum' and the nearest multiple of 10 that is equal or higher.

if the sum was 55, the nearest multiple of 10 that is equal or higher is 60

so 60-55 = 5 is the check digit

this can be changed to (10 - (sum mod 10)) mod 10

The reason for the second MOD is that in cases where the weighted sum is a multiple of 10, you would get a check digit of 10, which is too many digits. The second mod will then return this as 0.

Thank you for helping explain that. I do not know if that is the reason, but it sounds reasonable.

Hakim Ali
www.sqlzen.com

• As mentioned before, nice article.

Correction on EAN:I work in Norway and we now use exclusively GTIN (Global Trade Item Number) which was earlier known as EAN. http://www.gtin.info/

http://gepir.gs1.org/v32/xx/gtin.aspx?Lang=en-US

Example: 7070644000622

• grenm (2/28/2013)

great article, thanks !

How is the uniqueness of the code guaranteed? Is there an open-access database somewhere that I can use to look up what a product is by using its code?

Any company that wants to use a universal product code has to work with GS1 US, the administrators of the product code system in the United States. This is the body that assigns company codes. I'm assuming they have systems in place to ensure that company and product codes are unique. To a certain extent, the companies get to control the product family code and product code within the entire code, and I'm sure they would also have systems in place to ensure uniqueness since re-using the same code for separate products benefits nobody, least of all the manufacturers, retailers and consumers.

As for a central database, yes GS1 US has one, as do various retailers. From my own experience in this industry, I can tell you that this database is neither easy to get a hold of, nor free. GS1 US will sell it to you, as will various 3rd party re-sellers. Beware if you work in this field: the quality of data is very very shoddy for the most part, you'll spend more time cleaning it up and scrubbing it than you would like to. If you do a web search, you should be able to find websites that will let you look up one or a few product codes at a time and convert from one format to another, but nothing that will let you download the entire database for free that I'm aware of.

Hakim Ali
www.sqlzen.com

• Joe Kelly-376311 (2/28/2013)

...

I hate when the busines bring in UPC data into Excel and send it out, without the leading zeros, so now our Retailers cannot scan perfectly normal products !!

...

There has to be more than just me and you interested in this stuff ?? 🙂

Thank you for your comments. OMG, I cannot tell you how much time I have spent cleaning up product codes with leading zeroes missing. I even wrote a function to "complete" an incomplete UPC-A, but cannot share it because it was done on company time and is covered by a NDA.

Yes, I'm sure there are more people interested in this work, but my own experience is that resources on the web are either lacking or very expensive, so people who do work in this space either buy software or create their own which they don't want to share out, at least not for free.

Hakim Ali
www.sqlzen.com

• ah 25926 (2/28/2013)

As mentioned before, nice article.

Correction on EAN:I work in Norway and we now use exclusively GTIN (Global Trade Item Number) which was earlier known as EAN. http://www.gtin.info/

http://gepir.gs1.org/v32/xx/gtin.aspx?Lang=en-US

Example: 7070644000622

Thanks for shedding more light on that.

Hakim Ali
www.sqlzen.com

• Hi JK, I would like to put in my 2 cents here 🙂

I do not agree with you that US is lagging behind everyone else in usage of UPC. it's just more strictly regulated here than in other places. which is not necessary a bad thing.

here (US) you can use any barcode format that suits your need, as long as it is only used with in your business infrastructure. however if you will be packaging your product for retail you MUST include a standard UPC-A/UPC-E or EAN barcode that have a registered information on it in standard format.

so, if you deal with specialty items a lot you will see a multitude of barcodes

on an item packaging, most of which you might not be able to process

but you always should be able to find a standard code in one of approved formats to identify the item universally. and that I think what this article was intended to help you with, as an entry point into dealing with UPC data.

and describes most used (at least in US) UPC formats

Also , keep in mind that UPC was invented to represent an identification to be used only as a visual ID representation that could be machine readable. It is simply a globally unique numeric ID that links the specific item to Country/Manufacturer/Item family.

Alone by itself it is meaningless.

and of course some new variation of the unique identification that is also a machine readable have been developed over the years.

And we need to use them accordingly as well.

So to response to your reference to other codes used now days , if you want to design future-proof system you need to ensure that

You can accept/process/store any data regardless of source.

On one of my past employment, I was designing an package of apps that used barcode info.

It was for a chain store enterprise that had a single warehouse location and multiple retail locations including mail order.

The package included a warehouse app/db that collected and process data

For local inventory (in warehouse supply product availability)

For mail order

For individual location orders and inventory.

Business logic dictated that each item :

1. can have multiple vendors (as in the same Manufacturer multiple suppliers/wholesalers) ? same UPC code/packaging

2. any vendor can have same item from different manufactures ? different UPCcode/packaging

3. can have item repackaged for private branding or item can be acquired as wholesale/OEM lot(no packaging) and packaged locally

4. must be able to assign a custom Barcode to any item using internal barcode printers/labels regardless of above conditions

But all of it must be able to track back to a specific item/price/inventory location.

As far as I know the application is still in use after 10 years now. No problems

The custom barcodes were not UPC-A/UPC-E type

But printers and scanners could read them

You could enter code by hand/barcode scanner

You can link barcode or multiple codes to a single item.

Any item that was available from multiple vendors was assigned custom barcode to indicate what vendor it was from

Or a default cost/price was linked to UPC.

Each location POS system would export daily inventory data into Access/TXT files and transmit it to main office.

via phone/DSL

main office module would import all data in to central DB and generate reorder tickets as needed

• ah 25926 (2/28/2013)

As mentioned before, nice article.

Correction on EAN:I work in Norway and we now use exclusively GTIN (Global Trade Item Number) which was earlier known as EAN. http://www.gtin.info/

http://gepir.gs1.org/v32/xx/gtin.aspx?Lang=en-US

Example: 7070644000622

Agreed, GTIN identifiers are actually a super-set of UPC-A, UPC-E, and EAN-8 and EAN-13, and thus are a very common and flexible representation for product codes. Anyone interested in inventorying product codes with global reach across numerous countries/regions would probably be best served storing GTIN instead of UPC-A.

Also, I believe your GTIN example should be 14 digits (07070644000622?), not 13.

• hakim.ali (2/28/2013)

Michael Meierruth (2/28/2013)

In the algorithm for calculating the check digit, you have k = X mod 10. If k = 0 all is fine. Otherwise do 10-k. What do you think is the purpose of 10-k? Why not just k all the time? What additional advantage does 10-k add to the process of calculating a check digit? I just don't see it. Who invented this?

Thanks for the question. I'm not sure why the algorithm was designed that way, I only implemented the sql code for it. Perhaps user 'Dohsan' is on to something with his explanation. As for who invented this entire coding system including the algorithm, I believe it was an entity called the Uniform Grocery Product Code Council, formed in the 1970s. This body was later renamed to the Uniform Code Council, and is now known as GS1 US. They are the central clearinghouse and administrators for uniform product codes in the United States. You can look them up here: http://www.gs1us.org/%5B/quote%5D

What I'm sustaining is that there is no added value obtained in calculating the check digit by applying this extra 10-k rule. It provides no extra value in making the check digit more functional or reliable or meaningful or whatever as a check digit.

• Michael Meierruth (2/28/2013)