## Introduction

As a database primer, the intention of this article is to provide an introduction to Universal Product Codes, and their usage in a relational database.

Universal Product Codes are codes used to uniquely identify retail products in many countries. The system originated in the United States in the 1970s to make grocery checkouts faster. Universal codes are printed on product packaging in the form of barcodes, for scanning at checkout. The code is also generally printed in text just below the barcode. Being universal means that the code on any given product will be the same throughout the country it is sold in.

Universal Product Codes come in more than one format. The most popular format is called the UPC-A, followed by the UPC-E and the EAN, among others. The UPC-A barcode represents 12 numerical characters (no letters or special characters). Each number is represented by a unique barcode. A typical UPC-A looks like this (this is for illustration only and is not a real barcode):

In this illustration, the first digit (1) is called the prefix. The next five digits (22222) represent the manufacturer. All products made by that manufacturer will have the same manufacturer code. The three digits after that (333) represent the product family code. For instance, all cereals manufactured by Kellogg will have the same code here, assuming Kellogg has coded them as one “family”. The next two digits (44) represent the individual product. An obvious limitation here is that each family can only have a hundred products within it, but manufacturers get around that by playing with family codes. The last digit (5) is the check digit, used to verify that the scanner has read the previous 11 characters correctly. A formula is used to compute the check digit from the first 11 characters, and then verified against this 12^{th} character.

The next most popular format is the UPC-E. It was invented to allow encoding of small items, such as bars of candy or chewing gum, that are too small to print a regular UPC-A on. It consists of 8 numeric characters. Unlike the UPC-A, the UPC-E does not have a straightforward representation for manufacturer, product family and product. It is simply a compressed version of the UPC-A. The EAN code (originally European Article Number, now International Article Number but still called EAN, not IAN) is a variation of the UPC-A that adds an extra character to the left of the barcode, representing the country where the product is sold.

A real example of a UPC-A code is 043000014240, which belongs to the cereal Honey Bunches of Oats, made by the company Post. Here, the manufacturer code is 43000, the product family code is 014, and the product code is 24. The barcode representations of the UPC-A and the EAN respectively look like this:

Note that the EAN has 13 digits, with the one extra digit on the left representing the country of retail, in this case the United States. The remaining 12 are the same as the UPC-A.

## Product Codes in Databases

If you are creating a database to store retail products, chances are you will want to store and look up UPC-A codes at the very least, and probably also UPC-E codes. These are the two most frequently used formats in the United States; product codes in many other countries are based on the US code system, and are very similar. You might also have the need to translate UPC-A codes to UPC-E and back.

For more granular lookup and analysis, you may also want to store code components (manufacturer code, product family code, product code) separately, in addition to the entire code. Since the codes are always numeric, the seemingly obvious choice is to store all these codes and components as numeric types. But think again, because often the leading characters in the entire UPC-A, and in individual components, are one or more zeroes.

A manufacturer code of, say ‘053’ becomes '53' when saved as an integer, making it incorrect and useless. I prefer storing the codes and components as char (they are always fixed length). However, many code related operations such as converting between UPC-A and UPC-E, calculating check digit etc require the numerical values of code characters, so you will need to convert to a numeric type in SQL code.

Probably the three most frequently used product code related operations are:

- Calculate UPC-A check digit, given the first 11 characters
- Convert a UPC-A into a UPC-E (not all UPC-A codes can be converted, they have to meet certain conditions)
- Convert a UPC-E into a UPC-A (all valid UPC-E codes have a counterpart UPC-A)

The code to perform these operations, in the form of T-SQL functions, is below and also attached to this article as .sql files. I am the original author of this code, and provide it here for anybody to use for free.

### **Calculate UPC-A check digit:**

if exists(select 1 from INFORMATION_SCHEMA.ROUTINES where ROUTINE_NAME = 'udf_Calculate_UPCA_CheckDigit' ) begin drop function dbo.udf_Calculate_UPCA_CheckDigit end go create function dbo.udf_Calculate_UPCA_CheckDigit ( @upca_11 varchar(20) ) returns varchar(1) as begin /************************************************************************** 2013/01/15 Hakim Ali Function returns the Check Digit when passed in first 11 characters of UPC-A. **************************************************************************/-- Local variables declare @checkdigit varchar(1) declare @sum_of_odds int declare @sum_of_evens int -- Initial settings set @checkdigit = '' set @upca_11 = ltrim(rtrim(isnull(@upca_11,''))) set @sum_of_odds = 0 set @sum_of_evens = 0 -- Calculate CheckDigit if (len(@upca_11) = 11 and isnumeric(@upca_11) = 1) begin set @sum_of_odds = convert(int,substring(@upca_11,1,1)) + convert(int,substring(@upca_11,3,1)) + convert(int,substring(@upca_11,5,1)) + convert(int,substring(@upca_11,7,1)) + convert(int,substring(@upca_11,9,1)) + convert(int,substring(@upca_11,11,1)) set @sum_of_evens = convert(int,substring(@upca_11,2,1)) + convert(int,substring(@upca_11,4,1)) + convert(int,substring(@upca_11,6,1)) + convert(int,substring(@upca_11,8,1)) + convert(int,substring(@upca_11,10,1)) if ((((@sum_of_odds * 3) + (@sum_of_evens)))%10 = 0) begin set @checkdigit = '0' end else begin set @checkdigit = convert(varchar(1),(10 - (((@sum_of_odds * 3) + (@sum_of_evens)))%10)) end end return @checkdigit end -- end create function go

### Convert UPC-A to UPC-E

if exists (select 1 from INFORMATION_SCHEMA.ROUTINES where ROUTINE_NAME = 'udf_UPCA_to_UPCE' ) begin drop function dbo.udf_UPCA_to_UPCE end go create function dbo.udf_UPCA_to_UPCE ( @upcA varchar(50) -- limiting to varchar(12) only reads first 12 characters of any longer strings passed in ) returns varchar(8) as begin /************************************************************************** 2013/01/17 Hakim Ali Function to take in UPC-A, calculate and return its UPC-E. **************************************************************************/-- Local variables declare @manuf_code varchar(5) declare @product_code varchar(5) declare @upcE varchar(8) -- Initial settings set @upcA = ltrim(rtrim(isnull(@upcA,''))) set @upcE = '' -- Calculate UPC-E if ( -- Required conditions for conversion: length must be 12 , must start with 0 or 1 , must have 0000 between positions 5 and 12 len(@upcA) = 12 and left(@upcA,1) in ('0','1') and substring(@upcA,5,8) like '00%' and isnumeric(@upcA) = 1 ) begin set @manuf_code = substring(@upcA,2,5) set @product_code = substring(@upcA,7,5) -- ---------------------------------------------------------------------------- -- Note: iterations must be followed in order. If type 1 applies, use it over type 2 and so on. -- ---------------------------------------------------------------------------- -- Type 1 if (right(@manuf_code,3) in ('000','100','200') and convert(int,@product_code) between 0 and 999 ) begin set @upcE = left(@upcA,1) + left(@manuf_code,2) + right(@product_code,3) + substring(@manuf_code,3,1) + right(@upcA,1) end -- Type 2 else if (right(@manuf_code,2) = '00' and convert(int,@product_code) between 0 and 99 ) begin set @upcE = left(@upcA,1) + left(@manuf_code,3) + right(@product_code,2) + '3' + right(@upcA,1) end -- Type 3 else if (right(@manuf_code,1) = '0' and convert(int,@product_code) between 0 and 9 ) begin set @upcE = left(@upcA,1) + left(@manuf_code,4) + right(@product_code,1) + '4' + right(@upcA,1) end -- Type 4 else if (convert(int,@product_code) between 5 and 9) begin set @upcE = left(@upcA,1) + left(@manuf_code,5) + right(@product_code,1) + right(@upcA,1) end end -- main if: Required conditions return @upcE end -- create function

### Convert UPC-E to UPC-A

if exists (select 1 from INFORMATION_SCHEMA.ROUTINES where ROUTINE_NAME = 'udf_UPCE_to_UPCA' ) begin drop function dbo.udf_UPCE_to_UPCA end go create function dbo.udf_UPCE_to_UPCA ( @upcE varchar(50) -- limiting to varchar(8) only reads first 8 characters of any longer strings passed in ) returns varchar(12) as begin /************************************************************************** 2013/01/18 Hakim Ali Function to take in UPC-E, calculate and return its UPC-A. **************************************************************************/-- local variables declare @main_six varchar(6) -- The 6 important (middle) digits of UPCE declare @upcA varchar(12) -- Calculated UPCA -- Iniial settings set @upcE = ltrim(rtrim(isnull(@upcE,''))) set @upcA = '' -- Calculate UPC-A if ( -- Required conditions for conversion: length must be 8 , must start with 0 or 1 len(@upcE) = 8 and left(@upcE,1) in ('0','1') and isnumeric(@upcE) = 1 ) begin set @main_six = substring(@upcE,2,6) if (right(@main_six,1) in ('0','1','2')) begin set @upcA = left(@upcE,1) + left(@main_six,2) + right(@main_six,1) + '0000' + substring(@main_six,3,3) + right(@upcE,1) end else if (right(@main_six,1) = '3') begin set @upcA = left(@upcE,1) + left(@main_six,3) + '00000' + substring(@main_six,4,2) + right(@upcE,1) end else if (right(@main_six,1) = '4') begin set @upcA = left(@upcE,1) + left(@main_six,4) + '00000' + substring(@main_six,5,1) + right(@upcE,1) end else begin set @upcA = left(@upcE,1) + left(@main_six,5) + '0000' + substring(@main_six,6,1) + right(@upcE,1) end end -- main if: Required conditions return @upcA end -- end create function