Array and Custom Data Types in PostgreSQL

,

Introduction

Just like every other database, PostgreSQL has its own set of basic datatypes, like Boolean, Varchar, Text, Date, Time, etc. We can easily store data like numbers, characters, date, time, etc. using these supported datatypes, but what if we need to store multiple data elements in a single column?

Suppose we are storing the data of employees in an organization and it's obvious that most of the employees will have multiple contact numbers alternate contact numbers. How do we store them all in one column instead of having different columns for all the numbers? We can use of ARRAY datatype in PostgreSQL, which we will cover in-depth shortly.

PostgreSQL also supports different forms of checks and restrictions on a column in DDL scripts. What if we want to have similar restrictions for the columns of multiple tables? Instead of repeatedly writing the redundant restriction logic in every DDL script, we can make use of custom data type here. We will also cover these custom types later in the article.

Scenario

Suppose an employee can add multiple contact numbers to their record. We need to store these contact numbers against a single column in the database. Let us assume that the first contact number is the primary and the remaining are all alternate contact numbers. We will use the array data type to accommodate the column containing the contact number to store multiple numbers.

Let us create an employee table with the contact_number column as an ARRAY data type. We can run the following query to create the table:

CREATE TABLE employee_info (
employee_id serial PRIMARY KEY,
employee_full_name VARCHAR (100),
contact_number TEXT []
);

Once the table is created, we can store some data into it. We have three employees have multiple contact numbers. We add these three employees using the following query:

INSERT INTO employee_info (employee_full_name, contact_number)
VALUES('Sabyasachi Mukherjee',ARRAY [ '+91-1234567890','+91-0987654321' ]);
INSERT INTO employee_info (employee_full_name, contact_number)
VALUES('Pawel Smith',ARRAY [ '+1-1234567','+1-0987654' ]);
INSERT INTO employee_info (employee_full_name, contact_number)
VALUES('Steven Hopkins',ARRAY [ '+91-55667890' ]);

If we check the contents of the employee_info table now, we will see:

Retrieving specific data from the array

We can access array elements using the subscript within square brackets,[]. We just assumed that the first phone number of every employee is his/her primary contact number, so the subscript used would be "1". If we want to query for the primary contact number of every employee in the organization, we can run the following query:

select employee_full_name, contact_number[1]
from employee_info;

We get the following output:

We can also do many more operations on this array data type, just like we can do on other data type columns. We can check for the employee information whose second contact number is "+91-0987654321" by running the following query:

select employee_full_name
from employee_info
where contact_number[2] = '+91-0987654321';

This will look for all the matching records in the table whose 2nd contact number (2nd array element) matches to the given number. This returns the following output:

What if we tried to access an index from the array that doesn't exist?

In the below data set, what if we tried to access the third contact number of Sabyasachi Mukherjee, which actually doesn't exist?

If we run a query to get the third contact number of Sabyasachi, the query would look like:

select employee_full_name, contact_number[3]
from employee_info 
where employee_full_name = 'Sabyasachi Mukherjee';

However if we run this query, we will get a null value returned since there are only 2 contact numbers for this employee. Generally if we try to access the element at an index of an array that is beyond it's size, we get some index out of range/bound exception in most languages. Here we get just a null returned.

Searching for a specific data in the record of arrays

Suppose we need to find out the employee name who holds a particular phone number. This requires us to query across every record of the table and checking if "any" contact number matches the given phone number. We use the ANY() function. In case there are multiple employees having same phone number (which should not be the case ideally), the results will print multiple employee names.

In such a case we can run the following query:

select employee_full_name
from employee_info
where '+91-1234567890' = ANY(contact_number);

This gives us the following output:

We have seen how the array datatype really comes handy when we need to store multiple information against a single category in a single entry record.

The Custom Data Type

PostgreSQL gives us the flexibility to create user defined custom datatypes besides using the built in datatypes. We can create a custom datatype using either CREATE DOMAIN or  CREATE TYPE. CREATE DOMAIN creates the user defined datatype with support to use constraints such as NOT NULL, CHECK, etc. CREATE TYPE creates a composite user defined datatype, which is used in a stored procedure as the data type of the returned value.

We will see on how to use both DOMAIN and TYPE to create custom datatypes and understand their need using some basic scenarios that we might come across in our day to day work.

Custom Domain

We all know that in many forms, we have to enter our first and last name individually. Ideally they should not contain spaces or be blank/NULL. Suppose we have a database where we store the candidate data and we need to make sure that no white space or blank values are entered in the first name and last name columns. To achieve this, we can use the following create table query:

CREATE TABLE candidate_data (
    candidate_id SERIAL PRIMARY KEY,
    first_name VARCHAR NOT NULL,
    last_name VARCHAR NOT NULL,
    email_id VARCHAR NOT NULL,
    CHECK (
        first_name !~ '\s'
        AND last_name !~ '\s'
    )
);

But it might be the case that we have many such tables where we are storing the first name and last name and everywhere we have to have a similar check. In this situation, instead of having the CHECK constraint in every CREATE TABLE query, it's wise and recommended to have a domain that will check for these restrictions and we can reuse the same code everywhere.

We can create a domain with similar check as follows:

CREATE DOMAIN candidate_name AS 
VARCHAR NOT NULL CHECK (value !~ '\s');

Now we can use this domain in the candidate_data table as follows:

CREATE TABLE candidate_data (
    candidate_id SERIAL PRIMARY KEY,
    first_name candidate_name,
    last_name candidate_name,
    email_id VARCHAR NOT NULL
)

Now if we try to enter a first name with spaces inside of it, the system it will give us the appropriate error as follows:

INSERT INTO candidate_data (first_name, last_name, email_id)
VALUES('Sabya sachi','Mukherjee','sabya.mukherjee@company.com');

However, if we insert data without spaces, the query successfully records the entry:

INSERT INTO candidate_data (first_name, last_name, email_id)
VALUES('Sabyasachi','Mukherjee','sabya.mukherjee@company.com');

Now we understand the use of DOMAIN. We can create our custom domain and use it when we need to have similar restrictions and checks in multiple columns across many tables in the database. It helps us to make DDL queries look cleaner and easy to maintain.

Additional information on custom domains

We might need to get the domain name information in a schema to reuse the same. In such situation, we can run the following in psql terminal:

\dD

This command list out the list of domains created in the database to which you are connected:

Custom data type

CREATE TYPE allows us to create a custom data type, which will be used as a data type of function return value. Suppose we need to get the employee name and primary contact number from the employee_info table using the employee_id provided. For this, we can have a function that will accept the employee_id and return the name and contact number as a custom datatype.

First, we need to create the suitable datatype using the following query:

CREATE TYPE employee_data AS (
    employee_name VARCHAR,
    contact_no VARCHAR
);

We have created a composite data type that includes the information about two different columns. In this case, there are two different data types, too. One is VARCHAR while the other is INT. Next, we need to define a function that will accept the employee_id and return the name and contact number accordingly. The function body would look like this code:

CREATE OR REPLACE FUNCTION get_employee_snapshot (e_id INT) 
RETURNS employee_data AS 
$$ 
SELECT
    employee_full_name,
    contact_number[1]
FROM
    employee_info
WHERE
    employee_id = e_id ; 
$$ 
LANGUAGE SQL;

The RETURN statement includes the new datatype that we created as the returned type.

Now, if we check the information we have in the employee_info table, we can see:

If we want to get the employee information for employee_id = 2, then we can call the function and get the output as follows:

Since custom data types are composite data types, we might need to edit them at a future time. What if we also need to return employee_id too? To edit an existing custom data type, we can make the use of ALTER TYPE keyword as follows:

ALTER TYPE employee_data ADD ATTRIBUTE employee_number INT;

Note: Once you change the field count in the custom data type, you also have to change the function to make sure same number of fields are returned, otherwise you will get an error.

To drop any attribute, you can run:

ALTER TYPE employee_data DROP ATTRIBUTE employee_number;

You can also rename the datatype with ALTER:

ALTER TYPE employee_data RENAME TO EMPLOYEE_SNAPSHOT_DATA;

Again after you rename a custom data type, make sure you change it's references in every function so that you don't end up getting any errors.

To drop a custom data type, we can use:

DROP TYPE EMPLOYEE_SNAPSHOT_DATA;

Note: You can only successfully delete a custom data type if it's not being reference in any function or elsewhere.

Conclusion

In this article, we have learnt on how to use the ARRAY data type to store multiple data of similar data type in a single column of a record. We learnt on how to use custom domains to our advantage for not creating same restrictions across multiple DDL statements. We also learnt on how to use custom data types to return manifold details at one shot.

Rate

5 (2)

Share

Share

Rate

5 (2)