Stairway to SQL Dialects

Stairway to SQL Dialects Level 2: Postgres

,

In this level of the Stairway to SQL Dialects series, I focus primarily on how to convert data declaration language (DDL) statements—or data definition language, in some circles—and data manipulation language (DML) statements from Postgres to Microsoft SQL Server. I also touch upon such issues as transaction models and supported platforms. I recommend that, when you have to port SQL dialect code, you first paste the code into the Management Studio and parse it. If you get lucky, it will run as is. All SQL products use ANSI standard SQL as their core language, with product-specific dialect added on to that core.

History

If you work with databases, you ought to know the name Michael Stonebraker. He did the first serious work on SQL optimizers and then spun off more database products than most of us have worked with in our careers. If you want to check out what he’s done, google Ingres, PostgreSQL, Mariposa, Cohera, Aurora, StreamBase, C-Store, Vertica, Morpheus, Goby, H-Store, VoltDB, and SciDB.

His first research project was Ingres at the University of California, Berkeley, starting in the early 1970s and ending in the early 1980s. Postgres, also referred to as PostgreSQL, evolved from the Ingres project. Postgres used many of the ideas of Ingres, but not its code. Postgres is now a popular open source product with a strong user community. In fact, version 9.2 should be released in fall of 2012, around the time of the European Postgres users conference in Prague, Czech Republic in October 2012.

Implementation Considerations

The vast majority of Postgres installations are on Linux and UNIX platforms, not on Microsoft Windows. In addition, transaction control is implemented differently in Postgres than how it is implemented in SQL Server. T-SQL sets locks to avoid conflicts, a process referred to as pessimistic concurrency control because it assumes that conflict is the normal situation.

Postgres uses a system known as multiversion concurrency control (MVCC), which gives each user a “snapshot” of the database. A user’s changes are not visible to other users until a transaction is committed. This process is referred to as optimistic concurrency control because it assumes that conflicts are rare and can be handled as exceptions.

DDL in Postgres

Postgres DDL should hold no surprises for the SQL Server developer. Most of the code can be moved to T-SQL with a cut-and-paste operation and a little minor editing. Some of the proprietary data types require specialized knowledge, but the basic types are pretty much the same in all SQL products.

Numeric

Postgres supports the following numeric data types:

  • INTEGER: Vanilla four-byte integer in the plus or minus two billion range and the same as INT4 internally. The INTEGER data type is compatible with the INT data type in SQL Server.
  • INT2: Two-byte integer in the plus or minus thirty-two thousand range. The INT2 data type is compatible to the SMALLINT data type in T-SQL.
  • INT8: Eight-byte integer compatible to the BIGINT data type in T-SQL dialect. I think you see the pattern now.
  • NUMERIC(s,p): An exact decimal of scale (s) and precision (p). The NUMERIC data type is compatible with the NUMERIC(s,p) and DECIMAL(s,p) data types in T-SQL. In fact, NUMERIC(s,p) and DECIMAL(s,p) are the same in T-SQL, although we tend to use DECIMAL. In the ANSI/ISO standards, they are different! DECIMAL has the least precision of the two and perhaps more which it used to round back to (p) decimal places, while NUMERIC follows an exact precision (p). You should probably stick with DECIMAL in SQL Server in case Microsoft comes up to standards.
  • FLOAT: A numeric value with 15 digits of precision. However,
  • FLOAT4 is a numeric value with six digits of precision.
  • T-SQL has a FLOAT(n) option, which is the best way to do a conversion. The ISO synonym for real is FLOAT(24).
  • MONEY: A currency data type as crazy as the T-SQL data type of the same name. It is locale sensitive, so you have to know what the database setting is. The T-SQL MONEY type does not do correct math, so do not blindly map the Postgres data type to the T-SQL one. Use the T-SQL DECIMAL data type instead.
  • OID, SERIAL and SEQUENCE: OID stands for object identifier, an exposed physical locator that applies to all tables in a schema and is a not sequential. Like IDENTITY, the OID data type can be used as a fake primary or foreign key and in joins. Code that depends on such proprietary exposed physical locators in any dialect needs to be rewritten.
  • The SERIAL data type in Postgres serves a function similar to the non-relational IDENTITY data type in T-SQL. The two data types auto increment.
  • Also, both Postgres and T-SQL now support the ANSI standards CREATE SEQUENCE statement.

The math functions in Postgres are the usual +,-,*,/, and % operators, just like you find in T-SQL, but ! is factorial which does not exist in T-SQL and ^ is the exponent operator instead of the T-SQL POWER() function. But we have @ as well as ABS() for absolute value, |/ for square root, and ||/ for cube root as well as POWER(). The other math functions are pretty clear. In Postgres, the comparison operators also allow both the ANSI <> and != operators, just like T-SQL. Of course good programmers will not use != in their code.

Postgres has the ANSI x BETWEEN SYMMETRIC y AND z, which is a shorthand for ((x BETWEEN y AND z) OR (x BETWEEN z AND y)), so you have a simple text edit when converting your statement to T-SQL.

In addition, Postgres allows another pair of shorthand notations: expression ISNULL for expression IS NULL and expression NOTNULL for expression IS NOT NULL, but be careful when porting code not to blindly change ISNULL to COALESCE.

Postgres has the new ANSI comparison operator expression1 IS [NOT] DISTINCT FROM expression2 that T-SQL does not. For non-NULL inputs, IS DISTINCT FROM is the same as the <> operator. However, when running this statement in Postgres, if both inputs are NULL, the statement returns FALSE, and if only one input IS NULL, it returns TRUE. Similarly, IS NOT DISTINCT FROM is identical to = for non-NULL inputs, but in Postgres the statement returns TRUE when both inputs are NULL, and FALSE when only one input IS NULL. Thus, Postgres effectively treats these constructs as though NULL were a data value, rather than UNKNOWN.

There are several ways to do this in T-SQL to handle NULLs, such as COALESCE (expression1, absurd constant)[= | <>] COALESCE (expression2, absurd constant).

Don’t relax yet! Postgres also has the ANSI logical value test. Boolean values can be tested using the operators shown in Listing 1, which map three-valued logic to two-valued logic.

search condition IS TRUE
search condition IS NOT TRUE
search condition IS FALSE
search condition IS NOT FALSE
search condition IS UNKNOWN
search condition IS NOT UNKNOWN

Listing 1: ANSI logical value test in Postgres

These can be a real problem to translate to T-SQL. In the CHECK() clause of a DDL statement, UNKNOWN and TRUE are the same, the benefit of doubt assumption that lets a column be NULL-able. In the ON and WHERE clauses of a DML statement, UNKNOWN and FALSE are the same! Starting to see how complex trying to map this into T-SQL?

Strings

The Postgres CHAR(n) and VARCHAR(n) data types present no surprises. They work just like they do in T-SQL. The world is written in ASCII and Unicode these days.

The Postgres TEXT data type allows the string to fill the length of the row. In T-SQL dialect, you might use VARCHAR(MAX). The better way would be to use an exact length in both dialects.

To embed a single quote in a string value, such as the apostrophe in the name O'Donnell, you can use a pair of single quotes, as in O''Donnell, or you can use the Unix convention of a backslash to escape the single quote, as in O\'Donnell. The first will port, the second will not, but it’s an easy text edit.

String operation conversions are best done by hand. The concatenation symbol in Postgres is the ANSI ||, not the overloaded + of T-SQL. While Postgres has the LIKE predicate, it also uses ~~ instead and a single ~ for a regular expression.

Temporal

Like T-SQL, Postgres supports a number of date/time data types:

  • DATE: This explains itself. Postgres uses the ISO-8601 display format that has been part of the ANSI standard but only recently became the default for the DATE data type in T-SQL.
  • TIME: This too explains itself. Postgres uses the ISO-8601 display format that has been part of the ANSI standard but only recently became the default time data type in T-SQL.
  • TIMESTAMP: This data type is comparable to the old DATETIME data type in T-SQL, in the sense that it gives the system clock in a human-readable format and allows temporal math. Today, it would be better to use the new DATETIME2(n) in T-SQL when porting your code.
  • INTERVAL: An ANSI standard that holds an interval of time units. T-SQL does not have a direct equivalent. Instead, a proprietary function call is used.

Temporal Functions

The ANSI standard function CURRENT_TIMESTAMP is also in T-SQL now, replacing the old Sybase/UNIX getdate(). There are also the CURRENT_DATE function, which we have to write as CAST (CURRENT_TIMESTAMP AS DATE), and the CURRENT_TIME functions, which we have to write as CAST (CURRENT_TIMESTAMP AS TIME).

The INTERVAL data type and expressions with over-loaded plus and minus operators replace the T-SQL function calls for temporal math, such as DATEADD. Table 1 provides several examples that demonstrate how the INTERVAL data type works. And, yes, this is ANSI standard syntax.

ExampleResult
DATE '2001-09-28' + INTEGER '7'DATE '2001-10-05'
DATE '2001-09-28' + INTERVAL '1 HOUR'TIMESTAMP '2001-09-28 01:00:00'
INTERVAL '1 DAY' + INTERVAL '1 HOUR'INTERVAL '1 DAY 01:00:00'
TIMESTAMP '2001-09-28 01:00' + INTERVAL '23 HOURS'TIMESTAMP '2001-09-29 00:00:00'
TIME '01:00' + INTERVAL '3 HOURS'TIME '04:00:00'
- INTERVAL '23 HOURS'INTERVAL '-23:00:00'
DATE '2001-10-01' - DATE '2001-09-28'INTEGER '3' days understood
DATE '2001-10-01' - INTEGER '7'DATE '2001-09-24'
DATE '2001-09-28' - INTERVAL '1 HOUR'TIMESTAMP '2001-09-27 23:00:00'
TIME '05:00' - TIME '03:00'INTERVAL '02:00:00'
TIME '05:00' - INTERVAL '2 HOURS'TIME '03:00:00'
TIMESTAMP '2001-09-28 23:00' - INTERVAL '23 HOURS'TIMESTAMP '2001-09-28 00:00:00'
AL '1 DAY' - INTERVAL '1 HOUR'INTERVAL '1 DAY -01:00:00'
TIMESTAMP '2001-09-29 03:00' - TIMESTAMP '2001-09-27 12:00'INTERVAL '1 DAY 15:00:00'
900 * INTERVAL '1 SECOND'INTERVAL '00:15:00'
21 * INTERVAL '1 DAY'INTERVAL '21 DAYS'
DOUBLE PRECISION '3.5' * INTERVAL '1 HOUR'INTERVAL '03:30:00'
INTERVAL '1 HOUR' / DOUBLE PRECISION '1.5'INTERVAL '00:40:00'

Table 1: Calculating date and time intervals in Postgres

Instead of the proprietary DATEPART() function used in T-SQL, Postgres has the EXTRACT(field FROM source) function, along with a few other functions.

Other Data Types

Postgres also supports the BOOLEAN data type, which has the following characteristics:

  • Does not support NULL values.
  • Accepts the following true input values: TRUE, T, YES, Y, 1.
  • Accepts the following false input values: FALSE, F, NO, 0
  • Returns only the values t and f, for true and false, respectively.

You’ll probably map the BOOLEAN data type to the BIT type in SQL Server and configure it as NOT NULL. Remember, though, that BIT is a numeric data type in T-SQL, which means it’s NULLable, and that logical operations on it make no sense.

Postgres also provides geometry data types, network IP address types, and some internal types. With luck, you won’t have to port those. In addition, Postgres supports XML and array columns as well as text searching.

There are also enumerated types (the ENUM in the C family of languages) and arrays in Postgres, which violate ANSI standards and the relational model, and they don’t have a match in T-SQL either. However, a discussion of these is beyond the scope of this level.

Temp Tables

The model for temporary tables in Postgres is weaker than T-SQL in some ways. It is a way to park a query result for later use. You cannot create a temp table with column declarations but only with a query. They use a full CREATE statement where T-SQL has the # and ## prefixes on names that can appear in other statements. Here is the syntax:

CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name
    [ (column_name [, ...] ) ] [ [ WITH | WITHOUT ] OIDS ]
    AS query DML in Postgres 

The INSERT INTO statement in Postgres follows the ANSI standard, including support for the DEFAULT VALUES option, just like T-SQL. However, in Postgres you must use the INTO keyword (as in INSERT INTO), which is not required in T-SQL dialect.

The UPDATE VALUES statement in Postgres supports the ANSI standard, just like T-SQL. including a version of the proprietary FROM clause. This should be converted to a MERGE statement.

The DELETE FROM statement in Postgres supports the ANSI standard, just like T-SQL. However, Postgres requires the FROM keyword in the DELETE clause; T-SQL does not.

The Postgres SELECT statement is also straight ANSI syntax, and shown in Listing 2.

[WITH cte queries] SELECT [DISTINCT]select_list FROM table_expression [sort_specification];

Listing 2: The syntax for the Postgres SELECT statement

Postgres lets you specify a SELECT without a FROM clause to display a row of computations, just like in T-SQL. This is not a query, but an over-loaeded keyword. Queries are the usual SELECT.. FROM statement, in which the table expression can be a simple table list, an inner join, or a left, right, or full outer join, just like you have in T-SQL. However, Postgres does support extra shorthand options, as shown in Listing 3.

<table 1> <join type> <table 2> USING (join column list)

Listing 3: The USING clause available to joined tables in the FROM clause

The USING clause contains a comma-separated list of column names common to the joined tables. The tables are joined based on equal value pairs between a set of columns. Furthermore, the output of JOIN USING has one column for each of the equated pairs of input columns, followed by the remaining columns from each table. Thus, USING (a, b, c) in Postgres is equivalent to ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) in SQL Server, with the exception that if ON is used there will be two columns a, b, and c in the result, whereas with USING there will be only one of each (and they will appear first if SELECT * is used).

The NATURAL option (shown in Listing 4) is a shorthand form of USING; it forms a USING list consisting of all column names that appear in both input tables. As with USING, the common columns appear only once in the output table. If there are no common columns, NATURAL behaves like CROSS JOIN.

<table 1> NATURAL <join type> <table 2>

Listing 4: Using the NATURAL option when joining tables

Most Postgres programmers do not use NATURAL and USING; they are dangerous because tables can change. Translate the shorthand by hand. In Postgres, you can also use a view or table valued function in the FROM clause. The syntax for a table valued function is similar to T-SQL's syntax but with different semantics. It is too complicated for a short article.

You should run into few surprises when you port a WHERE clause from Postgres to SQL Server. The same thing goes for the GROUP BY and HAVING clauses.

The UNION and UNION ALL operators in Postgres are also like T-SQL, but Postgres supports the full INTERSECT and EXCEPT operations with the ALL options. You will need to hand-code those when porting to T-SQL.

The ORDER BY clause in Postgres supports the full set of ANSI options, as shown in Listing 5.

ORDER BY sort_expression1 [ASC | DESC] [NULLS {FIRST | LAST}]
  [, sort_expression2 [ASC | DESC] [NULLS {FIRST | LAST}] …]  [ LIMIT { number | ALL } ] [ OFFSET number ]

Listing 5: The syntax for the ORDER BY clause in Postgres

If a limit count is given in the ORDER BY clause, no more than that many rows will be returned (but possibly fewer, if the query itself yields fewer rows). LIMIT ALL is the same as omitting the LIMIT clause.

The OFFSET clause in Postgres says to skip that many rows before beginning to return rows. OFFSET 0 is the same as omitting the OFFSET clause, and LIMIT NULL is the same as omitting the LIMIT clause. If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT rows that are returned.

When porting code to T-SQL, the best approach is to use the ROW_NUMBER() function and a containing query based on the syntax shown in Listing 6.

 WHERE row_nbr BETWEEN offset_nbr AND offset_nbr + limit_nbr

A sort expression can be any expression that would be valid in the query’s SELECT list.

Common Table Expressions

When it comes to common table expressions (CTEs) in Postgres, there are a few surprises when compared to T-SQL. In Postgres, for example, you introduce a recursive CTE by using WITH RECURSIVE, which is part of the ANSI standard.

Also, in Postgres you can use a CTE with other DML statements, but not quite the same way as T-SQL. The body can be more than just a SELECT statement in Postgres, as shown in the example in Listing 6.

WITH Moved_Rows
AS 
(DELETE FROM Products
  WHERE shipment_date BETWEEN '2010-10-01' AND '2010-11-01'
 RETURNING *)
INSERT INTO Products_Log
SELECT * FROM Moved_Rows;

Listing 6: Defining a DELETE statement in a CTE

This statement effectively moves rows from the Products table to the Products_Log Table. The DELETE FROM statement in the WITH clause deletes the specified rows from Products and returns their contents by means of the RETURNING clause. The SELECT statement then reads that output and inserts it into Products_Log. T-SQL would do this with an OUTPUT clause.

This can get very elaborate. You can delete from more than one table at a time, do weird inserts and so forth. The only merciful restriction in these sorts of statements is that the WITH clauses cannot be recursive. Expect to turn these constructs into stored procedures.

Bottom Line

Porting applications to SQL Server is not the usual direction. Developers are more likely to port SQL Server to Postgres because it is open source, supports more ANSI features, runs on more platforms, uses a different ACID model and has a great optimizer.

If you live in a Microsoft world and need to move Postgres code to SQL Server, don’t expect the port to be easy. You might luck up, but that’s not the way the smart money bets. The pure SQL part will not hard if the original Postgres is clean and simple; you will have to write the extra ANSI features with complicated T-SQL. The hard part is that Postgres assumes a non-Microsoft environment and a different concurrency model.

This article is part of the parent stairway Stairway to SQL Dialects

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating