Stairway to SQL Dialects

Stairway to SQL Dialects Level 3: MySQL

,

History

Before we go any further, MySQL is pronounced “My S-Q-L”, not “my Sequel,” as I found out when I taught a series of online classes for the MySQL folks. What I also didn’t know at the time was that the product was named after developer Michael Widenius' daughter, My.

MySQL AB was founded in Uppsala, Sweden in 1995 to distribute the database system and make money from supporting it. In 2008 Sun Microsystems acquired the company, and then in 2010, the Oracle Corporation bought Sun Microsystems. There was a bit of worry that Oracle would no longer release the products as open source or would kill them completely. The same concern existed in the Java community when Oracle acquired Java. In both cases, Oracle has been playing nice with the open source community.

There is a rule that all open source software has to have an animal mascot. This is a law of the universe. MySQL has a dolphin. They also have a different model for how to manage a database, compared to what SQL Server programmers are used to.

Operating Systems, Storage Systems and Stacks

MySQL is not Microsoft only. It runs on Unix, Linux, Windows, Macintosh and even Novell NetWare, and it is highly portable. At its most basic level, MySQL supports several storage management systems. The most common one is InnoBase, but there are also MyISAM for read speed, InnoDB for transactions and referential integrity, and MySQL Archive for storing historical data.

Finally, MySQL is part of the LAMP stack—Linux, Apache, MySQL and PHP (or Perl)—that dominates the website market.

In ecology, there are the Laws of Adaptation. The first law says that the better adapted to one environment a species is, the worse adapted it is to all other environments. In English, that means Polar Bears do well at the North Pole but not in the Amazon. The second law is that the more environments a species can occupy, the more efforts that species must make to adapt in each environment. Human beings can live at the North Pole and in the Amazon, but we need parkas at the North Pole and shorts and mosquito nets in the Amazon.

MySQL needs to be fitted to each of its environments. This is where a lot of the work is done, but in this level of the Stairway to SQL Dialects, I deal only with issues related to the data definition language (DDL) and data manipulation language (DML) that you’ll need to be aware of when porting code to SQL Server.

DDL and Data Types

MySQL is a heavy dialect and the code can be hard to read. It is further complicated by the tendency of MySQL programmers to write SQL in the same style as the host language.

The basic CREATE statements look familiar until you get to the last parenthesis. MySQL follows it with a list of options that tell the engine about the environment in which the entity is being created. The starting point for AUTO_INCREMENT, collations, appear here.

In MySQL, you can create an index in the CREATE TABLE statement. Be ready to remove the index or to write a separate CREATE INDEX statement when you port the code to T-SQL.

The foreign key definition in a MySQL CREATE TABLE statement is also different from SQL Server, as the syntax in Listing 1 shows.

    REFERENCES tale_name (index_col_name, ...)
      [MATCH FULL | MATCH PARTIAL | MATCH SIMPLE]
      [ON DELETE reference_option]
      [ON UPDATE reference_option]

Listing 1: The foreign key syntax in a MySQL CREATE TABLE statement

MySQL supports the full ANSI syntax for foreign keys. T-SQL does not yet have the MATCH options. But that doesn’t matter! Using an explicit MATCH clause in MySQL doesn’t have the specified effect—and also causes ON DELETE and ON UPDATE clauses to be ignored. For these reasons, specifying MATCH in a MySQL CREATE TABLE statement should be avoided.

The MATCH clause in the SQL standard CREATE TABLE statement controls how NULL values in a multi-column foreign key are handled when comparing values to a PRIMARY KEY column. InnoDB essentially implements the semantics defined by MATCH SIMPLE option, which permits a foreign key to be all or partially NULL. In that case, the referencing table row containing such a foreign key is permitted to be inserted, even if it doesn’t match any rows in the referenced table.

Additionally, MySQL and InnoDB require that the referenced columns be indexed for performance. However, the system does not require that the referenced columns be UNIQUE or be declared NOT NULL. Arrgh! Look for this kind of thing to be done using triggers in MySQL.

The CREATE PROCEDURE and CREATE FUNCTION statements include the ANSI options in the header. You can drop these options in T-SQL. They are signals to the compiler.

Numeric Data Types

MySQL supports the usual ANSI standard numeric data types: BIGINT, INTEGER, DECIMAL, SMALLINT and the obvious extensions MEDIUMINT and TINYINT. But there are also the optional SIGNED, UNSIGNED and ZEROFILL options that need explaining.

Some numeric data types support the SIGNED and UNSIGNED attributes. SIGNED numbers can be positive or negative; UNSIGNED numbers can only be positive. This means that these attributes also affect the range of values! You’ll need to use a CHECK(col BETWEEN low_range AND high_range) constraint when porting your code to T-SQL. And it gets worse; the ZEROFILL option affects how data is displayed and not the values. For example if I declare a column as INT(5) and assign it the value five, you will get the output 00005, with UNSIGNED implied.

This is easier to understand with an example that shows a pattern that is a mix of C and COBOL, that is, a mix of low-level machine binary for the math and character strings to be displayed. Consider the following syntax:

  INTEGER [(n)] [UNSIGNED] [ZEROFILL]

The SIGNED range is -2, 147, 483, 648 to +2, 147, 483, 647, just as you would expect. But the UNSIGNED range is 0 to 4, 294, 967, 295. The default value of (n) is 11 digits, but I can trim it with lower values.

MySQL also supports the FLOAT and DOUBLE PRECISION data types for floating point numbers. You will need to use them when your storage must adhere to IEEE standards. Most of us will never use them.

In MySQL, you’ll also find nonstandard synonyms used for some of their types (SERIAL for BIGINT, DOUBLE for DOUBLE PRECISION, NUMERIC or FIXED for DECIMAL). There are extension for BOOLEAN (done as TINYINT(1)) and BIT. This is straight text edit work when porting to SQL Server.

The math functions are not a great surprise. The modulo operators % and MOD()—an ANSI standard—are synonyms and map to the T-SQL % operator, but you also have the DIV operator in MySQL.

The worst data types are ENUM and SET, which are taken from the C family of languages. An ENUM column in MySQL should be translated into a CHECK (col IN (..)) constraint in T-SQL. The SET data type needs to be normalized to a one-column table or VALUES() table construct. Watch out for special functions used with them!

AUTO_INCREMENT is the MySQL version of IDENTITY, and you can do a direct replacement when porting to T-SQL.

Temporal Data Types

The temporal types in MySQL pretty much follow the ANSI Standard in format but not in their ranges. This is not usually a problem in most applications unless you work for a museum or historical society. Table 1 lists the temporal types available in MySQL, along with how their values are stored and the range of values each one supports.

Data TypeFormatRange
DATEyyyy-mm-dd1000-01-01 to 9999-12-31
DATETIMEyyyy-mm-dd hh:mm:ss1000-01-01 00:00:00 to 9999-12-31 00:00:00
TIMESTAMPyyyy-mm-dd hh:mm:ss1970-01-01 00:00:00 to 2037-12-31 00:00:00
TIMEhh:mm:ss-838:59:59 to +838:59:59
YEAR [(2)|(4)]yy or yyyy1970 to 2069 and 1901 to 2155

Table 1: The temporal functions in MySQL

The temporal functions in MySQL are a mix of ANSI and T-SQL and support a number of synonyms. For example:

  • The ADDDATE (<date>, INTERVAL <int> <temporal unit>) function is a synonym for the DATE_ADD(<date>, INTERVAL <int> <temporal unit>) function.
  • The SUBDATE (<date>, INTERVAL <int> <temporal unit>) function is a synonym for the DATE_SUB(<date>, INTERVAL <int> <temporal unit>) function.
  • The NOW() function is a synonym for the CURRENT_TIMESTAMP()function.

The <temporal unit> placeholder can be anything from YEAR to MICROSECOND. But unlike T-SQL dialect, MySQL has intervals such as HOUR_SECOND, which gives us an interval in the hh:mm:ss format to specify hours, minutes, and seconds.

MySQL also support such functions as CURRENT_DATE() and CURRENT_TIME(), as well as many other functions. I recommend that you keep a table that shows how the functions are converted from MySQL to T-SQL, and then convert them by hand or through text editing when porting your code.

Character Data

MySQL supports the usual CHAR(n) and VARCHAR(n) data types. However, because MySQL is closer to the hardware, it also supports binary, blob, spatial and text data types, and some of them cannot be easily translated to T-SQL. Luckily, the CHAR(n) and VARCHAR(n) data types are the ones most commonly used.

In MySQL, you’ll also find a mix of ANSI and C family string functions. The one that will drive you nuts is CONCAT(<string expression list>). Because the MySQL people could not use the ANSI || or the overloaded + from T-SQL, they came up with the CONCAT() function.

The POSITION(substr IN str) function is standard ANSI and a synonym for LOCATE(substr, str), which is comparable to the CHAR_INDEX() function in T-SQL.

The T-SQL SUBSTRING(str, pos, len) function has a whole family of synonyms in MySQL, including STR(str, pos), SUBSTR(str FROM pos), SUBSTR(str, pos, len), LEFT(), RIGHT() and the ANSI version, SUBSTR(str FROM pos FOR len).

As a generalization, you can expect to see synonyms in MySQL for basic string functions from any language you have seen. Table 2 provides a brief description of each of these functions.

NameDescription
ASCII()Returns the numeric value of the left-most character.
BIN()Returns a string representation of the argument.
BIT_LENGTH()Returns the length of the argument in bits.
CHAR_LENGTH()Returns the number of characters in the argument.
CHAR()Return the character for each integer passed in as an argument.
CHARACTER_LENGTH()A synonym for CHAR_LENGTH().
CONCAT_WS()Returns the concatenated string with the specified separator.
CONCAT()Returns the concatenated string.
ELT()Returns the string from a series of strings, as specified by index number.
EXPORT_SET()Returns a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string.
FIELD()Returns the index (position) of the specified string in a series of string values.
FIND_IN_SET()Returns the index (position) of the specified string in a string list.
FORMAT()Returns a number formatted to the specified number of decimal places.
HEX()Returns a hexadecimal representation of a decimal or string value.
INSERT()Inserts a substring into a string at the specified position up to the specified number of characters.
INSTR()Returns the index of the first occurrence of the substring.
LCASE()Synonym for LOWER().
LEFT()Returns the leftmost number of characters as specified.
LENGTH()Returns the length of a string in bytes.
LIKESimple pattern matching.
LOAD_FILE()Loads the named file.
LOCATE()Returns the position of the first occurrence of a substring.
LOWER()Returns the string argument in lowercase.
LPAD()Returns the string argument, left-padded with the specified string.
LTRIM()Removes leading spaces.
MAKE_SET()Returns a set of comma-separated strings that have the corresponding bit in bits set.
MATCHPerform a full-text search.
MID()Returns a substring starting from the specified position.
NOT LIKENegation of simple pattern matching.
NOT REGEXPNegation of REGEXP.
OCTET_LENGTH()A synonym for LENGTH().
ORD()Returns the character code for the leftmost character of the argument.
POSITION()A synonym for LOCATE().
QUOTE()Escapes the argument for use in an SQL statement.
REGEXPPattern matching using regular expressions.
REPEAT()Repeats a string the specified number of times.
REPLACE()Replaces the occurrences of a specified string.
REVERSE()Reverses the characters in a string.
RIGHT()Returns the specified rightmost number of characters.
RLIKESynonym for REGEXP.
RPAD()Appends a string the specified number of times.
RTRIM()Removes trailing spaces.
SOUNDEX()Returns a Soundex string.
SOUNDS LIKECompares strings for similar sounds.
SPACE()Returns a string of the specified number of spaces.
STRCMP()Compares two strings.
SUBSTR()Returns a substring from a string, as specified
SUBSTRING_INDEX()Returns a substring from a string before the specified number of occurrences of the delimiter.
SUBSTRING()Returns a substring from a string, as specified.
TRIM()Removes the leading and trailing spaces from a string.
UCASE()Synonym for UPPER().
UNHEX()Converts each pair of hexadecimal digits to a character.
UPPER()Converts a string to uppercase.

Table 2: Basic string functions in MySQL

In MySQL, there are regular expression operations that do not map easily to T-SQL. You might have to consider a CLR solution in extreme cases. MySQL follows the ANSI model in its expressions, which uses the POSIX conventions and has a SIMILAR TO predicate. Most of the time the good old LIKE operator with the T-SQL regular expression extensions will work for validating industry standard encoding schemes.

MySQL also borrows the Oracle GREATEST(<expression list>) member of the CASE family.

DML and Queries

MySQL supports the usual inner, outer, and full joins you have in T-SQL. But you also get the ANSI USING and NATURAL options in MySQL, instead of the usual T-SQL ON clause. Conversion is easy and should be done anyway.

The bad news is that you can add index hints as part of the JOIN statements in MySQL. Look for the keywords USE, FORCE and IGNORE in these statements. This requires knowing the underlying storage engine and not trusting the optimizer. Just drop the hints from the MySQL statements, test the statements in SQL Server, and then determine if you need the MySQL hints.

If the original MySQL programmer was also a C family programmer, you’ll likely see a mix of that family syntax in the MySQL, such as:

  • && used as a synonym for AND
  • || used as a synonym for OR
  • ! used as a synonym for NOT

You might also find the XOR operator in your MySQL code. The operator is used for the exclusive OR, which is not in the ANSI standard or in T-SQL, but it can be translated into NOT (NOT a AND NOT b).

MySQL also supports the <=> comparison operator—a version of the ANSI operator IS NOT DISTINCT FROM—which treats NULL values as equal. T-SQL has several ways to do this, and you can pick your favorite one, such as (COALESCE (x, <absurd value>) = COALESCE (y, <absurd value>)).

A major problem with the DML is its supports for function calls that do not map easily from one system to the next. This is an issue with MySQL in particular because it supports bit operations and C family shorthand. Programmers who work at a lower level might use them instead of higher-level constructs.

Bottom Line

Porting MySQL to T-SQL is mostly a matter of removing code that you do not need because the .NET and SQL Server environment handles these issues for you. The main problem is the wider range of functions in MySQL that are meant for all of the environments in which it is embedded.

If you are lucky, the original code will be good SQL. If you are not so lucky, there will be a lot of lower-level assembly language constructs that depend on hardware and not an abstract data model.

You can find the MySQL manuals online at http://dev.mysql.com/doc/refman/5.5/en/index.html.

This article is part of the parent stairway Stairway to SQL Dialects

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating