Migrating Schema Between Platforms

  • Comments posted to this topic are about the item Migrating Schema Between Platforms

  • I use AWS Redshift on a daily basis, which is based on Postgres 8.  I like the reduced DBA headaches, but there is nothing like using another DB to make you appreciate SQL Server.  My key bugbears are:

    • Lack of any form of control-flow language: no variables, no IF...ELSE, no WHILE
    • No recursive CTEs
    • A number of functions require literals rather than variables, so some things that should be set-based cannot be set-based
    • Constraints are metadata only.  You can violate them to your heart's content, so you have to roll your own constraint checker.
    • Security appears to be more primitive and less flexible.  I'd really like certain functions NOT to require superuser access.
    • LIKE does not do pattern matching, only wild card matching.
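
    The wildcard-versus-pattern distinction can be sketched outside any SQL dialect.  Below is a minimal Python illustration; the `like_to_regex` helper is a made-up name for this sketch, not a Redshift function:

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE wildcard pattern into a regular expression.

    LIKE supports only two wildcards: % (any run of characters) and
    _ (exactly one character); everything else matches literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"

# LIKE can say "J followed by exactly three characters"...
print(bool(re.match(like_to_regex("J___"), "John")))  # True
print(bool(re.match(like_to_regex("J___"), "Jo")))    # False
# ...but it cannot express "one or more digits" -- that needs a real
# regular expression such as ^[0-9]+$, which wildcards cannot describe.
```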

    The plus side is that Amazon has done a cracking job of producing documentation for it.  As mentioned in previous editorials, the open-source software that does well tends to have good documentation.

    MySQL I have a love/hate relationship with.  

    • It tries to be helpful when it should say NO, e.g. coercing a value from dodgy data
    • SELECT ... INTO is the syntax used to populate variables!
    • No PATINDEX equivalent
    • Tally table queries involving a number of joins are so slow as to be infeasible
    • Back up and Restore.  Very much a WTF.

    Beyond all these considerations are the peripheral technologies that you have to find if you move away from the Microsoft stack.  What do you use instead of SSIS, SSAS, and SSRS, and how do you integrate them into your stack?  How do you access them securely, and can you still use Active Directory?

    If you are using SQL Server replication, you might find Master/Slave simpler than Publisher-Distributor-Subscriber.  Personally I prefer the latter, as it gives a great deal of useful flexibility.

    You may also wish to consider third-party ODBC/JDBC drivers.  These may incur additional costs that you weren't planning on.

    As a general statement, I am happy to use whatever data technology an organisation has chosen, but I would advise very strongly against changing that technology without an extremely good reason.  It's like filling a septic tank and then deciding that you want to move it 50 feet to the left without spilling anything.

  • It seems to me that there is a trend to have most, if not all, of the business logic layer in code (Java, .NET, etc.) and the database is used only as a data repository.  In those situations changing from one database to another is fairly trivial and works since few set-based operations are performed.  While this fits with many web-oriented applications, this certainly does not fit well with heavy data volume manipulation where database performance is paramount.  In these situations, moving from one database platform to another is certainly non-trivial.  I was involved in such ports previously (SQL Server, Oracle, Netezza, Hadoop, etc.) and I can testify that such ports are difficult.  In those situations, the tendency is to minimize the queries to the lowest common format supported with the result that the queries perform equally poorly on all platforms!

    While development managers prefer to have the same code regardless of platform for ease of maintenance, I believe that is a mistake.  Stored procedures written for a particular platform should take advantage of the strengths of that platform and avoid the weaknesses.  While similar logic can and should be maintained, those strengths and weaknesses will be different for each platform so therefore the code will be different. As Steve mentioned, each database platform has advantages over the others and homogenizing code will only cripple your application and therefore your end result.

  • The slight differences in SQL syntax between database platforms are mostly trivial. However, when porting SQL code between PL/SQL and T-SQL, it is essential to remember that in the parallel Oracle universe, an empty string '' evaluates as NULL. Ponder for a moment the full potential of what that means. For example, in T-SQL we oftentimes pad not-nullable columns with empty strings or assume that 'John' + '' + 'Doe' will equal 'JohnDoe'. Once NULLs enter the picture, that assumption produces different results in Oracle.
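
    A rough Python simulation of the two concatenation rules makes the difference concrete.  The `tsql_concat` and `oracle_concat` helpers below are illustrative stand-ins for the two engines' behavior, not real library functions:

```python
def tsql_concat(*parts):
    """T-SQL '+' semantics: NULL is contagious; '' is a real value."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def oracle_concat(*parts):
    """Oracle '||' semantics: '' *is* NULL, and NULL concatenates
    as if it were an empty string."""
    normalized = [None if p == "" else p for p in parts]
    return "".join(p for p in normalized if p is not None)

# T-SQL: '' is a value, NULL poisons the whole expression.
print(tsql_concat("John", "", "Doe"))     # JohnDoe
print(tsql_concat("John", None, "Doe"))   # None
# Oracle: both spellings give the same answer, because '' is NULL.
print(oracle_concat("John", "", "Doe"))   # JohnDoe
print(oracle_concat("John", None, "Doe")) # JohnDoe
```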

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • There's a nasty gotcha in MySQL - UPDATE uses the *current* value of columns so that

    UPDATE x SET c1 = c1 + 1,
         c2 = c1;

    gives different results to

    UPDATE x SET c2 = c1,
         c1 = c1 + 1;

    https://dev.mysql.com/doc/refman/5.7/en/update.html
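
    One way to see the difference is to model the two evaluation strategies outside SQL.  The Python sketch below uses made-up helper names; it contrasts MySQL's left-to-right assignment with the snapshot semantics SQL Server uses:

```python
def update_left_to_right(row, assignments):
    """MySQL single-table UPDATE: assignments are applied left to
    right, so later expressions see already-updated column values."""
    for col, expr in assignments:
        row[col] = expr(row)
    return row

def update_from_snapshot(row, assignments):
    """SQL Server (and standard SQL): every expression reads the
    row as it was *before* the UPDATE started."""
    before = dict(row)
    for col, expr in assignments:
        row[col] = expr(before)
    return row

# UPDATE x SET c1 = c1 + 1, c2 = c1;  starting from c1=1, c2=0
assignments = [("c1", lambda r: r["c1"] + 1),
               ("c2", lambda r: r["c1"])]
print(update_left_to_right({"c1": 1, "c2": 0}, assignments))
# MySQL:      {'c1': 2, 'c2': 2} -- c2 sees the incremented c1
print(update_from_snapshot({"c1": 1, "c2": 0}, assignments))
# SQL Server: {'c1': 2, 'c2': 1} -- c2 sees the original c1
```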

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Aaron N. Cutshall - Tuesday, June 20, 2017 6:06 AM

    It seems to me that there is a trend to have most, if not all, of the business logic layer in code (Java, .NET, etc.) and the database is used only as a data repository.  In those situations changing from one database to another is fairly trivial and works since few set-based operations are performed.  While this fits with many web-oriented applications, this certainly does not fit well with heavy data volume manipulation where database performance is paramount.  In these situations, moving from one database platform to another is certainly non-trivial.  I was involved in such ports previously (SQL Server, Oracle, Netezza, Hadoop, etc.) and I can testify that such ports are difficult.  In those situations, the tendency is to minimize the queries to the lowest common format supported with the result that the queries perform equally poorly on all platforms!

    While development managers prefer to have the same code regardless of platform for ease of maintenance, I believe that is a mistake.  Stored procedures written for a particular platform should take advantage of the strengths of that platform and avoid the weaknesses.  While similar logic can and should be maintained, those strengths and weaknesses will be different for each platform so therefore the code will be different. As Steve mentioned, each database platform has advantages over the others and homogenizing code will only cripple your application and therefore your end result.

    I would agree with you on this.  It's like having a Bugatti and deliberately taking out 12 spark plugs because most cars only have 4.
    This ends up being a race to the bottom and can destroy a market.  I am not sure that the financial justification bears scrutiny.  MySQL and Postgres are available, SQL Server Developer Edition is free, and Teradata supplies a VM.  Not sure what Oracle does.  If all a development shop did was learn to read the execution plans for each platform, it would be a massive step forward over what is done today.  Nirvana is testing under peak load, but that would require the more expensive enterprise editions for the proprietary systems.  Even then, I'm sure that VARs get huge discounts.

  • My first professional job was working for a manufacturer. We didn't even have a database. Instead there were hundreds of thousands of flat files all over the place, grouped by application. I remember, at the time, hearing about this mysterious thing called a "SEQUEL Database". I remember being afraid of it. There was talk of possibly going to use some sort of SQL database, although I don't know what. I left there long before they ever got around to actually migrating to any RDBMS. Doesn't matter now, the place closed.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • I have supported multiple database platforms, MSSQL, MySQL, Postgres, Oracle, Sybase etc. for years. Usually applications used only one database platform, but I have supported one application that had to be able to use MSSQL, Oracle, and Sybase. Creating the DDL to be cross-platform requires a little thought but is the easiest challenge to overcome. Stored procedures are another problem altogether. Each platform will require its own set of procedures because of platform variations in how they are created and executed as well as platform-specific functions. What I have found to be the biggest and least discussed issue is variation in how the platform operates. Things like backups, security, management of database files, replication, and disaster recovery are usually different because of differing history and system architecture. This is particularly true of the differences between open source platforms and proprietary platforms. The latter generally have a better-integrated solution, while the open source platforms tend to be more ad hoc and dependent on third parties for a complete solution.

    I recently worked in a mixed MSSQL and Postgres environment. Postgres requires defining the IP address ranges that are allowed to connect to a server. This can be as restricted or as wide open as needs dictate. MSSQL doesn't support this. The backup architecture for Postgres is also completely different from MSSQL's. You can dump an individual database in Postgres; it creates a text file of SQL commands to recreate the database, objects, and data. It is a logical, point-in-time backup. Postgres also supports continuous archiving using write-ahead logs, similar to transaction logs. These write-ahead logs are for the entire Postgres server, not a single database. You cannot recover a corrupt database from the write-ahead logs, only the entire server. These are only a couple of examples.
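
    For readers coming from MSSQL: those connection rules live in Postgres's pg_hba.conf file, one rule per line, first match wins.  A hedged sketch of what such rules can look like (the addresses, database names, and user names below are made-up examples):

```
# TYPE  DATABASE  USER       ADDRESS          METHOD
host    all       all        10.0.0.0/24      md5      # app subnet only
host    sales     reporting  192.168.5.10/32  md5      # one host, one database
host    all       all        0.0.0.0/0        reject   # everyone else
```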

    I recommend MSSQL to those who prefer a proprietary and fully supported platform. For those wishing to go the open source route, I recommend Postgres. Postgres has a lot to recommend it: performance, extensibility, geospatial support. A Postgres fork is often used as the basis of another product; Netezza and Redshift are examples. Because Postgres is extensible, libraries exist to support the migration of functions and procedures between platforms. I know one exists that allows running T-SQL code on a Postgres server. I have not used it, so I can't say how good or bad it is.

  • Back in the early '90s I had the opportunity to convert Paradox to dBase IV.
    Then, a couple of years later, the opportunity to convert several dBase IV projects to MS-Access\SQL.
    At the time it was fun; I was young and full of vim and vigor.
    But databases and codebases were nothing like they are today.
    I wish all those involved, good luck and may the bosses bear with you.

  • Aaron N. Cutshall - Tuesday, June 20, 2017 6:06 AM

    It seems to me that there is a trend to have most, if not all, of the business logic layer in code (Java, .NET, etc.) and the database is used only as a data repository.  In those situations changing from one database to another is fairly trivial and works since few set-based operations are performed.  While this fits with many web-oriented applications, this certainly does not fit well with heavy data volume manipulation where database performance is paramount.  In these situations, moving from one database platform to another is certainly non-trivial.  I was involved in such ports previously (SQL Server, Oracle, Netezza, Hadoop, etc.) and I can testify that such ports are difficult.  In those situations, the tendency is to minimize the queries to the lowest common format supported with the result that the queries perform equally poorly on all platforms!

    While development managers prefer to have the same code regardless of platform for ease of maintenance, I believe that is a mistake.  Stored procedures written for a particular platform should take advantage of the strengths of that platform and avoid the weaknesses.  While similar logic can and should be maintained, those strengths and weaknesses will be different for each platform so therefore the code will be different. As Steve mentioned, each database platform has advantages over the others and homogenizing code will only cripple your application and therefore your end result.

    Abso-frikkin-lutely it's a mistake.  Aaron, I couldn't agree more.  I take full advantage of whatever strengths the platform has.  For raw database horsepower, the secret to performance is, many times, in the code.  I think it would be crazy to embrace the myth of portability so applications perform equally badly.  After all, I believe the myth of portability is just that - a myth.

  • I've ported database schemas (tables, stored procedures, and SQL) between RDBMS platforms on a handful of occasions, and there is always the urge to refactor as I'm porting. One thing I've noticed is that Oracle developers tend to use the NUMERIC datatype in situations where INT would be more appropriate, and they are also much more inclined to code cursors. Perhaps cursors are more efficient in Oracle than in SQL Server, and I've heard that assertion made, but even if all other things are equal, set-based SQL programming is superior to cursors just from a readability and maintainability perspective.
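
    The readability point shows up even in a toy example. Below is a sketch using Python's built-in sqlite3 as a stand-in for a real server, applying the same price increase once cursor-style and once set-based; table and column names are invented for the illustration:

```python
import sqlite3

def make_table(conn, name):
    """Create and populate a small demo table."""
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)",
                     [(1, 10.0), (2, 20.0), (3, 30.0)])

conn = sqlite3.connect(":memory:")
make_table(conn, "cursor_style")
make_table(conn, "set_style")

# Cursor style: fetch each row, compute in the client, write it back.
# One statement per row -- this round-tripping is what hurts at scale.
for row_id, amount in conn.execute(
        "SELECT id, amount FROM cursor_style").fetchall():
    conn.execute("UPDATE cursor_style SET amount = ? WHERE id = ?",
                 (amount * 1.1, row_id))

# Set-based style: one statement; the engine does the iteration.
conn.execute("UPDATE set_style SET amount = amount * 1.1")

# Same result either way; only the shape of the work differs.
print(conn.execute("SELECT amount FROM cursor_style ORDER BY id").fetchall())
print(conn.execute("SELECT amount FROM set_style ORDER BY id").fetchall())
```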

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell - Wednesday, June 21, 2017 8:28 AM

    I've ported database schemas (tables, stored procedures, and SQL) between RDBMS platforms on a handful of occasions, and there is always the urge to refactor as I'm porting. One thing I've noticed is that Oracle developers tend to use the NUMERIC datatype in situations where INT would be more appropriate, and they are also much more inclined to code cursors. Perhaps cursors are more efficient in Oracle than in SQL Server, and I've heard that assertion made, but even if all other things are equal, set-based SQL programming is superior to cursors just from a readability and maintainability perspective.

    Two things, Eric. I'm not familiar with the NUMERIC data type. That's too generic of a name for me. Is it an integer data type? A float? A decimal? All three?

    Second thing: I know what you mean about Oracle developers favoring cursors. There were a few times at my old job where we had to bring in consultants to help with our work. One of those times we brought in some students from the main IT department on campus (I was working for a university at the time). They needed to help us write the stored procs and web front end for an application we just didn't have the time to get to, but which needed to be done ASAP. I'll give them kudos: they got the job done, and on time.

    But when we looked at the code they left us, we were afraid to touch it. They used so many cursors in their stored procs that it was very hard to figure out what was going on, and when. The HTML was just bare bones, because the real generation of HTML was occurring in the stored procs. Fortunately for us, their code was solid and ran the couple of years we needed it to run. We were glad that none of us had to modify it.

    We didn't know why they used cursors. In fact, at the time I don't think I even knew what a cursor was, so I was completely lost. I picked up enough of an understanding of cursors to grasp some of what they had done. It wasn't until years later that I heard that Oracle developers prefer to use cursors. I'm guessing, like you said, that Oracle is just more efficient if you design your queries to use cursors. Well, more power to Oracle developers for using the strength of their platform. But, just now thinking about it, it's sad that those student IT developers didn't understand enough of SQL Server (what we were using) to know that programming our SQL database as though it were an Oracle database wasn't an efficient way to do the job. The queries they wrote, although they did the job, probably didn't do it as efficiently as it could have been done.


  • Rod at work - Wednesday, June 21, 2017 8:46 AM

    Eric M Russell - Wednesday, June 21, 2017 8:28 AM

    I've ported database schemas (tables, stored procedures, and SQL) between RDBMS platforms on a handful of occasions, and there is always the urge to refactor as I'm porting. One thing I've noticed is that Oracle developers tend to use the NUMERIC datatype in situations where INT would be more appropriate, and they are also much more inclined to code cursors. Perhaps cursors are more efficient in Oracle than in SQL Server, and I've heard that assertion made, but even if all other things are equal, set-based SQL programming is superior to cursors just from a readability and maintainability perspective.

    Two things, Eric. I'm not familiar with the NUMERIC data type. That's too generic of a name for me. Is it an integer data type? A float? A decimal? All three?

    ...

    NUMERIC can be functionally equivalent to integer or decimal depending on its user-defined scale and precision. For example, NUMERIC(6,2) can store the decimal value 1255.32. If what you really need to do is store decimal values, then maybe it's the best choice. However, in terms of physical storage and indexing, a NUMERIC will consume at least 5 bytes regardless of its logical scale, which makes it a poor choice for columns that will contain only a discrete range of small integer values. For example, NUMERIC(1,0) is stored as 5 bytes while TINYINT is 1 byte.

    Below illustrates the storage in bytes for the value (1) when cast as various data types. So, in terms of physical data modeling, something like GENDER_CODE TINYINT makes more sense than GENDER_CODE NUMERIC(1,0).

    PRINT DATALENGTH( CAST(1 AS TINYINT) );       -- 1
    PRINT DATALENGTH( CAST(1 AS SMALLINT) );      -- 2
    PRINT DATALENGTH( CAST(1 AS INT) );           -- 4
    PRINT DATALENGTH( CAST(1 AS NUMERIC(1,0)) );  -- 5

    Data modeling tools have a tendency to spit out DDL with columns defined as NUMERIC(x,y), ostensibly because the code is more portable. Perhaps in Oracle the physical storage of the NUMERIC data type is more efficient than in SQL Server; I've never looked into it. I only know that NUMERIC should not be the default choice of data type for numeric columns in SQL Server.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I've worked with PostgreSQL, MS SQL, and MySQL.  My experience with MySQL ended 3 or 4 years ago; at that time I converted a couple of smallish databases over to PostgreSQL.  I've also converted one MS SQL database to PostgreSQL.  My experience has been that if you know both the original and target systems, the conversion isn't too terrible, though the larger the code base becomes the less true that probably is.

    As Aaron has already said, converting business logic stored in the database is probably the most cumbersome aspect.  MS SQL has stored procedures and functions; PostgreSQL only has functions.  I've never experienced a problem solved in one RDBMS that couldn't be solved in another, though the actual method might differ.

    I agree with the sentiment that I'd rather write better code for the current platform to optimize today, rather than worry about possibly converting to another RDBMS at some unknown point in the future (maybe).  When I work in MS SQL I do my best to optimize for that platform, and when I'm in PostgreSQL I optimize for it as well.

  • I wish I could find the blog post that addresses business logic in databases.
    The gist of the article was that "Business Logic" was too broad a term to have any clear meaning and needed to be split.
    I seem to remember the author argued that it needed to be split into set based business logic (use the DB) and algorithmic business logic (use the app).  There has to be a cohesive team building your business logic layer because the two things have to work together seamlessly.
    To my mind this would satisfy the argument as to what goes where and give portability.
    Modern IT seems to consist of overcomplicating things and taking absolutist positions.

