There are occasions when on the top of scripting the database tables, insert statements for data in these tables are also needed. Consider, for example, the following scenario. During the development phase of the project the database objects are designed and created in the development database. Some of the records inserted into tables are needed for initial deployment of the project to a QA environment. These could be configuration related records or default records for catalog tables. The deployment database script should therefore include not only the code for the creation of the objects, but also a number of insert statements for the data. Unfortunately SQL Server Management Studio does not provide the option to "include data" along with object definitions when scripting the database.
There are numerous readily available scripts which will generate insert statements when executed, but typically they suffer from the cursor illness. Cursors had their place back in the last millennium, but the time has come to get rid of them and start using set-based SQL instead. With introduction of new features in SQL Server 2005, such as ranking functions and CTEs, justification of the cursors usage had become even more difficult than before. There are excellent articles by R. Barry Young titled "There Must Be 15 Ways To Lose Your Cursors..." published by this site, please read them if you are not convinced.
Let's first consider the design of a typical query to generate insert statements from the specified table (we will come up with a different approach later in the article):
- Define a string (nvarchar) variable which will hold the statement defining the shape of insert.
- Query the object containing column-related information about the table. For example, information_schema.columns view can be queried to retrieve one record per column in the table, describing such column's ordinal position, data type, name, and length.
- Open cursor and start looping through the records retrieved by this query.
- For each step of the cursor loop add appropriate values from the cursor's record to the string variable. This means that there has to be a logic implemented via select case to figure whether to surround the data by the quotes depending on the column's type. Additionally, presence of the single quotes in the data needs to be handled.
- Once the variable is populated, execute it against the table to generate the insert statements.
The logic described above seems to be a little bit more complex than it deserves to be. Allow me to demonstrate it by a very simple example. Suppose we have a table named t with one int and one nvarchar column in it named c1 and c2 respectively. We will insert a couple of records in it and then check how the insert-generating statement will look:
create table t(c1 int not null primary key clustered, c2 varchar(50) not null); go insert into t(c1, c2) values (1, 'What''s the deal with magnets?'); insert into t(c1, c2) values (2, 'This is a trap, isn''t it?'); -- I will omit the cursor part, and will just spell out the final statement: set @sql = 'select ''insert into t(c1, c2) values (''' + ' + cast(c1 as nvarchar(10)) + '', '''''' + replace(c2, '''''''', '''''''''''') + '''''');'' from t;'; -- at this point we can execute this rather unattractive @sql: exec sp_executesql @sql; -- which will produce the following output: insert into t(c1, c2) values (1, 'What''s the deal with magnets?'); insert into t(c1, c2) values (2, 'This is a trap, isn''t it?');
The insert-generating method described above is clearly error prone. Let's come up with the different approach based on the simple fact that the database engine, just like anything else related to data storage, does not store the values as we see them in the end. It can care less about presence or absence of single quotes, unicode characters etc because it stores everything in zeroes and ones regardless of the data types. The hex representation of any value is therefore always available. Consider the following snippet:
use AdventureWorks; go set nocount on; declare @t table (col1 int not null, col2 varchar(30)); insert into @t (col1, col2) values ( 0x00000001, 0x57686174277320746865206465616c2077697468206d61676e6574733f ); set nocount off; select * from @t; go
Here is the result:
col1 col2 ----------- ----------------------------- 1 What's the deal with magnets? (1 row(s) affected)
Exploiting the fact that insert statements can specify exact binary values rather than spell them out in the "convenient for people" format, the outline for generating the insert statement routine is as follows:
- Declare nvarchar(max) variables to hold the parts of the insert-denerating statement.
- Populate the variables by the means of a single select statement querying the information_schema.columns view. When selecting, cast values to varbinary and use the not excessively documented, but nevertheless available function named master.dbo.fn_varbintohexstr to translate the binary values to their respective hex string representation.
- Execute resulting script to select from the specified table, which will generate insert records.
The advantage of this method is two-fold:
- There is no need to use a cursor.
- There is no need to worry about data formatting and single quotes handling.
Here is the script implemented as a stored procedure. The script is not fast, it takes almost an entire second to generate 2679 insert statements on my box with Windows XP SP3 2 GB of RAM and Intel E6750 @ 2.66 GHz CPU, but it works well.
use AdventureWorks; go if exists ( select * from sys.objects where [object_id] = object_id(N'[dbo].[usp_generate_inserts]') and type in (N'P', N'PC') ) drop proc [dbo].[usp_generate_inserts]; go create proc dbo.usp_generate_inserts ( @table nvarchar(255) ) as begin set nocount on declare @is_identity bit; declare @columns nvarchar(max); declare @values nvarchar(max); declare @script nvarchar(max); if isnull(charindex('.', @table), 0) = 0 begin print 'Procedure dbo.usp_generate_inserts expects a table_name parameter in the form of schema_name.table_name'; end else begin -- initialize variables as otherwise the padding will fail (return nulls for nvarchar(max) types) set @is_identity = 0; set @columns = ''; set @values = ''; set @script = ''; /* The following select makes an assumption that the identity column should be included in the insert statements. Such inserts still work when coupled with identity_insert toggle, which is typically used when there is a need to "plug the holes" in the identity values. Please note the special handling of the text data type. The type should never be present in SQL Server 2005 tables because it will not be supported in future versions, but there are unfortunately plenty of tables with text columns out there, patiently waiting for someone to upgrade them to varchar(max). */ select @is_identity = @is_identity | columnproperty(object_id(@table), column_name, 'IsIdentity'), @columns = @columns + ', ' + '['+ column_name + ']', @values = @values + ' + '', '' + isnull(master.dbo.fn_varbintohexstr(cast(' + case data_type when 'text' then 'cast([' + column_name + '] as varchar(max))' else '[' + column_name + ']' end + ' as varbinary(max))), ''null'')' from information_schema.columns where table_name = substring(@table, charindex('.', @table) + 1, len(@table)) and data_type != 'timestamp' order by ordinal_position; set @script = 'select ''insert into ' + @table + ' (' + substring(@columns, 3, len(@columns)) + ') values ('' + ' + substring(@values, 11, len(@values)) + ' + '');'' from ' + @table + ';'; if @is_identity = 1 print ('set identity_insert ' + @table + ' on'); /* generate insert statements. If the results to text option is set and the query results are completely fit then the prints are a part of the batch, but if the results to grid is set then the prints (identity insert related) can be gathered from the messages window. */ exec sp_executesql @script; if @is_identity = 1 print ('set identity_insert ' + @table + ' off'); end set nocount off end go -- test the proc exec dbo.usp_generate_inserts 'Production.BillOfMaterials' /* Here is the paste from few of the 2679 returned records: insert into Production.BillOfMaterials ([BillOfMaterialsID], [ProductAssemblyID] /* abridged */) values (0x0000037d, null, /* abridged */); insert into Production.BillOfMaterials ([BillOfMaterialsID], [ProductAssemblyID], /* abridged */) values (0x0000010f, null,/* abridged */); insert into Production.BillOfMaterials ([BillOfMaterialsID], [ProductAssemblyID], /* abridged */) values (0x00000022, null,/* abridged */); insert into Production.BillOfMaterials ([BillOfMaterialsID], [ProductAssemblyID], /* abridged */) values (0x0000033e, null,/* abridged */); insert into Production.BillOfMaterials ([BillOfMaterialsID], [ProductAssemblyID], /* abridged */) values (0x0000081a, null,/* abridged */); insert into Production.BillOfMaterials ([BillOfMaterialsID], [ProductAssemblyID], /* abridged */) values (0x0000079e, null,/* abridged */); */
I hope that someone will find this method useful. It works as written in SQL Server 2005, and it will also work in 2000 version with some restrictions related to varchar size limitations - there is no equivalent of varchar(max), meaning that there is no clean way to declare a variable of varchar type greater than 8,000 characters in length in the SQL Server 2000.
About the author
Oleg Netchaev is a principal consultant with Sogeti USA LLC in Houston, TX