I had been importing all my TAB delimited text data as VARCHAR using the SQL_Latin1_General_CP1_CI_AS Column Collation into extract tables, and then implicitly casting the INT data types when inserting into base tables. A business requirement made me revisit how to import the data into the extract tables with the INT data type. Using the BCP program to generate the FORMAT file produced unsatisfactory results for those fields identified in the Host file as SQLINT. The integers were not importing correctly to the extract tables in SQL Server.
I researched this issue online without finding a solution. It was a single sentence, and an unexplained example in Books-OnLine (BOL) that provided the key to understanding the real problem and producing the solution.
In BOL, on the “File Storage Type” page, the different data types and their Host File Data Types are described. In the middle of the page is this sentence: “When noncharacter data types (for example, float, money, datetime, or int) are stored as their database types, the data is written to the data file in the SQL Server internal binary data format.” This indicated to me that integers being imported as SQLINT were being translated to a binary data format.
In BOL, on the “Using Format Files” page, I found the following example:
I identified the field pub_id as an integer data type, so I used the first line as an example for my Format file. I noticed that the Column Collation was SQL_Latin1_General_Cp437_Bin. Searching the MSDN library I found the explanation of this collation type. The first part defines a string identifying the alphabet or language whose sorting rules are applied when dictionary sorting is specified, ie. Latin1_General or Polish. The second part is a one- to four-digit number that identifies the code page used by the collation. The suffix identifies Case sensitivity (CI, CS), Accent sensitivity (AI, AS), and the binary sort order to be used.
By changing the data type in the Format file to SQLCHAR 4 with a column collation of SQL_Latin1_General_Cp437_Bin, the integer data was translated correctly when importing using either BCP or BULKINSERT.