getting started with BCP

  • Hi,

     

     I'm getting started with BCP because I'm trying to import data from a text file into SQL Server.

     

     I have an O'Reilly book on how to do this, so I'm plugging through it, but I have one pesky little problem.  When I try to use BCP with the standard install, it complains that it cannot access the database: user login failure.  I'm using a canned example; the user ID is sa and the password is blank, which is how it appears in the user manager.  I just want some simple canned examples to work so I can get started.

     

    -John C.

  • A simple example that works? Why not try Books Online (BOL), which you should be able to access through the Help menu of Query Analyzer (QA).

    Type 'bcp utility' and select 'copying data from data file to SQL Server' from the sub-items listed. That should get you started.

  • If you're importing data into SQL Server, you should look into using BULK INSERT instead of BCP, as it is quicker.
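
    For example, a minimal sketch (the table name, file path and delimiters below are placeholders you'd change to match your own file):

    -- minimal BULK INSERT; assumes a tab-delimited file such as one produced by bcp with -c
    BULK INSERT pubs..authors
    FROM 'C:\temp\authors.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')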

     

    --------------------
    Colt 45 - the original point and click interface

  • Well, it's not a black or white issue. bcp has its good aspects, as does bulk insert - they are two ways of quickly loading files. Which to use is largely a matter of personal taste, and sometimes one or the other may be preferable due to other circumstances around the particular load.

    But I agree with the earlier poster: to get going, start by opening BOL. It's always the first place to go.

    /Kenneth

  • This is a cookie cutter example from O'Reilly's "Transact-SQL Programming" book.

     

    bcp pubs..authors out authors.txt -Usa -P -Slocalhost -c

     

    It's supposed to copy the contents of the authors table from the pubs database to a text file, authors.txt.  This is the error I get:

    SQLState = 28000, NativeError = 18452

    Error = [Microsoft][ODBC SQL Server Driver][SQL Server]Login failed for user 'sa

    '. Reason: Not associated with a trusted SQL Server connection.

     

    I'm using the default cookie-cutter install of Visual Studio .NET 2003 with the default cookie-cutter pubs example database on a Windows XP box with full admin rights.  I've not changed the user "sa" from the default, I've looked on the O'Reilly web site for errata for this example, and I've also tried to create another user, with admin rights, to do the same thing, but apparently user administration in SQL Server is a major subject all by itself, because I got a ton of warnings when I tried to do that.

     

    Can somebody please tell me how to get this basic, starter example to work, WITHOUT telling me to read the fine manual, because I'm already doing that.

     

    -John C.

  • John:

    Make sure that your server is in mixed mode.  If it is set up for Windows authentication only, you'll receive this error.

    Optionally, you can just replace the -U and -P switches with -T if you're using a valid AD account.
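
    For example, the command from the book becomes (assuming a default local instance):

    bcp pubs..authors out authors.txt -Slocalhost -c -T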

  • Ah, I C said the programmer.  😉

     

    So what you're telling me is that unlike other DBs like MySQL and Access, which have their own user security systems totally separate from Windows Active Directory, if I "link" into Active Directory accounts I don't have to manage a separate list of users; I just grant existing users appropriate permissions within a new resource (SQL Server).

     

    Your solution does work, thanks much.

     

    -John C.

  • Yes, that is correct.

    In SQL Server, you can have either separate users/logins that are exclusive to SQL Server, or you can delegate that to 'regular' Windows accounts. These are two separate security models, and you can use either, or both.

    (the small caveat is that you must also ensure the server is configured to allow both)
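
    For example, a minimal sketch of the Windows-account route (DOMAIN\SomeUser and the pubs database here are just placeholders):

    -- map an existing Windows account to a SQL Server login (account name is a placeholder)
    EXEC sp_grantlogin 'DOMAIN\SomeUser'
    -- give that login access to a particular database...
    USE pubs
    EXEC sp_grantdbaccess 'DOMAIN\SomeUser'
    -- ...and grant permissions, e.g. by adding the user to a fixed database role
    EXEC sp_addrolemember 'db_datareader', 'DOMAIN\SomeUser'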

    /Kenneth

  • WOW!  Another myth destroyed.  I was always led to believe that BCP was always faster than BULK INSERT.  Phil's post prompted me to do some long overdue testing on that fact.  So, first, I have to say "thanks", Phil.

    Phil is correct... BULK INSERT is, in fact, faster than BCP...

    but there's a catch...

    The reason why BULK INSERT is a bit faster than BCP is that BULK INSERT is less forgiving than BCP and will fail on things that BCP will easily fix.

    For example, if the data in the import file looks like this...

    1,21604,CE,A column for kicks,Still another column just for proofing,17.3081,2005-11-19 13:35:24.160
    2,3105,RU,A column for kicks,Still another column just for proofing,20.7074,2008-11-10 13:41:09.060
    3,45326,VU,A column for kicks,Still another column just for proofing,31.6382,2009-09-01 02:19:54.603
    4,15983,WU,A column for kicks,Still another column just for proofing,11.0461,2009-02-05 18:40:54.493
    5,41819,CK,A column for kicks,Still another column just for proofing,21.6887,2007-01-02 20:30:18.087

    ...both BCP and BULK INSERT work fine and BULK INSERT will win the race. 

    However, if your data looks like this (don't laugh too hard... people have actually sent me junk  like this)...

              1,      21604,CE       ,A column for kicks,Still another column just for proofing,              17.3081,2005-11-19 13:35:24.160
              2,       3105,RU       ,A column for kicks,Still another column just for proofing,              20.7074,2008-11-10 13:41:09.060
              3,      45326,VU       ,A column for kicks,Still another column just for proofing,              31.6382,2009-09-01 02:19:54.603
              4,      15983,WU       ,A column for kicks,Still another column just for proofing,              11.0461,2009-02-05 18:40:54.493
              5,      41819,CK       ,A column for kicks,Still another column just for proofing,              21.6887,2007-01-02 20:30:18.087

    ...BCP easily handles it, but BULK INSERT may fail because BULK INSERT tries to preserve leading and trailing spaces where BCP does not.  If the target table has columns that don't expect this (usually a non-numeric column, the 3rd column in this case), BULK INSERT crashes and burns with a "Truncation" error.

    Yes, I know that you could easily make a format file to fix the problems in the second example so that BULK INSERT would work correctly... it just doesn't seem right to have to do that for what should be a simple CSV file import.
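
    For reference, a non-XML format file for this seven-column layout would look something along these lines (the version number, column widths, and collation below are rough guesses and would need adjusting to the real file and table):

    8.0
    7
    1   SQLCHAR   0   12   ","      1   RowNum         ""
    2   SQLCHAR   0   12   ","      2   UserID         ""
    3   SQLCHAR   0   10   ","      3   SomeValue      SQL_Latin1_General_CP1_CI_AS
    4   SQLCHAR   0   50   ","      4   Kicks          SQL_Latin1_General_CP1_CI_AS
    5   SQLCHAR   0   50   ","      5   StillAnother   SQL_Latin1_General_CP1_CI_AS
    6   SQLCHAR   0   30   ","      6   SomeNumber     ""
    7   SQLCHAR   0   30   "\r\n"   7   ADate          ""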

    Also, when importing only 10,000 rows or so, it looks like BCP is twice as slow as BULK INSERT, and that's pretty much true.  But, over a million rows, BCP may take only 4-7 seconds longer than the equivalent BULK INSERT taking 20-25 seconds.  Yes, BULK INSERT is still faster, but only if it works.  BCP is a bit slower because it tolerates more formatting problems than BULK INSERT.

    The bottom line is, Phil is correct... BULK INSERT is faster than BCP... but only if there are no unexpected formatting anomalies.  AND, we all know that people who provide data in a CSV format never ever make a "mistrake"...

    Just to help John and any other "newbies" get started, and to demo what I spoke of above, here's the test code I used... it's all done in SQL just so the code can travel as a single piece.  You will need to replace the server name, the database name, and the drive/pathname in the code that follows.

    Usually, this type of thing is NOT done in an SQL script, because the use of xp_CmdShell is usually restricted by DBAs because of the security risks.  And, as Ken suggested, one way may be better than the other simply because of the circumstances and the condition of the data being imported.

    --=================================================================================================
    --      Create and populate a million row test table
    --=================================================================================================
    --===== If the test table exists, drop it
         IF OBJECT_ID('dbo.jmCSVTest') IS NOT NULL
            DROP TABLE dbo.jmCSVTest
     SELECT TOP 10000 --Change this to a million and see what happens to the times.
            IDENTITY(INT,1,1) AS RowNum,
            CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT) AS UserID,
            CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
          + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)) AS SomeValue,
            'A column for kicks' AS Kicks,
            'Still another column just for proofing' AS StillAnother,
            CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY) AS SomeNumber,
            CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME) AS ADate
       INTO dbo.jmCSVTest
       FROM Master.dbo.SysColumns sc1,
            Master.dbo.SysColumns sc2
    --=================================================================================================
    --      Declare local variables and other presets
    --=================================================================================================
    --===== Declare a variable to hold our DOS commands
    DECLARE @CMD VARCHAR(8000)
    --===== Declare some timer variables
    DECLARE @StartTime DATETIME
    DECLARE @EndTime DATETIME
    --===== Suppress the auto-display of rowcounts for appearance
        SET NOCOUNT ON
    --=================================================================================================
    --      Create a CSV file from the million row table using a trusted connection.
    --      You will need to fill in the servername\instance, dbname, and pathname
    --      NOTE:  ONLY HAVE ONE OF THE TWO FOLLOWING SECTIONS UNCOMMENTED AT A TIME!!!!
    --=================================================================================================
    --===== Uncomment this section to make a file with leading/trailing spaces depending on datatype
         -- BCP can handle this format but BULK INSERT cannot because it is less forgiving than BCP.
    --    SET @CMD =        'OSQL -SPutServerNameHere -E -dPutDbNameHere -s"," -w140 -h-1 '
    --    SET @CMD = @CMD + '-Q" SET NOCOUNT ON '
    --    SET @CMD = @CMD + 'SELECT RowNum,UserID,SomeValue,Kicks,StillAnother,SomeNumber,ADate '
    --    SET @CMD = @CMD + 'FROM dbo.jmCSVTest" -o"PutDriveAndPathHere\jmCSVTest.txt" '
    --  EXEC Master.dbo.xp_CmdShell @CMD, No_Output
    --===== Uncomment this section to make a file without leading/trailing spaces
         -- Both BCP and BULK INSERT can handle this format.
        SET @CMD = 'BCP "SELECT * FROM PutDbNameHere.dbo.jmCSVTest" QUERYOUT "PutDriveAndPathHere\jmCSVTest.txt" '
        SET @CMD = @CMD + '-c -t, -S"PutServerNameHere" -T'
       EXEC Master.dbo.xp_CmdShell @CMD, No_Output
    --=================================================================================================
    --      Measure the performance of BCP.
    --      Again, you will need to change the dbname, pathname, etc, to match yours.
    --=================================================================================================
    --===== First, truncate the target table
    TRUNCATE TABLE dbo.jmCSVTest
    --===== Prepare the BCP command
        SET @CMD =        'BCP "PutDbNameHere.dbo.jmCSVTest" IN "PutDriveAndPathHere\jmCSVTest.txt" '
        SET @CMD = @CMD + '-c -t, -r\n -S"PutServerNameHere" -T -E -b50000'
    --===== Run the command encased in timers
        SET @StartTime = GETDATE()
       EXEC Master.dbo.xp_CmdShell @CMD, No_OutPut
        SET @EndTime   = GETDATE()
    --===== Display the duration in milliseconds
      PRINT STR(DATEDIFF(ms,@StartTime,@EndTime)) + ' Milliseconds duration for BCP import.'
    --=================================================================================================
    --      Measure the performance of BULK INSERT.
    --      Again, you will need to change the dbname, pathname, etc, to match yours.
    --=================================================================================================
    --===== First, truncate the target table
    TRUNCATE TABLE dbo.jmCSVTest
    --===== Run the command encased in timers
        SET @StartTime = GETDATE()
       BULK INSERT PutDbNameHere.dbo.jmCSVTest 
       FROM 'PutDriveAndPathHere\jmCSVTest.txt'
       WITH (
            DATAFILETYPE = 'char',
            FIELDTERMINATOR = ',',
            ROWTERMINATOR = '\n',
            KEEPIDENTITY,
            BATCHSIZE = 50000
            )
        SET @EndTime   = GETDATE()
    --===== Display the duration in milliseconds
      PRINT STR(DATEDIFF(ms,@StartTime,@EndTime)) + ' Milliseconds duration for BULK INSERT import.'

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.



  • From what I've been told, the reason that BULK INSERT is promoted as faster than BCP is that BULK INSERT internally uses a more optimized code path within the engine than the one used when you go through BCP.

    While this is true, I find myself still preferring BCP, probably because old habits die hard, and the actual difference in the end might not be that large anyway (and also for the reasons that Jeff demonstrates).

    /Kenneth

  • Jeff, thanks, it's nice to know I'm right sometimes.

    As for the example with the leading/trailing spaces, this would generally be dealt with using the standard practice of creating a staging table. That means the field definitions would match the import file, and BULK INSERT wins again.

    From experience of importing files from what must be the same bodgy data providers, I've learnt that it's best to define all staging fields as varchar. That way you can pretty much guarantee that the import will succeed. Once imported, you can run a variety of data cleansing routines over the imported records, weed out any crud data, and transfer the remaining records to their final destination.
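
    For example, a rough sketch of that staging approach using Jeff's seven-column test layout (the path is a placeholder, and I'm assuming his jmCSVTest table as the final destination):

    --===== Staging table: everything varchar, so the load itself can't fail on truncation or conversion
         IF OBJECT_ID('dbo.jmCSVStage') IS NOT NULL
            DROP TABLE dbo.jmCSVStage
     CREATE TABLE dbo.jmCSVStage
            (RowNum       VARCHAR(20),
             UserID       VARCHAR(20),
             SomeValue    VARCHAR(20),
             Kicks        VARCHAR(100),
             StillAnother VARCHAR(100),
             SomeNumber   VARCHAR(50),
             ADate        VARCHAR(50))
    --===== Load the raw file into the staging table
       BULK INSERT dbo.jmCSVStage
       FROM 'PutDriveAndPathHere\jmCSVTest.txt'
       WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
    --===== Cleanse, weed out any crud rows, and move the survivors to the final table
        SET IDENTITY_INSERT dbo.jmCSVTest ON
     INSERT INTO dbo.jmCSVTest
            (RowNum, UserID, SomeValue, Kicks, StillAnother, SomeNumber, ADate)
     SELECT CAST(LTRIM(RTRIM(RowNum))     AS INT),
            CAST(LTRIM(RTRIM(UserID))     AS INT),
            LTRIM(RTRIM(SomeValue)),
            LTRIM(RTRIM(Kicks)),
            LTRIM(RTRIM(StillAnother)),
            CAST(LTRIM(RTRIM(SomeNumber)) AS MONEY),
            CAST(LTRIM(RTRIM(ADate))      AS DATETIME)
       FROM dbo.jmCSVStage
      WHERE ISNUMERIC(LTRIM(RTRIM(RowNum))) = 1 -- crude "weed out the crud" filter
        SET IDENTITY_INSERT dbo.jmCSVTest OFF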

     

    --------------------
    Colt 45 - the original point and click interface
