BULK insert several thousand files

  • MVDBA (Mike Vessey)

    SSC-Insane

    Points: 21534

    I have a bunch of logging data that my sysadmin has downloaded from our elastic load balancers at amazon and I need to stick it in a completely bland table for a one off sales pitch.

    I've scripted up all of the unzip, copy, etc etc…  but I've become so reliant on SSIS that I've lost my data import skills.

    I have text files that are space separated with standard CR/LF line endings and are text quoted using "

    I know how to list the files, so the idea is either a batch file that imports (bad - if i get kicked off the workstation/server) or a cursor (also yukky) that loops through my list of files.

    I'm hopelessly lost in the import process though - and the format file documentation is driving me crazy

    I got to the point of looking at openrowset

    SELECT X.* FROM OPENROWSET(BULK N'\\xxxserverxxx\csv\215319143659_elasticloadbalancing_eu-west-1_app.Sign3-Liv-Disp01-Pub-ALB.de861ec4fa295ccb_20200113T0000Z_52.16.148.201_3qh6f4c2.log', single_CLOB) AS X

    which is giving me  a single row, single column spit out of the file

    or alternatively

    BULK INSERT importdata.dbo.import FROM '\\xxxserverxxx\csv\215319143659_elasticloadbalancing_eu-west-1_app.Sign3-Liv-Disp01-Pub-ALB.de861ec4fa295ccb_20200113T0000Z_52.16.148.201_3qh6f4c2.log'

    WITH (FIELDTERMINATOR=' ')

    which is giving me

    The bulk load failed. The column is too long in the data file for row 1, column 27. Verify that the field terminator and row terminator are specified correctly.

    I don't have a field27,  (field 26 is my max column) - but I can't see how to set the row terminator (if that is the problem)

    can anyone help me with either a format file and a command or another way of doing this...…?

    MVDBA

  • frederico_fonseca

    SSChampion

    Points: 14393

    are all files the same format or different formats?

    and when you say they are delimited (space in this case) with double quotes as field delimiter - is that really correct and are any inner double quotes escaped?

    if answer to both is yes and if the files are all on the same folder then using SSIS with a multipleflat file connection will ingest all files without the need for you to create a loop and will be a lot faster than the loop

     

  • MVDBA (Mike Vessey)

    SSC-Insane

    Points: 21534

    same format - it's from AWS logging system, but we aren't exactly sure of the field lengths - some of them are quite long and we've had to set those fields to varchar(8000) - there is every chance we will get to file 200 and then need to modify the length of a field

    they are space delimeted with any text field qualified by " (only if it is liable to contain a space in the string)

    example row (redacted slightly for security)

    https 2020-01-09T12:20:00.904689Z app/SignX-Liv-Disp01-Pub-ALB/de861ec4fa295ccb 213.146.135.190:51471 172.31.10.86:80 0.000 0.078 0.000 200 200 1275 1421 "PUT https://displayX.something-signage.com:443/_s3/Scanner3/SaveToLog HTTP/1.1" "-" ECDHE-RSA-AES128-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-1:215319143659:targetgroup/SignX-Liv-Disp01-HTTP-TG/3c412e7972c45aca "Root=1-5e171a70-2e069761f5f9038eb528bf96" "displayX.something.com" "session-reused" 0 2020-01-09T12:20:00.826000Z "forward" "-" "-" "172.31.10.86:80" "200"

    I'm a little cautious on using SSIS for this as I've found it much slower than BCP or bulk insert

    MVDBA

  • anthony.green

    SSC Guru

    Points: 112378

    I get 27 elements of data from that particular snippet Mike.

    Based on that log snippet, I believe the below should work

    CREATE TABLE AWSImport
    (COL1 NVARCHAR(MAX),
    COL2 NVARCHAR(MAX),
    COL3 NVARCHAR(MAX),
    COL4 NVARCHAR(MAX),
    COL5 NVARCHAR(MAX),
    COL6 NVARCHAR(MAX),
    COL7 NVARCHAR(MAX),
    COL8 NVARCHAR(MAX),
    COL9 NVARCHAR(MAX),
    COL10 NVARCHAR(MAX),
    COL11 NVARCHAR(MAX),
    COL12 NVARCHAR(MAX),
    COL13 NVARCHAR(MAX),
    COL14 NVARCHAR(MAX),
    COL15 NVARCHAR(MAX),
    COL16 NVARCHAR(MAX),
    COL17 NVARCHAR(MAX),
    COL18 NVARCHAR(MAX),
    COL19 NVARCHAR(MAX),
    COL20 NVARCHAR(MAX),
    COL21 NVARCHAR(MAX),
    COL22 NVARCHAR(MAX),
    COL23 NVARCHAR(MAX),
    COL24 NVARCHAR(MAX),
    COL25 NVARCHAR(MAX),
    COL26 NVARCHAR(MAX),
    COL27 NVARCHAR(MAX))

    BULK INSERT AWSImport FROM 'C:\Temp\Mike.log'
    WITH(FORMATFILE='C:\Temp\Mike.xml')

    select * from AWSImport

    delete from awsimport
    Attachments:
    You must be logged in to view attached files.
  • MVDBA (Mike Vessey)

    SSC-Insane

    Points: 21534

    which means either I screwed up on the 1st file import, or my files are not consistant…. which would be painfull

    but that is a great spot

    the col26 issue was because I used the data import wizard to build me an SSIS package.

    i think AWS have forgotten to put " identifiers on a field

     

    MVDBA

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply