"validating that files can be imported"...

  • Sorry, but in the context of SQL Server, what does that mean/entail? Sounds like a For Each file loop in SSIS with a lot of filters to validate the individual files. (that way the "successful" files go to one directory and the failures go to another.

    or am I thinking about this wrong?

    Apologies for a serious lack of details, but I didn't receive any. Was just wondering how people would approach a question like this. Thanks!

  • can be lots of things

    • has file been processed - requires another table with list of files already processed (filename, time last written and similar)
    • do filenames match expected pattern
    • has the file all required fields - requires a table with expected number of columns
    • have the columns on the file the correct data types - requires a table with expected number of columns and their datatypes and requires processing the file onto a temp/staging table (or just plain c#) to determine if the input is correct according to expected
    • if file has a trailer/control record does the content match that (again requires preprocessing file)

    and no need for SSIS - powershell/c# would likely be easier to implement.

  • frederico_fonseca wrote:

    can be lots of things

    • has file been processed - requires another table with list of files already processed (filename, time last written and similar)
    • do filenames match expected pattern
    • has the file all required fields - requires a table with expected number of columns
    • have the columns on the file the correct data types - requires a table with expected number of columns and their datatypes and requires processing the file onto a temp/staging table (or just plain c#) to determine if the input is correct according to expected
    • if file has a trailer/control record does the content match that (again requires preprocessing file)

    and no need for SSIS - powershell/c# would likely be easier to implement.

    I do checks very much like this in SSIS, as part of a generic file-import framework – using C# in a Script Task.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Uh oh... so I should learn some basic C# or Powershell (or both).

    I guess it's time to look for a good validation tutorial maybe. Thanks!

  • pietlinden wrote:

    Uh oh... so I should learn some basic C# or Powershell (or both).

    I guess it's time to look for a good validation tutorial maybe. Thanks!

    Heh... Phil already knows what I'm going to say...

    Gosh, I've seen two posts recently on this subject and this is what I actually cut my SQL teeth on... imports and exports for the telephone business.  It also seriously helped me in the internet advertising and "spotlight pixel tracking" business with all the bloody files that looked like variable width, ever-changing, CSV exports of spreadsheets. That's make me a bit jealous of you and the other person for the tasks at hand because I thought it was a lot of fun and still do (except for bloody XML, EDI, and JSON).

    I don't mean to sound contrary but I don't even know how to spell C# or use Powershell (although I did look at it in the past and thought it was unnecessary) for this type of thing.  They're just different ways of doing what you already know how to do or can quickly learn to do.  Yeah... it can be done through SSIS but I've never needed to.

    I also sent you a response to your PM on this subject.

    You know how to read a Temp Table of file names using a While loop, right?  You know how to use BULK INSERT, right?  And even if some of the files are spreadsheets, you could quickly learn how to use the ACE drivers and learn how to write a little creative T-SQL to separate the data (even on ever-changing spreadsheets) from the headers and other stuff.

    Yes... you can use C# and Powershell in conjunction with SSIS but it's not necessary, IMHO.  BUT... I'll also say that if that's the route you choose (and it's not a bad route... it's just not for me), then Phil is definitely the right guy to help for this type of thing.

    Shifting gears a bit, you'll find that whatever you do to get the data into database tables, that's the easy part.  Getting people to accurately describe what they want do to and with the data is another story and I do mean "story"... as in never ending and pure fiction with a heavy dose of wishful thinking.  Taking the grey-colored-mass they call "requirements" and boiling it down to a colorful and appealing soup is the real challenge and you're already good at that.

    "Validation" is just making sure all the "gazintas" are in the right places and all have the proper ranges of data and the right declared referential integrity.  You've already good at that, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Found and article that does most of what I want. Now to build it and practice... oh, and break it, just so I know what it looks like when it goes sideways.

    SQL Server Bulk Insert for Multiple CSV Files from a Single Folder (mssqltips.com)

    FWIW, this is what I came up with. Still not quite what I want, but it's a start:

    IF OBJECT_ID('TEMPDB..#TEMP_FILES') IS NOT NULL DROP TABLE #TEMP_FILES;

    CREATE TABLE #TEMP_FILES(
    FileName VARCHAR(MAX),
    DEPTH VARCHAR(MAX),
    [FILE] VARCHAR(MAX)
    );
    GO

    /* insert all the files in the directory into the table */
    INSERT INTO #TEMP_FILES
    EXEC master.dbo.xp_DirTree 'C:\Users\User\Documents\ImportMe\',1,1

    /* remove the unwanted files */
    DELETE FROM #TEMP_FILES WHERE RIGHT(FileName,4) != '.CSV';

    /* create result table and insert data */
    IF OBJECT_ID('TEMPDB..#TEMP_RESULTS') IS NOT NULL DROP TABLE #TEMP_RESULTS;

    CREATE TABLE #TEMP_RESULTS (
    [A] VARCHAR(MAX)
    );
    GO

    DECLARE @FileName VARCHAR(MAX),
    @SQL VARCHAR(MAX);

    WHILE EXISTS (SELECT * FROM #TEMP_FILES)
    BEGIN
    BEGIN TRY
    SET @FileName = (SELECT TOP 1 FileName FROM #TEMP_FILES)
    SET @SQL = 'BULK INSERT #TEMP_RESULTS
    FROM ''C:\Users\User\Documents\ImportMe\' + @FileName + '''
    WITH (FIRSTROW = 2, FIELDTERMINATOR='','', ROWTERMINATOR=''\n'');'

    PRINT @SQL
    EXEC(@SQL)
    END TRY
    BEGIN CATCH
    PRINT 'Failed processing: ' + @FileName;
    END CATCH

    DELETE FROM #TEMP_FILES WHERE FileName = @FileName;

    END

    (Mostly for Jeff Moden... see? I actually did my homework! <g>)  I was thinking of changing the code a bit to record whether the import succeeded or not, but it's a start. Then I should do same in SSIS. variables and for each file loops! Gotta get on that. (Thanks Phil!)

    • This reply was modified 2 years, 5 months ago by  pietlinden.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply