Acessing Unix files directly through SSIS

  • Hello All,

    I am new to SSIS, and have started using it from the past 3 weeks. I have a task where in I am supposed to access the flat files which are on Unix and apply transformations on the files and create equivalent CSV files. Initially I downloaded the files from Unix to windows local drive through FTP task and started applying the transformations. But, our client informed that there will be around 10 different flat files in different folders on Unix and creating 10 FTP tasks for downloading the files will consume time as each file consists of around 10 million records. Hence, our client suggested us to read the Unix flat files directly through SSIS (without downloading them) and apply transformations.

    I GOOGLEd and discussed with my office-mates as well but couldn't find a solution for the same. Could the experts here please help me out and let me know if accessing the Unix files directly from SSIS is possible at all?

    Thanks in advance.

    Tapasvi.

  • lvbntapasvi (9/6/2014)


    Hello All,

    I am new to SSIS, and have started using it from the past 3 weeks. I have a task where in I am supposed to access the flat files which are on Unix and apply transformations on the files and create equivalent CSV files. Initially I downloaded the files from Unix to windows local drive through FTP task and started applying the transformations. But, our client informed that there will be around 10 different flat files in different folders on Unix and creating 10 FTP tasks for downloading the files will consume time as each file consists of around 10 million records. Hence, our client suggested us to read the Unix flat files directly through SSIS (without downloading them) and apply transformations.

    I GOOGLEd and discussed with my office-mates as well but couldn't find a solution for the same. Could the experts here please help me out and let me know if accessing the Unix files directly from SSIS is possible at all?

    Quick thought, the content has to be transferred to the system on which the SSIS is executed as most of the SSIS transformation are performed in memory, hence this doesn't sound like a proper solution. There is an option, depending on network topology, connectivity etc. to mount the remote drives using SMB (samba) or other similar, still the content has to be transferred over.

    😎

  • lvbntapasvi (9/6/2014)


    Hello All,

    I am new to SSIS, and have started using it from the past 3 weeks. I have a task where in I am supposed to access the flat files which are on Unix and apply transformations on the files and create equivalent CSV files. Initially I downloaded the files from Unix to windows local drive through FTP task and started applying the transformations. But, our client informed that there will be around 10 different flat files in different folders on Unix and creating 10 FTP tasks for downloading the files will consume time as each file consists of around 10 million records. Hence, our client suggested us to read the Unix flat files directly through SSIS (without downloading them) and apply transformations.

    I GOOGLEd and discussed with my office-mates as well but couldn't find a solution for the same. Could the experts here please help me out and let me know if accessing the Unix files directly from SSIS is possible at all?

    Thanks in advance.

    Tapasvi.

    Check with your colleagues to find out if the files are exposed via UNC path, e.g. if a Samba share is setup on the Unix server that would allow you direct access from a Windows machine. You can refer to the files on a Samba share from SSIS same as you would a share on a Windows server, that is the point of the Samba software. Once you have that, no need for FTP and you can read the files directly off the Unix server and process them in SSIS, i.e. you do not need to copy them anywhere first. Of course this assumes the Samba share and network are reliable to the point where SSIS will not have problems maintaining an open file handle while it reads through the millions of rows.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (9/8/2014)


    lvbntapasvi (9/6/2014)


    Hello All,

    I am new to SSIS, and have started using it from the past 3 weeks. I have a task where in I am supposed to access the flat files which are on Unix and apply transformations on the files and create equivalent CSV files. Initially I downloaded the files from Unix to windows local drive through FTP task and started applying the transformations. But, our client informed that there will be around 10 different flat files in different folders on Unix and creating 10 FTP tasks for downloading the files will consume time as each file consists of around 10 million records. Hence, our client suggested us to read the Unix flat files directly through SSIS (without downloading them) and apply transformations.

    I GOOGLEd and discussed with my office-mates as well but couldn't find a solution for the same. Could the experts here please help me out and let me know if accessing the Unix files directly from SSIS is possible at all?

    Thanks in advance.

    Tapasvi.

    Check with your colleagues to find out if the files are exposed via UNC path, e.g. if a Samba share is setup on the Unix server that would allow you direct access from a Windows machine. You can refer to the files on a Samba share from SSIS same as you would a share on a Windows server, that is the point of the Samba software. Once you have that, no need for FTP and you can read the files directly off the Unix server and process them in SSIS, i.e. you do not need to copy them anywhere first. Of course this assumes the Samba share and network are reliable to the point where SSIS will not have problems maintaining an open file handle while it reads through the millions of rows.

    Also - just because you CAN doesn't mean you should. Samba does avoid the FTP client issue: however in some cases it may still be better to copy the file locally first then process the contents, write the new CSV files locally, then push them back up to UNIX. Writing especially would be much slower "over the wire" than locally, even without having to take into account the overhead from Samba.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (9/8/2014)


    Also - just because you CAN doesn't mean you should.

    If only more people took that one to heart.

    Samba does avoid the FTP client issue: however in some cases it may still be better to copy the file locally first then process the contents, write the new CSV files locally, then push them back up to UNIX. Writing especially would be much slower "over the wire" than locally, even without having to take into account the overhead from Samba.

    No doubt Samba can be difficult to work with. I have found some Samba shares to be unreliable but not sure if that was due to network problems in those specific environments, a buggy version, a poor implementation, or some strange brew causing problems. I have found them to be good in some cases too. Based on the requirements (I did not see specifically in the post that the transformed files should end up on Unix) it may still be desirable to copy the files locally, work on them, then copy them to their final location.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • In order to advice any further on the subject, more information is needed, without the info on network topology, connectivity etc., it is pure guesswork.

    Keep in mind that in order to process any data with SSIS, it has to be loaded into memory of the system running the SSIS service/package.

    Samba is normally very reliable and often faster than both windows shares and ftp, but yet again it depends on the environment.

    😎

  • Eirikur Eiriksson (9/8/2014)


    In order to advice any further on the subject, more information is needed, without the info on network topology, connectivity etc., it is pure guesswork.

    Keep in mind that in order to process any data with SSIS, it has to be loaded into memory of the system running the SSIS service/package.

    Samba is normally very reliable and often faster than both windows shares and ftp, but yet again it depends on the environment.

    😎

    Yep, it will depend on a lot of things. In this case the thread is more for providing guidance rather than a concrete solution. There are things we could never learn about the environment on a forum thread that could change the solution that would fit best. It is up to the OP to make that call.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks everyone for your valuable inputs. I have started looking into Samba as one of the options and have informed the same to the client.

    After transformation, my project doesn't require the files to be written back to Unix. The transformed files have to be downloaded as CSV files and stored on Windows server.

    Eirikur Eiriksson, could you please let me know what information is required when you quoted "network topology and connectivity"? I will try to gather these from my fellow-mates.

    Thanks.

  • If you have ten files, you can create ten dataflows and have them run in parallel OR have ten different packages and call them in parallel with a master package. Or a combination of both. The problem you are facing is that if your download errs at any point you would have to start over. So it is not a bad idea to import into a raw file locally and then have the next dataflow do the transformations sourcing from the raw file.

    I am curious, are any of these file you are importing going to have duplicate rows across batches? If you can find a way to only import what you need that could be a big resource saver by not moving millions of records across a network by parallel transactions.

    ----------------------------------------------------

  • lvbntapasvi (9/6/2014)


    Hello All,

    I am new to SSIS, and have started using it from the past 3 weeks. I have a task where in I am supposed to access the flat files which are on Unix and apply transformations on the files and create equivalent CSV files. Initially I downloaded the files from Unix to windows local drive through FTP task and started applying the transformations. But, our client informed that there will be around 10 different flat files in different folders on Unix and creating 10 FTP tasks for downloading the files will consume time as each file consists of around 10 million records. Hence, our client suggested us to read the Unix flat files directly through SSIS (without downloading them) and apply transformations.

    I GOOGLEd and discussed with my office-mates as well but couldn't find a solution for the same. Could the experts here please help me out and let me know if accessing the Unix files directly from SSIS is possible at all?

    Thanks in advance.

    Tapasvi.

    i am doing the same and also get the same problem in SSIS. and one more thing that happen to me is when i downloading files from unix to my local machine then it will be encoded with some character like "ý" that. please tell me anything how to remove this.

  • mitesh.temre (12/17/2014)


    lvbntapasvi (9/6/2014)


    Hello All,

    I am new to SSIS, and have started using it from the past 3 weeks. I have a task where in I am supposed to access the flat files which are on Unix and apply transformations on the files and create equivalent CSV files. Initially I downloaded the files from Unix to windows local drive through FTP task and started applying the transformations. But, our client informed that there will be around 10 different flat files in different folders on Unix and creating 10 FTP tasks for downloading the files will consume time as each file consists of around 10 million records. Hence, our client suggested us to read the Unix flat files directly through SSIS (without downloading them) and apply transformations.

    I GOOGLEd and discussed with my office-mates as well but couldn't find a solution for the same. Could the experts here please help me out and let me know if accessing the Unix files directly from SSIS is possible at all?

    Thanks in advance.

    Tapasvi.

    i am doing the same and also get the same problem in SSIS. and one more thing that happen to me is when i downloading files from unix to my local machine then it will be encoded with some character like "ý" that. please tell me anything how to remove this.

    "ý" is a valid non-Unicode-character in SQL Server.

    DECLARE @v-2 VARCHAR(10) = 'ý';

    SELECT ASCII(@v);

    If you want to replace it then you can use REPLACE:

    DECLARE @v-2 VARCHAR(10) = 'ý';

    SELECT REPLACE(@v, 'ý', 'y');

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • mitesh.temre (12/17/2014)


    lvbntapasvi (9/6/2014)


    Hello All,

    I am new to SSIS, and have started using it from the past 3 weeks. I have a task where in I am supposed to access the flat files which are on Unix and apply transformations on the files and create equivalent CSV files. Initially I downloaded the files from Unix to windows local drive through FTP task and started applying the transformations. But, our client informed that there will be around 10 different flat files in different folders on Unix and creating 10 FTP tasks for downloading the files will consume time as each file consists of around 10 million records. Hence, our client suggested us to read the Unix flat files directly through SSIS (without downloading them) and apply transformations.

    I GOOGLEd and discussed with my office-mates as well but couldn't find a solution for the same. Could the experts here please help me out and let me know if accessing the Unix files directly from SSIS is possible at all?

    Thanks in advance.

    Tapasvi.

    i am doing the same and also get the same problem in SSIS. and one more thing that happen to me is when i downloading files from unix to my local machine then it will be encoded with some character like "ý" that. please tell me anything how to remove this.

    What you are seeing is unicode. This character can be seen with the statement

    SELECT CHAR(253);

    SSIS is well prepared to handle unicode data. Why would you want to remove it? The OP wanted to handle multiple file transfers. If the files are all spread out then run a dataflow for each in parallel. If they are all in the same directory you could use a for each loop transformation, ForEach File Enumerator as the enumerator. I would need more information to elaborate further.

    Keep one thing in mind, one purpose of first downloading a flat file is for archiving it for possible future auditing, on your server. All processes take time and you should own the responsibility of having a say on how to best engineer the process.

    ----------------------------------------------------

  • Just for the record, CHAR(253) is not Unicode.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Orlando Colamatteo (12/17/2014)


    Just for the record, CHAR(253) is not Unicode.

    Hi Orlando,

    Hmm, I guess I am not clear on what you mean... I just saw that unicode decimal 253 is the symbol depicted in the OP comment. It is not in the ascii table and I found it in this list >

    http://unicode-table.com/en/ not too far down.

    ----------------------------------------------------

  • Check out the Windows 1252 map, number 253.

    http://en.m.wikipedia.org/wiki/Windows-1252

    If you can represent a character with a single byte (253<255, max number storable in a byte) then you do not need Unicode in SQL Server, i.e. NVARCHAR or NCHAR, to store it you just have to pick the correct collation and the proper methods in your app to render the intended character. To SQL Server it's number 253.

    Unicode data types allow us to represent numbers bigger than 255 by employing 2 bytes per character bringing the possible options up from 256 to upwards of 65k. Coupled with collations which can represent different language characters per position the possible number of characters goes way up.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply