Validating meta data of xls file

  • Eirikur Eiriksson wrote:

    David Burrows wrote:

    Eirikur Eiriksson wrote:

    Phil Parkin wrote:

    David Burrows wrote:

    This is why I moved away from SSIS and ACE for spreadsheets with mixed data (ie numeric and alphanumeric) in the same column, unless I can guarantee the data. Even a single column with numeric data with leading zeros will be changed by the drivers.

    To get round this, and I will not put office on my SQL server, I wrote a program(run on an application server) to export the first sheet to tab delimited text file without changing the data, then I could import the data by any method I choose.

    That's interesting. How does your program access the XLSX data, if not via the awful 'ACE' driver?

    XLSX is in the OOXML format, not too hard to work with and there are several other spreadsheet programs using the same format. The good thing about OOXML is that one only has to have some understanding of XML to digest the files into any RDBMS, no ACE drivers or other such needed.

    😎

    You will still need to unzip the file first.

    Unzipping is the simple part, only needs a path to a command line program in the path variable, much simpler than installing a driver. I normally use 7zip, works like a charm every time.

    😎

    True.Yet another step and more software just to get around MS poor interfaces 🙁

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David Burrows wrote:

    Eirikur Eiriksson wrote:

    David Burrows wrote:

    Eirikur Eiriksson wrote:

    Phil Parkin wrote:

    David Burrows wrote:

    This is why I moved away from SSIS and ACE for spreadsheets with mixed data (ie numeric and alphanumeric) in the same column, unless I can guarantee the data. Even a single column with numeric data with leading zeros will be changed by the drivers.

    To get round this, and I will not put office on my SQL server, I wrote a program(run on an application server) to export the first sheet to tab delimited text file without changing the data, then I could import the data by any method I choose.

    That's interesting. How does your program access the XLSX data, if not via the awful 'ACE' driver?

    XLSX is in the OOXML format, not too hard to work with and there are several other spreadsheet programs using the same format. The good thing about OOXML is that one only has to have some understanding of XML to digest the files into any RDBMS, no ACE drivers or other such needed.

    😎

    You will still need to unzip the file first.

    Unzipping is the simple part, only needs a path to a command line program in the path variable, much simpler than installing a driver. I normally use 7zip, works like a charm every time.

    😎

    True.Yet another step and more software just to get around MS poor interfaces 🙁

    You took the words out of my mouth my friend, but then again, just like the physics of motions in fluids, it's all about the path of least resistance.

    😎

    MS's viscosity somewhat alters ones perception of "fluid" 😉

     

  • Phil Parkin wrote:

    Jeff, thank you for taking the time to write all of this stuff out.

    Importing from Excel using SSIS is almost always a massive pain, so I am going to check this method out.

    The biggest problems I have tend to be related to data types, or the lack of them. If what you are doing gets round those issues, I'll be converted.

    Thanks again!

    I've not yet had a problem with datatypes, possibly because of the IMEX=1 thing and the fact that there are 3 header rows.  Not by choice but the resulting EAV file has always contained VARCHAR(510) for almost everything.  That's fine by me because I never import directly to a final target table.  I always use a staging table like the EAVs that are produced by the code I provided.

    And, no... I don't blame the ACE drivers for the inability to more intelligently return the correct datatypes.  We are, after all, always importing user stuff and lord knows what users might put on the spreadsheets.

    Just to be sure, the code I provided was written specifically for 3 row headers in a 2/1 format.  Most of that type of information is in the flowerbox of the code.  It does, as you'll find out in the demo, auto-magically adjust for added and changed columns including but not limited to the common row columns.  It's not totally AI but we could get it pretty close with a little work.  I just didn't want to clutter things up when I didn't need the benefit.

    Phil... where do you hail from?  I ask because I'm giving the "Automating "Excel Hell"" presentation at the Cleveland SQL Saturday on the 1st of February 2020 and would love to meet you in person.  It would also be my pleasure to meet anyone else reading this thread whether or not they've posted on it.

    p.s.  You might want to "convert" even though I just use the datatypes that ACE throws at me because of the automatic unpivot and the automatic mapping part of it, which most people would have to change a package for each month due to column "MADness" (Moves, Adds, Deletes)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Unzipping files is natively available in .NET, so there should be no need to install additional software.

    See here for more info.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Eirikur Eiriksson wrote:

    David Burrows wrote:

    Eirikur Eiriksson wrote:

    David Burrows wrote:

    Eirikur Eiriksson wrote:

    Phil Parkin wrote:

    David Burrows wrote:

    This is why I moved away from SSIS and ACE for spreadsheets with mixed data (ie numeric and alphanumeric) in the same column, unless I can guarantee the data. Even a single column with numeric data with leading zeros will be changed by the drivers.

    To get round this, and I will not put office on my SQL server, I wrote a program(run on an application server) to export the first sheet to tab delimited text file without changing the data, then I could import the data by any method I choose.

    That's interesting. How does your program access the XLSX data, if not via the awful 'ACE' driver?

    XLSX is in the OOXML format, not too hard to work with and there are several other spreadsheet programs using the same format. The good thing about OOXML is that one only has to have some understanding of XML to digest the files into any RDBMS, no ACE drivers or other such needed.

    😎

    You will still need to unzip the file first.

    Unzipping is the simple part, only needs a path to a command line program in the path variable, much simpler than installing a driver. I normally use 7zip, works like a charm every time.

    😎

    True.Yet another step and more software just to get around MS poor interfaces 🙁

    You took the words out of my mouth my friend, but then again, just like the physics of motions in fluids, it's all about the path of least resistance.

    😎

    MS's viscosity somewhat alters ones perception of "fluid" 😉

    BWAAAA-HAAAA!!!! Seems like even healthy doses of Kaopectate won't fix it either! 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Phil... where do you hail from?  I ask because I'm giving the "Automating "Excel Hell"" presentation at the Cleveland SQL Saturday on the 1st of February 2020 and would love to meet you in person.  It would also be my pleasure to meet anyone else reading this thread whether or not they've posted on it.

    I'd like that. Currently working in Grand Cayman (it's a hard life), but I'll see whether my boss is open to it.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin wrote:

    Unzipping files is natively available in .NET, so there should be no need to install additional software.

    See here for more info.

    Just to be sure, my solution didn't include unzipping any files.  I provided them that way just to "group" them all together and allow someone to unzip them in one "swell foop" to the directory of their choice in an easier fashion than downloading a half dozen separate files.

    When I do incorporate unzipping into anything real, I do like 7ZIP and xp_CmdShell.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Phil Parkin wrote:

    Unzipping files is natively available in .NET, so there should be no need to install additional software.

    See here for more info.

    I was thinking the same and there are XML/JSON additions as well.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Jeff Moden wrote:

    Eirikur Eiriksson wrote:

    David Burrows wrote:

    Eirikur Eiriksson wrote:

    David Burrows wrote:

    Eirikur Eiriksson wrote:

    Phil Parkin wrote:

    David Burrows wrote:

    This is why I moved away from SSIS and ACE for spreadsheets with mixed data (ie numeric and alphanumeric) in the same column, unless I can guarantee the data. Even a single column with numeric data with leading zeros will be changed by the drivers.

    To get round this, and I will not put office on my SQL server, I wrote a program(run on an application server) to export the first sheet to tab delimited text file without changing the data, then I could import the data by any method I choose.

    That's interesting. How does your program access the XLSX data, if not via the awful 'ACE' driver?

    XLSX is in the OOXML format, not too hard to work with and there are several other spreadsheet programs using the same format. The good thing about OOXML is that one only has to have some understanding of XML to digest the files into any RDBMS, no ACE drivers or other such needed.

    😎

    You will still need to unzip the file first.

    Unzipping is the simple part, only needs a path to a command line program in the path variable, much simpler than installing a driver. I normally use 7zip, works like a charm every time.

    😎

    True.Yet another step and more software just to get around MS poor interfaces 🙁

    You took the words out of my mouth my friend, but then again, just like the physics of motions in fluids, it's all about the path of least resistance.

    😎

    MS's viscosity somewhat alters ones perception of "fluid" 😉

    BWAAAA-HAAAA!!!! Seems like even healthy doses of Kaopectate won't fix it either! 😀

    Agreed, Bismuth subsalicylate isn't the "cure"

    😎

     

  • Jeff Moden wrote:

    Phil Parkin wrote:

    Unzipping files is natively available in .NET, so there should be no need to install additional software.

    See here for more info.

    Just to be sure, my solution didn't include unzipping any files.  I provided them that way just to "group" them all together and allow someone to unzip them in one "swell foop" to the directory of their choice in an easier fashion than downloading a half dozen separate files.

    When I do incorporate unzipping into anything real, I do like 7ZIP and xp_CmdShell.

    I have tested "almost" all methods of handling compressed files within the realm of the SQL Server, found that it only handles non-hierarchical (no folders) files, otherwise only external programs via xp_cmdshell are reliable.

    😎

    Did use CLRs in the passed until I found out that there is a problem with internal streaming/scheduling which limits the reliability of the process's output.

     

  • Eirikur Eiriksson wrote:

    Jeff Moden wrote:

    Phil Parkin wrote:

    Unzipping files is natively available in .NET, so there should be no need to install additional software.

    See here for more info.

    Just to be sure, my solution didn't include unzipping any files.  I provided them that way just to "group" them all together and allow someone to unzip them in one "swell foop" to the directory of their choice in an easier fashion than downloading a half dozen separate files.

    When I do incorporate unzipping into anything real, I do like 7ZIP and xp_CmdShell.

    I have tested "almost" all methods of handling compressed files within the realm of the SQL Server, found that it only handles non-hierarchical (no folders) files, otherwise only external programs via xp_cmdshell are reliable.

    😎

    Did use CLRs in the passed until I found out that there is a problem with internal streaming/scheduling which limits the reliability of the process's output.

    What problems did you have, I have two CLR, one to import a text file and another to import a file via FTP, and have not found any issues.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David Burrows wrote:

    Eirikur Eiriksson wrote:

    Jeff Moden wrote:

    Phil Parkin wrote:

    Unzipping files is natively available in .NET, so there should be no need to install additional software.

    See here for more info.

    Just to be sure, my solution didn't include unzipping any files.  I provided them that way just to "group" them all together and allow someone to unzip them in one "swell foop" to the directory of their choice in an easier fashion than downloading a half dozen separate files.

    When I do incorporate unzipping into anything real, I do like 7ZIP and xp_CmdShell.

    I have tested "almost" all methods of handling compressed files within the realm of the SQL Server, found that it only handles non-hierarchical (no folders) files, otherwise only external programs via xp_cmdshell are reliable.

    😎

    Did use CLRs in the passed until I found out that there is a problem with internal streaming/scheduling which limits the reliability of the process's output.

    What problems did you have, I have two CLR, one to import a text file and another to import a file via FTP, and have not found any issues.

    Simply the data streaming process can and will occasionally cut of so when using it, one can expect only half the basket, not the full content.

    😎

     

  • Eirikur Eiriksson wrote:

    David Burrows wrote:

    Eirikur Eiriksson wrote:

    Jeff Moden wrote:

    Phil Parkin wrote:

    Unzipping files is natively available in .NET, so there should be no need to install additional software.

    See here for more info.

    Just to be sure, my solution didn't include unzipping any files.  I provided them that way just to "group" them all together and allow someone to unzip them in one "swell foop" to the directory of their choice in an easier fashion than downloading a half dozen separate files.

    When I do incorporate unzipping into anything real, I do like 7ZIP and xp_CmdShell.

    I have tested "almost" all methods of handling compressed files within the realm of the SQL Server, found that it only handles non-hierarchical (no folders) files, otherwise only external programs via xp_cmdshell are reliable.

    😎

    Did use CLRs in the passed until I found out that there is a problem with internal streaming/scheduling which limits the reliability of the process's output.

    What problems did you have, I have two CLR, one to import a text file and another to import a file via FTP, and have not found any issues.

    Simply the data streaming process can and will occasionally cut of so when using it, one can expect only half the basket, not the full content.

    😎

    [/q

    Eirikur Eiriksson wrote:

    David Burrows wrote:

    Eirikur Eiriksson wrote:

    Jeff Moden wrote:

    Phil Parkin wrote:

    Unzipping files is natively available in .NET, so there should be no need to install additional software.

    See here for more info.

    Just to be sure, my solution didn't include unzipping any files.  I provided them that way just to "group" them all together and allow someone to unzip them in one "swell foop" to the directory of their choice in an easier fashion than downloading a half dozen separate files.

    When I do incorporate unzipping into anything real, I do like 7ZIP and xp_CmdShell.

    I have tested "almost" all methods of handling compressed files within the realm of the SQL Server, found that it only handles non-hierarchical (no folders) files, otherwise only external programs via xp_cmdshell are reliable.

    😎

    Did use CLRs in the passed until I found out that there is a problem with internal streaming/scheduling which limits the reliability of the process's output.

    What problems did you have, I have two CLR, one to import a text file and another to import a file via FTP, and have not found any issues.

    Simply the data streaming process can and will occasionally cut of so when using it, one can expect only half the basket, not the full content.

    😎

    :-/ I'll have to check that when I'm back in the office

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I know I don't have to check on the CLR thing... I don't use them. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • David Burrows wrote:

    Phil Parkin wrote:

    Unzipping files is natively available in .NET, so there should be no need to install additional software.

    See here for more info.

    I was thinking the same and there are XML/JSON additions as well.

    Heh... there are two things I like about JSON... I don't use it but I hate it less than XML when others do. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 45 (of 58 total)

You must be logged in to reply to this topic. Login to reply