Validating meta data of xls file

  • Hi Team,

    We get xls file every day with so many sheets in it. I need to validate only the sheet named with current year. In this case, it is 2020. the main issue is that the user some times add new columns and due to that package is getting failed. So I want to validate if any extra columns are added or not. If new columns added (some times with out column name but values in the row), I want to send a mail back to user. How do I this ? Any suggestions would be appreciated.Screen shot

    Thanks,
    Charmer

  • This sounds more like a user problem than anything. Perhaps you would be better putting in some validation in the Excel Document as well; for example not letting people type in rows 2 or more, if row 1 is blank. Then they are forced to give the column a header.

    Integration Services expects a static data source; it's not the right tool for validating and checking the integrity and design of data in an Excel Spreadsheet (especially when an xls file is the very old Excel 2003 format).

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I do this in (a significant chunk of) C#, using a Script Task, for .XLSX files (as well as CSV files and Access files).

    It's not for beginners, but it's not impossible either.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Nothing one cannot do in SQL, but as Phil said, not for the faint-hearted

    😎

     

  • Charmer wrote:

    Hi Team,

    We get xls file every day with so many sheets in it. I need to validate only the sheet named with current year. In this case, it is 2020. the main issue is that the user some times add new columns and due to that package is getting failed. So I want to validate if any extra columns are added or not. If new columns added (some times with out column name but values in the row), I want to send a mail back to user. How do I this ? Any suggestions would be appreciated.Screen shot

    Contrary to what appears to be the popular opinion, this is actually pretty easy to do.  I need to see a slightly more complete example though and I am assuming that you have the ACE drivers loaded as a provider on the T-SQL side of the world.  By a more complete example, I mean I need to see the top part of the 2020 spreadsheet starting at Cell A1 including any blank rows, all the header rows, and the first two or 3 lines of data.  Please be sure to obfuscate any propitiatory information and, certainly, any PII.

    To verify if the ACE drivers have been loaded or not, open the Object Explorer window in SSMS, drill down to {Server Objects}, {Linked Servers}, and {Providers}.  You should see an object like the one I've highlighted in Gray.  It will end with the number 12, 15, or 16.  There may be more than one and that would make for a Martha Stewart moment.  Let us (as opposed to "cabbage") know. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have tried to achieve this using script task by validating the count of the columns in the sheet. I am validating against a fixed count(using variable from configuration table). If count not matching, then pass the file to the send mail task as an attachment.

     

    Screen shot

    Screen shot

    Thanks,
    Charmer

  • The issue that I have right now is that when I try to pass the entire path of the file along with the file name to a variable , and using the same as an attachment in expression in send mail task, it is not some how passing to send mail task as an attachment.

    Screen shot

     

    Thanks,
    Charmer

  • Finally, issue is fixed which was a silly and it's done. 🙂

    Thanks,
    Charmer

  • Charmer wrote:

    Finally, issue is fixed which was a silly and it's done. 🙂

    OK, so what was the "silly" fix you employed?

    I'll also state that it's a shame that you're resorting to external code to do this.  It's just too damned easy to do in T-SQL but whatever.  Glad you got a fix.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Charmer wrote:

    Finally, issue is fixed which was a silly and it's done. 🙂

    OK, so what was the "silly" fix you employed?

    I'll also state that it's a shame that you're resorting to external code to do this.  It's just too damned easy to do in T-SQL but whatever.  Glad you got a fix.

    Jeff, I, for one, would like to see how you would do this.

    Let's imagine that you have an XLSX file with a sheet called MyData. There may be many other sheets in this file, but those are of no interest here.

    In that sheet, you are expecting data in a tabular format.

    The expected column headings are 'Col1', 'Col2', 'Col3' in cells B2, C2 and D2.

    How would you validate the following?

    1. That the column names match those expected, in the expected order
    2. That there are no additional columns.

     

     

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I declared a variable to hold the file to pass to send mail task. I had a given a value like 'The value is generated by scrip task'  to the variable. During running time, it will be replaced by the path of the file. But some how this did not work. I emptied the variable and tried. It worked.

    Jeff, I am also keen to know how to do this with T-SQL ?

    • This reply was modified 4 years, 4 months ago by  Charmer.

    Thanks,
    Charmer

  • Charmer wrote:

    I declared a variable to hold the file to pass to send mail task. I had a given a value like 'The value is generated by scrip task'  to the variable. During running time, it will be replaced by the path of the file. But some how this did not work. I emptied the variable and tried. It worked.

    I don't follow how that validates the integrity of the data within said file though. What is your validation mechanism?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Phil Parkin wrote:

    Jeff Moden wrote:

    Charmer wrote:

    Finally, issue is fixed which was a silly and it's done. 🙂

    OK, so what was the "silly" fix you employed?

    I'll also state that it's a shame that you're resorting to external code to do this.  It's just too damned easy to do in T-SQL but whatever.  Glad you got a fix.

    Jeff, I, for one, would like to see how you would do this.

    Let's imagine that you have an XLSX file with a sheet called MyData. There may be many other sheets in this file, but those are of no interest here.

    In that sheet, you are expecting data in a tabular format.

    The expected column headings are 'Col1', 'Col2', 'Col3' in cells B2, C2 and D2.

    How would you validate the following?

     

      <li style="list-style-type: none;">

      <li style="list-style-type: none;">

    1. That the column names match those expected, in the expected order

     

      <li style="list-style-type: none;">

      <li style="list-style-type: none;">

    1. That there are no additional columns.

     

    Specifying a specific sheet is trivial.

    I did this once before with a user controlled spreadsheet to do validation before updates, I even started do an article about it.

    IIRC I imported one line (header) to a temp table containing 255 varchar(255) columns named F1 to F255. You can ignore row one with openrowset. I then validated the row for heading names.

    Once validated I could then do a second import using the column names.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Thom A wrote:

    Charmer wrote:

    I declared a variable to hold the file to pass to send mail task. I had a given a value like 'The value is generated by scrip task'  to the variable. During running time, it will be replaced by the path of the file. But some how this did not work. I emptied the variable and tried. It worked.

    I don't follow how that validates the integrity of the data within said file though. What is your validation mechanism?

    That was going to be my question, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Phil Parkin wrote:

    Jeff Moden wrote:

    Charmer wrote:

    Finally, issue is fixed which was a silly and it's done. 🙂

    OK, so what was the "silly" fix you employed?

    I'll also state that it's a shame that you're resorting to external code to do this.  It's just too damned easy to do in T-SQL but whatever.  Glad you got a fix.

    Jeff, I, for one, would like to see how you would do this.

    Let's imagine that you have an XLSX file with a sheet called MyData. There may be many other sheets in this file, but those are of no interest here.

    In that sheet, you are expecting data in a tabular format.

    The expected column headings are 'Col1', 'Col2', 'Col3' in cells B2, C2 and D2.

    How would you validate the following?

     

      <li style="list-style-type: none;">
      <li style="list-style-type: none;">

     

      <li style="list-style-type: none;">

      <li style="list-style-type: none;">

    1. That the column names match those expected, in the expected order
    2. That there are no additional columns.
      <li style="list-style-type: none;">

    A bit of clarification please, Phil.  Is there anything in column A to get in the way? Also, is row 1 blank or does it haven a title in it?  As I asked the OP, perhaps a demonstration would be easier if you could attach a sample spreadsheet for me to have a go with.  Either way, I'll make it interesting. 😀

     

     

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 58 total)

You must be logged in to reply to this topic. Login to reply