Import data in Parquet format

  • I have a situation where I need to import data from a Hadoop file that is in Parquet format. Preferably, I'd like to do this from within a stored procedure that can truncate the staging table, import the data, then process the data as needed. I know that PolyBase is available in SQL Server 2016, but our server will be running SQL Server 2014. Any suggestions?

  • Hi Aaron,

    I have no idea what the "parquet" format is. I also don't know what the source is. Would it be from a text file?

    If it's from a text file and you can explain what the "parquet" format is and maybe even attach such a file (no proprietary or PII info, please), I could take a whack at it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... I just looked up "Parquet Format" online. You would probably be better off writing a magic decoder ring for this in Java to expand the data into a CSV file and import that with SQL.
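
    A minimal sketch of that "decoder ring" idea, written in Python rather than Java for brevity. It assumes the third-party pyarrow package is available for reading Parquet; the function names `write_tsv` and `parquet_to_tsv` and all file paths are hypothetical. It reads the whole file into memory, so a large file would need batching.

```python
import csv


def write_tsv(rows, columns, out):
    """Write dict rows to `out` as tab-delimited text with a header line.

    Returns the number of data rows written. None values become empty
    fields. The csv module quotes any field that itself contains a tab,
    quote, or newline; check whether your loader accepts quoted fields.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    count = 0
    for row in rows:
        writer.writerow([row.get(col) for col in columns])
        count += 1
    return count


def parquet_to_tsv(parquet_path, tsv_path):
    """Expand a Parquet file into a tab-delimited file (hypothetical helper).

    Assumption: pyarrow is installed. It is imported here so the rest of
    the module works without it.
    """
    import pyarrow.parquet as pq

    table = pq.read_table(parquet_path)  # loads the entire file into memory
    with open(tsv_path, "w", newline="", encoding="utf-8") as out:
        return write_tsv(table.to_pylist(), table.column_names, out)
```

    The resulting file could then be loaded into the staging table with the usual bulk-load tools.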

    --Jeff Moden



  • That's the thing I like about you, Jeff. You're always full of optimism and hope! 😛

    However, I've not given up hope yet. Since support for XML is here and JSON nearly so, who knows? I've also considered a Linked Server approach, although I generally don't favor those due to performance issues.

  • Just trying to use the right tool for the right thing. As with most things, shredding the Parquet format in SQL Server could be done but, as with even the built-in features for XML and JSON, SQL Server probably isn't the right place to do it.

    Can't Hadoop do the data expansion into a nice, neat, high-performance tab-delimited file? I'd be disappointed if it couldn't.

    --Jeff Moden



  • The team did identify a method of pushing the data to SQL Server, but this approach requires coordinating the stored procedure execution on SQL Server to occur AFTER the data push. I was hoping to avoid such a scheduling/dependency nightmare by having it all coordinated within the SQL Server SP, but I may have to take a different approach. Of course, if this were SQL Server 2016, we'd have PolyBase at our disposal...

  • Aaron N. Cutshall (12/5/2016)


    The team did identify a method of pushing the data to SQL Server, but this approach requires coordinating the stored procedure execution on SQL Server to occur AFTER the data push. I was hoping to avoid such a scheduling/dependency nightmare by having it all coordinated within the SQL Server SP, but I may have to take a different approach. Of course, if this were SQL Server 2016, we'd have PolyBase at our disposal...

    Why wouldn't a trigger do it for you? For that matter, why couldn't Hadoop call a batch file that fires off SQLCMD?
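
    A rough sketch of the SQLCMD idea, shown in Python instead of a batch file: once the data push finishes, the pushing job calls the stored procedure itself, so no separate schedule is needed. The server, database, and procedure names are hypothetical, and it assumes `sqlcmd` is on the PATH and that a trusted connection (`-E`) works from the machine doing the push.

```python
import subprocess


def build_sqlcmd_args(server, database, procedure):
    """Build the sqlcmd command line that executes one stored procedure.

    -S server, -d database, -E trusted connection (an assumption; swap in
    -U/-P if SQL logins are used), -b return a non-zero exit code on
    error, -Q run the query and exit.
    """
    return [
        "sqlcmd",
        "-S", server,
        "-d", database,
        "-E",
        "-b",
        "-Q", f"EXEC {procedure};",
    ]


def run_procedure(server, database, procedure):
    """Run the procedure via sqlcmd and return its exit code (0 = success)."""
    result = subprocess.run(
        build_sqlcmd_args(server, database, procedure),
        capture_output=True,
        text=True,
    )
    return result.returncode
```

    The push job would call something like `run_procedure("MyServer", "Staging", "dbo.ProcessImport")` as its last step and treat a non-zero return as a failure.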

    --Jeff Moden


