How can I create a table schema based on reading samples from a text file?

  • I have done this in Hadoop with csvkit to generate a table's schema by reading certain numbers of data from the csv file, not sure if there is similar way to do that in SQL?

    The csv or text source data is quite special in terms of columns it contains (100+ columns), it is not realistic to create the table schema one  by one column which is both tedious and error-prone, is there a way in SQL to do the same thing?

    Thank you.

  • halifaxdal - Wednesday, February 15, 2017 7:58 PM

    I have done this in Hadoop with csvkit to generate a table's schema by reading certain numbers of data from the csv file, not sure if there is similar way to do that in SQL?

    The csv or text source data is quite special in terms of columns it contains (100+ columns), it is not realistic to create the table schema one  by one column which is both tedious and error-prone, is there a way in SQL to do the same thing?

    Thank you.

    Have you tried at all? It looks like it's supported with csvsql:
    csvsql

    Check the link at the bottom of the before the examples: For information on connection strings and supported dialects refer to the SQLAlchemy documentation.

    A couple of people used the same approach here:
    csvkit: A Swiss Army Knife for Comma-Delimited Files

    Sue

  • Sue_H - Wednesday, February 15, 2017 8:26 PM

    halifaxdal - Wednesday, February 15, 2017 7:58 PM

    I have done this in Hadoop with csvkit to generate a table's schema by reading certain numbers of data from the csv file, not sure if there is similar way to do that in SQL?

    The csv or text source data is quite special in terms of columns it contains (100+ columns), it is not realistic to create the table schema one  by one column which is both tedious and error-prone, is there a way in SQL to do the same thing?

    Thank you.

    Have you tried at all? It looks like it's supported with csvsql:
    csvsql

    Check the link at the bottom of the before the examples: For information on connection strings and supported dialects refer to the SQLAlchemy documentation.

    A couple of people used the same approach here:
    csvkit: A Swiss Army Knife for Comma-Delimited Files

    Sue

    Thanks Sue.
    No I did not try to install on Windows yesterday because I was afraid there would be lots of prerequisites or incompatibility issues.

    I just tried it anyway, and just like I expected, there is problem installing pip which is needed to install csvkit, the company's proxy prevent get-pip.py from working, most likely I have to try it tonight at home.

    Thank you for your suggestion though, please let me know if you have any workaround to download/install csvkit.

  • halifaxdal - Thursday, February 16, 2017 9:06 AM

    Thanks Sue.
    No I did not try to install on Windows yesterday because I was afraid there would be lots of prerequisites or incompatibility issues.

    I just tried it anyway, and just like I expected, there is problem installing pip which is needed to install csvkit, the company's proxy prevent get-pip.py from working, most likely I have to try it tonight at home.

    Thank you for your suggestion though, please let me know if you have any workaround to download/install csvkit.

    Python ships with pip - not sure if that's enough to get everything up and running but that might be worth a try.
    I only know of one person who ever used that on Windows and he was using Cygwin.

    Sue

  • Sue_H - Thursday, February 16, 2017 11:34 AM

    halifaxdal - Thursday, February 16, 2017 9:06 AM

    Thanks Sue.
    No I did not try to install on Windows yesterday because I was afraid there would be lots of prerequisites or incompatibility issues.

    I just tried it anyway, and just like I expected, there is problem installing pip which is needed to install csvkit, the company's proxy prevent get-pip.py from working, most likely I have to try it tonight at home.

    Thank you for your suggestion though, please let me know if you have any workaround to download/install csvkit.

    Python ships with pip - not sure if that's enough to get everything up and running but that might be worth a try.
    I only know of one person who ever used that on Windows and he was using Cygwin.

    Sue

    python 3.5 does come with pip however, installing csvkit requires pip talk to external website and it seems pip cannot get over proxy, I will have to do this again from home

  • I am able to install csvkit at home, however, how to use that in Windows? I believe there are really barely people using it in Windows as my search doesn't give me a useful hint. 

    It doesn't start in command line, nor in python shell. I remember I used it in terminal directly in CentOS, correct me if I am wrong.

    Thank you.

  • halifaxdal - Thursday, February 16, 2017 7:37 PM

    I am able to install csvkit at home, however, how to use that in Windows? I believe there are really barely people using it in Windows as my search doesn't give me a useful hint. 

    It doesn't start in command line, nor in python shell. I remember I used it in terminal directly in CentOS, correct me if I am wrong.

    Thank you.

    Yup. Probably close to that would be to use it in Cygwin shell that a former coworker used.
    But in python, did you do the import csvkit ? After importing, I think the command syntax will work.

    Sue

  • Sue_H - Thursday, February 16, 2017 8:06 PM

    halifaxdal - Thursday, February 16, 2017 7:37 PM

    I am able to install csvkit at home, however, how to use that in Windows? I believe there are really barely people using it in Windows as my search doesn't give me a useful hint. 

    It doesn't start in command line, nor in python shell. I remember I used it in terminal directly in CentOS, correct me if I am wrong.

    Thank you.

    Yup. Probably close to that would be to use it in Cygwin shell that a former coworker used.
    But in python, did you do the import csvkit ? After importing, I think the command syntax will work.

    Sue

    Thanks Sue, no I didn't write python code to use the csvkit, can you share some code here?

    I am installing cygwin, so cumbersome compared to using csvkit in CentOS.

    Thank you.

  • Don't know what happened to Cgywin, I started the installation last night and it is still hanging there not completed, I have no choice but to give up.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply