Exporting Data to Excel

  • Man, this place is pretty dead. Let me stir some stuff up being it's been a YEAR since I've posted here. Since then, I've been slinging Python on this forums trying to get people to use it more with the Microsoft stack because I absolutely love Python + SQL Server (now Azure technically).

    So, this past week, I've been playing a lot with Pandas. This is a module for Python that allows you to easily create data structures or data frames. It also comes packed with tools to clean, conform, and analyze the data once you have loaded it. One for example is the ExcelWriter that uses the xlwt and Openpyxl modules. This allows Pandas to read and write Excel documents in both xls (xlwt) and xlsx (Openpyxl) files.

    import pandas as pd

    # Create a data frame with some sample data with two fields, CustomerID and CustomerName
    myData = pd.DataFrame({'CustomerID': [1, 2, 3, 4, 5], 'CustomerName': ["John", "Mike", "Jane", "Joe", "Sarah"]})

    # Create the Pandas Excel writer using XlsxWriter as the engine
    writer = pd.ExcelWriter('Customers.xlsx', engine='xlsxwriter')

    # Convert the dataframe to an XlsxWriter Excel object
    myData.to_excel(writer, sheet_name='Customers')

    # Close the Pandas Excel writer and output the Excel file

    Hopefully if that works, you should have a new Excel document in the xlsx format with two fields on a sheet name for Customers. Essentially, allowing you to export data from somewhere such as a SQL Server database to Excel without the need of SSIS.

    Some other cool things, which I'll steal from the help site, you can add a graph with your data too.

    # Get the xlsxwriter objects from the dataframe writer object.
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Create a chart object.
    chart = workbook.add_chart({'type': 'column'})

    # Configure the series of the chart from the dataframe data.
    chart.add_series({'values': '=Sheet1!$B$2:$B$8'})

    # Insert the chart into the worksheet.
    worksheet.insert_chart('D2', chart)

    You can also add conditional formatting super easy with.

    # Apply a conditional format to the cell range.
    worksheet.conditional_format('B2:B8', {'type': '3_color_scale'})

    And even populate multiple sheets at once.

    # Write each dataframe to a different worksheet.
    df1.to_excel(writer, sheet_name='Sheet1')
    df2.to_excel(writer, sheet_name='Sheet2')
    df3.to_excel(writer, sheet_name='Sheet3')

    Or multiple sheets at multiple position

    # Position the dataframes in the worksheet.
    df1.to_excel(writer, sheet_name='Sheet1') # Default position, cell A1.
    df2.to_excel(writer, sheet_name='Sheet1', startcol=3)
    df3.to_excel(writer, sheet_name='Sheet1', startrow=6)

    # Write the dataframe without the header and index.
    df4.to_excel(writer, sheet_name='Sheet1',
         startrow=7, startcol=4, header=False, index=False)

    All-in-all, pretty easy and neat stuff to extract data to Excel format as well reading it into Pandas. Outside of Excel stuff, once in Pandas, you can pretty much do all your normal ETL basics like cleaning records, conforming them, creating calculated metrics and all that jive in one location. For bigger datasets, it gets a bit tricky, but you have ways of chunking data, serializing it, and all that goodness to help you do a lot of things that you may find in other expensive ETL tools or those you wish you had like in my case, in Azure with no SSIS.

    (Note: This was done in Python 2.7 using Pandas 0.17.1)

    Have fun!

  • This is great, thanks so much for the tutorial!

    Can you please share what resources you used Python for data science? Did you use any of those online courses like DataCamp or DataQuest? I am a beginner but would really love if I can find good resources to guide me on my journey. Thanks!

  • Is this code written around a fixed Excel format?

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply