Whether you work as a Data Engineer or a Data Scientist, a Jupyter Notebook is a helpful tool. One of the projects I was working on required comparing two Parquet files. This was mainly a schema comparison, not a data comparison: although the two .parquet files were created from two different sources, their schemas should be identical. At first I compared them manually, but then I thought there must be a tool for the job. That's how I found that a Jupyter notebook can be useful for comparing the schemas of two .parquet files.
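For context, here is a minimal sketch of the kind of schema comparison that motivated this setup. It assumes the pyarrow library is installed in the notebook's environment (pip install pyarrow) and uses pyarrow.parquet.read_schema to read only each file's schema, not its data. The helper names and file names are hypothetical, and this is an illustration rather than the exact code from that project.

```python
# Minimal sketch: compare the schemas (not the data) of two Parquet files.
# Assumption: pyarrow is installed; helper and file names are hypothetical.


def read_schema_as_dict(path):
    """Return {column_name: type_string} for a Parquet file's schema."""
    import pyarrow.parquet as pq  # imported here so diff_schemas works without pyarrow

    schema = pq.read_schema(path)  # reads the file footer only, not the row data
    return {field.name: str(field.type) for field in schema}


def diff_schemas(left, right):
    """Compare two {column: type} dicts; ignores column order, nullability and metadata."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    type_mismatches = {
        name: (left[name], right[name])
        for name in sorted(set(left) & set(right))
        if left[name] != right[name]
    }
    return {
        "only_in_left": only_left,
        "only_in_right": only_right,
        "type_mismatches": type_mismatches,
    }


# Usage in a notebook cell (file names are hypothetical):
# diff_schemas(read_schema_as_dict("source_a.parquet"),
#              read_schema_as_dict("source_b.parquet"))
```

Because the diff works on plain dictionaries, it deliberately ignores column order, nullability and schema metadata; if those matter, pyarrow's Schema.equals method is a stricter check.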
Jupyter Notebook can be used for data cleaning and transformation, data visualization, machine learning, statistical modeling, and much more. This post describes the step-by-step installation process for Jupyter Notebook.
Step 1: Install Python 3.9
Python is a prerequisite for running Jupyter Notebook, so we need to install Python first. Go to https://www.python.org/downloads/ and choose the right version to install.
I chose the 'Windows x86-64 executable installer' for my 64-bit Windows OS. Choose the installer that matches your computer's operating system.
Fig 1: Windows Executable
Download the executable file and save it anywhere on your computer.
The next step is to create a 'Python' folder on the C: drive; we will use this folder as the installation location in a later step.
Locate the downloaded executable file. I saved it in my Downloads folder (shown in Figure 3 below). Double-click the executable file to start the installation.
Fig 3: Python executable file
Make sure to choose 'Customize installation' and tick 'Add Python 3.9 to PATH', as shown in Figure 4. I used this option so that I wouldn't have to set up the PATH environment variable manually.
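Once the installation has finished, you can check from a Python prompt that the PATH option took effect. The sketch below uses hypothetical helper names: shutil.which looks up the python executable the same way the shell would, and the second helper lists the PATH entries that mention Python. The installer typically adds both the install folder and its Scripts subfolder, which is where pip (and later the jupyter command) ends up. Run it in a new Command Prompt session, because windows opened before the installation will not see the updated PATH.

```python
import os
import shutil


def python_on_path():
    """Return the full path of the first 'python' executable found on PATH, or None."""
    return shutil.which("python")


def python_dirs_on_path(path_value=None, sep=os.pathsep):
    """Return the PATH entries whose path mentions Python (case-insensitive)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = [entry for entry in path_value.split(sep) if entry]  # drop empty entries
    return [entry for entry in entries if "python" in entry.lower()]


# Usage, from a Python prompt in a new Command Prompt window:
# print(python_on_path())
# print(python_dirs_on_path())
```

If python_on_path() returns None, the PATH box was most likely left unticked; rerunning the installer and choosing Modify is usually easier than editing the environment variable by hand.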
Fig 4: Python Installation wizard
As shown in Figure 5, on the customize installation screen, set the installation location to C:\Python\Python39, inside the 'Python' folder we created on the C: drive in the earlier step (Fig 2).
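You can also confirm that the interpreter you are running is the 64-bit Python 3.9 installed in C:\Python\Python39. This is a sketch with a hypothetical check_install helper; the expected version, folder and 64-bit build are the choices made in this walkthrough, so adjust them if yours differ. It returns a list of problems, so an empty list means everything matches.

```python
import struct
import sys
from pathlib import PureWindowsPath

# Values chosen in this walkthrough; change them if you picked a different version or folder.
EXPECTED_VERSION = (3, 9)
EXPECTED_DIR = PureWindowsPath(r"C:\Python\Python39")


def check_install(executable=sys.executable,
                  version_info=sys.version_info,
                  expected_dir=EXPECTED_DIR,
                  pointer_bits=struct.calcsize("P") * 8):
    """Return a list of problems; an empty list means the install matches expectations."""
    problems = []
    found = (version_info[0], version_info[1])
    if found != EXPECTED_VERSION:
        problems.append(f"expected Python {EXPECTED_VERSION[0]}.{EXPECTED_VERSION[1]}, "
                        f"found {found[0]}.{found[1]}")
    # PureWindowsPath compares case-insensitively, as Windows paths do.
    exe_dir = PureWindowsPath(executable).parent
    if exe_dir != expected_dir:
        problems.append(f"python.exe is in {exe_dir}, expected {expected_dir}")
    # A pointer is 8 bytes (64 bits) in a 64-bit build and 4 bytes in a 32-bit one.
    if pointer_bits != 64:
        problems.append(f"expected a 64-bit build, found {pointer_bits}-bit")
    return problems


# Usage, from a Python prompt started with the new installation:
# print(check_install())  # [] if everything matches
```

Passing the values in as parameters, with defaults taken from the running interpreter, keeps the check easy to try against other paths or versions without reinstalling anything.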