Azure Data Factory is an Important tool for Azure Data engineers and organizations to fulfill their data transfer needs. Data Engineers need to understand how to work on ADF and the nitty and gritty of ADF.
In this article, we will see the Azure Data Factory Interview Questions and development scenarios which can help readers to understand the possible scenarios and prepare for the Interview.
Q1. What are the Building blocks of an ADF?
Answer: Key components or building blocks of Azure Data Factory are Pipelines, Activities, Datasets, Linked Services, Dataflows, and Integration Runtimes.
If you are new to the Azure Data Factory, you can visit the link to see how to create your first Data Factory pipeline: https://www.sqlservercentral.com/articles/azure-data-factory-your-first-data-pipeline
Q2. What are the activities you have used in ADF Data Pipeline with some example scenarios?
Answer: In this answer, you need to explain the pipelines you have created and the activity used in those pipelines. For example, you have created the pipeline to move data from Azure Blob Storage to Azure SQL database. In that pipeline how do you create the Linked Services, Datasets, and Activities, like Copy Activity or Get Metadata activity you have used? Be ready to explain the proecss.
Q3. What is the Integration Runtime in Azure Data Factory?
Answer: The Integration Runtime is the compute infrastructure which Azure Data Factory uses for its various activity to perform data Integration or transformations. Integration Runtime are of three types:
- Azure Integration Runtime supports connecting to data stores and computes services with public accessible endpoints
- Self-hosted Integration Runtime
- SSIS integration Runtime: To lift and shift existing SSIS workload or execute SSIS packages, SSIS integration run time provides Infrastructure.
Q4. What is SSIS integration run time? How to configure it?
Answer: If you are applying for some company that has heavy workloads they want to know if you are well versed with SSIS Integration Runtime. Below are a few pointers for answering this.
The Azure-SSIS IR is a fully managed cluster of Azure VMs dedicated to running your SSIS packages. We can bring our own Azure SQL Database or SQL Managed Instance for the catalog of SSIS projects/packages (SSISDB). We can scale up the power of the compute by specifying node size and scale it out by specifying the number of nodes in the cluster.
Click on the link to read more about SSIS integration Runtime. https://www.sqlservercentral.com/articles/migrating-ssis-packages-to-azure-lift-and-shift-using-azure-data-factory
Q5. Have you used Azure Custom Activity and what is its use?
Answer: The Custom Activity is configured with your data movement or transformation logic and used in a pipeline. The Custom Activity runs your customized code logic on an Azure Batch pool of virtual machines. For Example, if we want to execute a Python Script of C# or any other script for your data transformation or movement logic you can execute those in Batch VMs in Azure Data Factory. These VMs are cheaper options compared to Azure Data Bricks.
See the link for example where Azure Custom activity is used to perform a specific task: https://www.sqlservercentral.com/articles/using-adf-to-upload-disparate-csv-files-into-azure-mysql
Q6. What does Execute Pipeline Activity do?
Answer: The Execute Pipeline activity allows a Data Factory or Synapse pipeline to invoke another pipeline. The further drill-down question for this particular activity is how to pass parameters from the master pipeline to invoked pipeline.
Q7. Have you used Lookup Activity in Azure Data Factory?
Answer: Lookup activity can retrieve a dataset from any of the data sources supported by the data factory. We can do an update and delete operations in Azure SQL Database using lookup activity.
There are below limitations that you need to know about lookup activity for interview purposes.
- The Lookup activity can return up to 5000 rows; if the result set contains more records, the first 5000 rows will be returned.
- The Lookup activity output supports up to 4 MB in size, activity will fail if the size exceeds the limit.
- The longest duration for Lookup activity before the timeout is 24 hours.
Q8. Have you used Key Vault with Azure Data Factory? What is the use of Key Vault in your pipeline?
Answer: Azure Key Vault is a cloud service for securely storing and accessing secrets. A secret is anything that you want to tightly control access to, such as API keys, passwords, certificates, or cryptographic keys.
In a Data Factory Pipeline, we can use the password of the database and any other access key directly from the Key Vault.
Q9. Have to trigger an error notification email in Azure Data Factory?
Answer: We can trigger email notifications using the logic app and Web activity. We can define the workflow in the logic app and then can provide the Logic App URL in Web activity with other details. You can send the dynamic message also using Web activity after failure or completion of any event in the Data Factory Pipeline.
Q10.How to Implement parallel processing in Azure Data Factory Pipeline?
Answer: For Each Loop Activity in Azure Data Factory provides you with parallel processing functionality. In For Each Loop Activity, there is property to process workflow inside the For Each loop in sequential or parallel fashion.
The property, isSequential, specifies whether the loop should be executed sequentially or in parallel. A maximum of 50 loop iterations can be executed at once in parallel).
For example, if we have a For Each activity iterating over a copy activity with 10 different sources and sink datasets with the isSequential property set to False, all copies are executed at once. The default is False.
Q11: How to implement Error Handling in Azure Data Factory Copy Activity to avoid failure of the entire pipeline in case of any issue with a specific record from the source?
Answer: Copy Activity provide the option of Fault Tolerance option. In this option, you can redirect the error rows to Blob Storage. When a row in the file at the source has some unwanted data which can fail the entire copy activity, by enabling this option you can redirect failure rows to a specific destination and other correct rows to the desired destination.
Q12: How to schedule or run an Azure Data Factory Pipeline? What are the different types of triggers in Data Factory?
Answer: By creating Triggers in Azure Data Factory you can schedule a Pipeline. Triggers are of three types.
- Schedule trigger: A trigger that invokes a pipeline on a specific schedule.
- Tumbling window trigger: A trigger that operates on a periodic interval, while also retaining a state.
- Event-based trigger: A trigger that responds to an event. For example, uploading a file in data storage.
Q13: What is Mapping Data Flow?
Answer: Mapping Data are visually designed data transformation in Azure Data Factory. We can do the transformation without writing the code.
Below links talks in detail about mapping data flow and how to implement them.
Q14: How will you move CSV files with different schemas to SQL Database using Azure Data Factory? Column names or no of columns are different at each source file but at destination there is one table where certain columns of files will get saved. The mapping of column from source file and destination will be provided for each File.
Answer: There are many ways to do it using Azure Data Factory. One of this method is explained in this link: https://www.sqlservercentral.com/articles/using-adf-to-upload-disparate-csv-files-into-azure-mysql