What is Azure Data Factory?
It is a cloud-based data integration service that lets you create data-driven workflows for orchestrating and automating data movement and data transformation.
Using Azure Data Factory, you can do the following tasks:
- Create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores.
- Process or transform the data by using compute services such as Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Azure Machine Learning.
- Publish output data to data stores such as Azure SQL Data Warehouse for business intelligence (BI) applications to consume.
It is more of an Extract-and-Load (EL) and then Transform-and-Load (TL) platform than a traditional Extract-Transform-and-Load (ETL) platform. The transformations are carried out by compute services rather than by in-pipeline operations such as adding derived columns, counting rows, or sorting data.
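The EL-then-TL pattern can be sketched in plain Python. The stores and function names below are illustrative assumptions, not the Data Factory API: raw data is first copied unchanged into a central staging store, and a separate transformation step shapes it later.

```python
# Hypothetical in-memory "stores"; in Data Factory these would be real
# data stores such as Blob storage and Azure SQL Data Warehouse.
raw_source = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3.2"}]

def extract_and_load(source, staging):
    """EL step: move records as-is into the staging store, no transformation."""
    staging.extend(source)

def transform_and_load(staging, warehouse):
    """TL step: a compute service later transforms the staged data."""
    for row in staging:
        warehouse.append({"id": row["id"], "amount": float(row["amount"])})

staging_store, warehouse = [], []
extract_and_load(raw_source, staging_store)
transform_and_load(staging_store, warehouse)
# warehouse now holds typed, transformed copies; the staged raw data is untouched.
```

The point of the pattern is that the copy step stays dumb and fast, while all shaping logic lives in the later transform step.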
How does it work?
The pipelines (data-driven workflows) in Azure Data Factory typically perform the following three steps:
Connect and collect: Enterprises have data of various types located in disparate sources. The first step in building an information production system is to connect to all the required sources of data and processing. These sources include SaaS services, file shares, FTP, and web services. The data is then moved as needed to a centralized location for subsequent processing.
Transform and enrich: After data is present in a centralized data store in the cloud, process or transform it by using compute services such as HDInsight Hadoop, Spark, Data Lake Analytics, or Machine Learning. You want to reliably produce transformed data on a maintainable and controlled schedule to feed production environments with trusted data.
Publish: Deliver transformed data from the cloud to on-premises sources such as SQL Server. Alternatively, keep it in your cloud storage sources for consumption by BI and analytics tools and other applications.
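The three steps above can be sketched as plain Python functions. The names and data are illustrative assumptions only, not the Data Factory SDK:

```python
def connect_and_collect(sources):
    """Step 1: gather records from disparate sources into one central list."""
    centralized = []
    for records in sources.values():
        centralized.extend(records)
    return centralized

def transform_and_enrich(records):
    """Step 2: a compute service shapes raw records into trusted, typed data."""
    return [{"name": r["name"].title(), "visits": int(r["visits"])} for r in records]

def publish(records, sink):
    """Step 3: deliver transformed data to a store for BI tools to consume."""
    sink.extend(records)
    return sink

# Hypothetical disparate sources (an FTP share and a SaaS service).
sources = {
    "ftp":  [{"name": "alice", "visits": "3"}],
    "saas": [{"name": "bob", "visits": "5"}],
}
warehouse = publish(transform_and_enrich(connect_and_collect(sources)), [])
```

Each step hands its output to the next, which mirrors how a pipeline chains these stages on a schedule.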
Key components:
Azure Data Factory is composed of four key components. These components work together to provide the platform on which you can compose data-driven workflows with steps to move and transform data.
Pipeline: A data factory can have one or more pipelines. A pipeline is a group of activities. Together, the activities in a pipeline perform a task.
Activity: A pipeline can have one or more activities. Activities define the actions to perform on your data.
Data movement activities: Copy Activity in Data Factory copies data from a source data store to a sink data store. Data from any source can be written to any sink.
Data transformation activities: Azure Data Factory supports transformation activities that can be added to pipelines either individually or chained with another activity.
Dataset: Datasets represent data structures within the data stores. They simply point to or reference the data you want to use in your activities as inputs or outputs.
Linked service: Linked services are much like connection strings; they define the connection information that Data Factory needs to connect to external resources.
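The pipeline/activity relationship can be modeled with a minimal sketch. These classes are hypothetical, not the Data Factory API; they only illustrate that a pipeline is a named group of activities that together perform a task:

```python
class Activity:
    """An action to perform on data."""
    def __init__(self, name, action):
        self.name = name
        self.action = action  # callable applied to the data

    def run(self, data):
        return self.action(data)

class Pipeline:
    """A named group of activities, run in order."""
    def __init__(self, name, activities):
        self.name = name
        self.activities = activities

    def run(self, data):
        for activity in self.activities:  # each activity feeds the next
            data = activity.run(data)
        return data

# One copy-style movement step chained with one transformation step.
pipeline = Pipeline("daily-load", [
    Activity("copy", lambda rows: list(rows)),             # source -> sink, unchanged
    Activity("transform", lambda rows: sorted(set(rows))), # compute-side transform
])
result = pipeline.run([3, 1, 3, 2])  # [1, 2, 3]
```

Grouping activities this way is what lets a single pipeline be scheduled, monitored, and managed as one unit.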
Relationship between Data Factory entities: (diagram)