What is Azure Data Factory?
Azure Data Factory is a fully managed data migration and integration service that enables Microsoft Azure users to bring together data from a variety of sources in the Azure public cloud.
The Azure Data Factory service allows users to integrate both on-premises data in Microsoft SQL Server and cloud data in Azure SQL Database, Azure Blob Storage, and Azure Table Storage.
Once Azure Data Factory collects the relevant data, it can be processed by tools like Azure HDInsight (Apache Hive and Apache Pig). Azure Data Factory automates and orchestrates the entire data integration process from end to end, so that users have a single pane of glass into their ETL data pipelines.
Let's get started!
Step 1
Create a free Azure account, then log in to the Azure portal. First, you have to create an Azure Resource Group. After that, create a Blob Storage account and a Data Factory.
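If you prefer scripting this setup, here is a minimal sketch using the Azure Python management SDKs (azure-identity, azure-mgmt-resource, azure-mgmt-storage, azure-mgmt-datafactory). All the names and the region below are placeholder assumptions of mine; substitute your own.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"  # assumption: fill in your own
credential = DefaultAzureCredential()

# 1. Create the resource group
resource_client = ResourceManagementClient(credential, subscription_id)
resource_client.resource_groups.create_or_update(
    "adf-tutorial-rg", {"location": "eastus"}
)

# 2. Create a general-purpose v2 storage account (Blob Storage)
storage_client = StorageManagementClient(credential, subscription_id)
storage_client.storage_accounts.begin_create(
    "adf-tutorial-rg",
    "adftutorialstore",  # must be globally unique, lowercase
    {
        "location": "eastus",
        "sku": {"name": "Standard_LRS"},
        "kind": "StorageV2",
    },
).result()

# 3. Create the Data Factory
adf_client = DataFactoryManagementClient(credential, subscription_id)
adf_client.factories.create_or_update(
    "adf-tutorial-rg", "adf-tutorial-factory", Factory(location="eastus")
)
```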
In the Blob Storage account, create two containers called input and output. (You can also use just one container.)
Then upload the files into the input and output containers. You can refer to the files below.
Upload the OrderDetails.csv file into the input container, and the other two files into the output container.
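The containers and upload can also be handled from code. Here is a sketch using the azure-storage-blob package; the connection string and local file path are assumptions, and only the input file is shown (upload the other two files into the output container the same way).

```python
from azure.storage.blob import BlobServiceClient

# Assumption: replace with your storage account's real connection string
conn_str = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(conn_str)

# Create the two containers (raises if they already exist)
for name in ("input", "output"):
    service.create_container(name)

# Upload the source file into the input container
with open("OrderDetails.csv", "rb") as data:
    service.get_blob_client("input", "OrderDetails.csv").upload_blob(
        data, overwrite=True
    )
```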
Step 2
Go to the Data Factory and click the Author & Monitor button.
In Data Factory, go to the Manage section and create a linked service. A linked service is a connection to a data source. To create a new linked service, click the "+" icon.
Then, under Data store, select Azure Blob Storage.
Fill in the details for the Azure Blob Storage account you created earlier.
Test the connection to verify that it is successful, then click the Create button to create the linked service.
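For reference, the same linked service can be defined from code. This minimal sketch uses the azure-mgmt-datafactory models and reuses adf_client and conn_str from the earlier sketches; the linked service name AzureBlobStorageLS is my own placeholder.

```python
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobStorageLinkedService, SecureString,
)

blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(value=conn_str)
    )
)
adf_client.linked_services.create_or_update(
    "adf-tutorial-rg", "adf-tutorial-factory", "AzureBlobStorageLS", blob_ls
)
```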
Step 3
Now it's time to create a dataset. Go to the Author section and click Datasets to create a new dataset. You can also create a folder to organize your datasets if you need to.
After that, a new pane pops up in the Data Factory portal. Find Azure Blob Storage and click it.
Select DelimitedText as the file format.
Then fill in the property fields and set the file path correctly.
Likewise, set up the other two datasets and give them proper names.
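As a sketch, here is how the source dataset could be defined with the SDK models. It assumes the AzureBlobStorageLS linked service from the previous step and the placeholder names used so far; the two sink datasets would follow the same pattern, pointing at the output container.

```python
from azure.mgmt.datafactory.models import (
    DatasetResource, DelimitedTextDataset, AzureBlobStorageLocation,
    LinkedServiceReference,
)

source_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureBlobStorageLS"
        ),
        # Points at the CSV uploaded to the input container in Step 1
        location=AzureBlobStorageLocation(
            container="input", file_name="OrderDetails.csv"
        ),
        column_delimiter=",",
        first_row_as_header=True,
    )
)
adf_client.datasets.create_or_update(
    "adf-tutorial-rg", "adf-tutorial-factory", "OrderDetailsSource", source_ds
)
```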
Step 4
It's time to create a pipeline. Go to Pipelines in the Author section and select New pipeline.
In the properties, you can name the pipeline and give it a clear description so other developers can understand its purpose.
From Move & transform, drag and drop a Data flow onto the canvas.
In the Adding data flow pane, select the Create new data flow radio button, then click the Data flow button.
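For reference, the pipeline shell could also be defined from code as an Execute Data Flow activity wrapping the mapping data flow we build next. The names OrderDetailsPipeline and OrderSplitDataFlow are placeholders of mine, not fixed by ADF.

```python
from azure.mgmt.datafactory.models import (
    PipelineResource, ExecuteDataFlowActivity, DataFlowReference,
)

pipeline = PipelineResource(
    description="Splits OrderDetails.csv by Amount and writes two output files",
    activities=[
        ExecuteDataFlowActivity(
            name="RunOrderSplit",
            # Assumption: the data flow authored in the steps below
            data_flow=DataFlowReference(
                type="DataFlowReference", reference_name="OrderSplitDataFlow"
            ),
        )
    ],
)
adf_client.pipelines.create_or_update(
    "adf-tutorial-rg", "adf-tutorial-factory", "OrderDetailsPipeline", pipeline
)
```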
It's time to create the mapping data flow. The first component in a data flow is the data source. You can add more data sources, but this process needs only one. In the source settings, fill in the mandatory fields (marked with *). Set the Source type to Dataset and the Dataset to OrderDetailsSource.
Make sure to turn on Data flow debug so you can verify that records move through the flow correctly. You can inspect them in the Data preview tab.
A CSV file is normally read with every column as a string, so numeric values also arrive as strings. The next step is to convert the string values to a numeric data type, and for that we can use a derived column. Click the "+" icon on the source, select Derived Column, and fill in all the mandatory fields in the derived column's settings. Under Columns, give the new column a name and an expression. In our case we need to convert to the floating-point data type (Expression: toFloat(Amount)).
The next step is to split the data into two separate flows. For this, use the Conditional Split component. I will split on Amount into two paths: one for rows where the amount is greater than 5000 (condition: Amount > 5000) and a default path that catches the remaining rows.
This separates the stream into two paths.
Now that we have split the data into two paths, we need to store them in target files. Click the "+" icon and select Sink. Fill in the mandatory fields and select the correct target dataset.
After that, map the source columns to the correct target columns. Go to the Mapping tab, turn off Auto mapping, and map the columns manually.
Do the same for the other sink.
Finally, go to Data preview and check whether the records are correct.
ADF now offers a Trigger now option. Click it, then check the output container in Azure Blob Storage to confirm that the data was ingested.
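The same one-off run can be started and monitored from code. This sketch reuses adf_client and the placeholder names from the earlier sketches.

```python
# Trigger a one-time pipeline run (the portal's "Trigger now") and poll its status
run = adf_client.pipelines.create_run(
    "adf-tutorial-rg", "adf-tutorial-factory", "OrderDetailsPipeline"
)
result = adf_client.pipeline_runs.get(
    "adf-tutorial-rg", "adf-tutorial-factory", run.run_id
)
print(result.status)  # e.g. "InProgress", "Succeeded", or "Failed"
```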
I hope this gives you some idea of what Azure Data Factory can do.
Good Luck!