
Azure Data Factory - Source Data Split Into Two Destination Tables

What is Azure Data Factory?


Azure Data Factory is a fully managed data migration and integration service that enables Microsoft Azure users to bring together data from a variety of sources in the Azure public cloud.


The Azure Data Factory service allows users to integrate both on-premises data in Microsoft SQL Server and cloud data in Azure SQL Database, Azure Blob Storage, and Azure Table Storage.


Once Azure Data Factory collects the relevant data, it can be processed by tools like Azure HDInsight (Apache Hive and Apache Pig). Azure Data Factory automates and orchestrates the entire data integration process from end to end, so that users have a single pane of glass into their ETL data pipelines.


Let's get started!


Step 1

Create a free Azure account and log in to the Azure portal. First, create an Azure resource group. After that, create a Blob Storage account and a Data Factory.
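
If you prefer scripting over clicking through the portal, here is a minimal sketch of the same step using the Azure SDK for Python. The resource group name, factory name, and region are placeholders I chose for illustration; the storage account itself is assumed to be created in the portal as described above.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"
credential = DefaultAzureCredential()

# Create the resource group that will hold the storage account and Data Factory.
resource_client = ResourceManagementClient(credential, subscription_id)
resource_client.resource_groups.create_or_update(
    "adf-demo-rg", {"location": "eastus"}
)

# Create the Data Factory instance inside that resource group.
adf_client = DataFactoryManagementClient(credential, subscription_id)
adf_client.factories.create_or_update(
    "adf-demo-rg", "adf-demo-factory", Factory(location="eastus")
)
```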


In the Blob Storage account, create two containers called input and output. (You can also use a single container.)



Then upload the files to the input and output containers. You can refer to the files below.

Upload the OrderDetails.csv file to the input container and the other two files to the output container.
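
The same step can be scripted with the azure-storage-blob package. This is only a sketch: the connection string and local file name are placeholders for whatever files you are using.

```python
from azure.storage.blob import BlobServiceClient

# Connection string from the storage account's "Access keys" blade (placeholder).
conn_str = "<your-storage-connection-string>"
service = BlobServiceClient.from_connection_string(conn_str)

# Create the input and output containers.
for name in ("input", "output"):
    service.create_container(name)

# Upload the source file to the input container.
with open("OrderDetails.csv", "rb") as data:
    service.get_container_client("input").upload_blob(
        name="OrderDetails.csv", data=data, overwrite=True
    )
```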


Step 2

Go to the Data Factory resource and hit the Author & Monitor button.


In Data Factory, go to the Manage section and create a linked service. A linked service is a connection to the data source. To create a new linked service, click the "+" icon.


Then, under Data store, select Azure Blob Storage.


Fill in the details of the Azure Blob Storage account you created earlier.


Then test the connection to verify that it is successful, and hit the Create button to create the linked service.
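
For reference, the same linked service can also be created through the Python SDK. This is a hedged sketch: adf_client and conn_str come from the earlier snippets, and the resource group, factory, and linked service names are placeholders.

```python
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    LinkedServiceResource,
)

# Point the linked service at the storage account using its connection string.
blob_linked_service = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(connection_string=conn_str)
)
adf_client.linked_services.create_or_update(
    "adf-demo-rg", "adf-demo-factory", "AzureBlobStorageLS", blob_linked_service
)
```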


Step 3

Now it's time to create the datasets. Go to the Author section and click Datasets to create a new dataset. If you need to, you can also create a folder to organize them.


After that, a new pane pops up in the Data Factory portal. Find Azure Blob Storage and select it.


Select DelimitedText as the file format.


Then fill in the property fields and set the file path correctly.


Likewise, set up the other two datasets and give them proper names.
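
The equivalent dataset definition in the Python SDK looks roughly like the following. The dataset name OrderDetailsSource matches the one used later in the data flow; the container and file name come from Step 1, and the linked service name matches the earlier snippet.

```python
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation,
    DatasetResource,
    DelimitedTextDataset,
    LinkedServiceReference,
)

# A DelimitedText (CSV) dataset pointing at OrderDetails.csv in the input container.
order_details_dataset = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureBlobStorageLS"
        ),
        location=AzureBlobStorageLocation(
            container="input", file_name="OrderDetails.csv"
        ),
        column_delimiter=",",
        first_row_as_header=True,
    )
)
adf_client.datasets.create_or_update(
    "adf-demo-rg", "adf-demo-factory", "OrderDetailsSource", order_details_dataset
)
```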


Step 4

It's time to create the pipeline. Go to Pipelines in the Author section and select New pipeline.


In the properties, you can name the pipeline and add a description so other developers get an idea of what it does.


From Move & transform, drag and drop the Data flow activity onto the canvas.


In Adding data flow, select the Create new data flow radio button, then click the Data flow button.


It's time to build the mapping data flow. The first component in a data flow is the data source. We can add multiple sources, but for this process we only need one. In the source settings, fill in the mandatory fields (indicated with *). Select Dataset as the source type and OrderDetailsSource as the dataset.


Make sure to turn on Data flow debug so you can track whether records move the correct way. You can see them in the Data preview tab.


Normally, a CSV file treats every field as a string, so numeric values are read as strings too. The next step is to convert those string values to a numeric data type. For that we can use a Derived Column transformation. Click the "+" icon on the source, select Derived Column, and fill in all the mandatory fields in the Derived column's settings. Under Columns, give the new column a name and an expression. In our case we need to convert to a floating-point data type (Expression: toFloat(Amount)).


The next step is to split the data into two separate flows. For this, use the Conditional Split transformation. I will split on Amount into two paths: one for values greater than 5000 and one for the remaining values.


This will separate the flow into two paths.
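
To make the transformation concrete, here is the same logic expressed in plain Python: casting the Amount strings to floats (the Derived Column step) and routing each row by the 5000 threshold (the Conditional Split step). The column name Amount comes from the expression above; everything else is illustrative.

```python
import csv

with open("OrderDetails.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Derived Column: CSV values arrive as strings, so convert Amount to a float,
# just like toFloat(Amount) does in the data flow expression.
for row in rows:
    row["Amount"] = float(row["Amount"])

# Conditional Split: one path for amounts greater than 5000,
# the other path for the remaining rows.
high_value_orders = [r for r in rows if r["Amount"] > 5000]
other_orders = [r for r in rows if r["Amount"] <= 5000]
```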


Now that we have split the data into two paths, we need to store each path in its target file. For that, click the "+" icon and select Sink. Fill in the mandatory fields and select the correct target dataset.


After that, we need to map the source columns to the correct target columns. Go to the Mapping tab and turn off Auto mapping.


Then map the correct columns.


Do the same for the other sink.


Finally, go to Data preview and check whether the records are correct.


There is a Trigger now option available in ADF. Click it, then check the output container in Azure Blob Storage to verify that the data was ingested.
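
You can also trigger and monitor the pipeline from the Python SDK instead of the portal. This is a sketch reusing adf_client from earlier; the pipeline name below is a placeholder for whatever you named your pipeline in Step 4.

```python
import time

# Kick off a run of the pipeline (name is a placeholder).
run = adf_client.pipelines.create_run(
    "adf-demo-rg", "adf-demo-factory", "SplitOrdersPipeline"
)

# Poll the run status until it finishes, then check the output container.
while True:
    pipeline_run = adf_client.pipeline_runs.get(
        "adf-demo-rg", "adf-demo-factory", run.run_id
    )
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)

print("Pipeline run finished with status:", pipeline_run.status)
```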


I hope this gives you some idea of what Azure Data Factory can do.


Good Luck!

 
 
 
