Manually integrating crucial business data from different sources is both time-consuming and labor-intensive, and it always carries the risk of errors and security vulnerabilities. To move away from this error-prone process and handle data integration more efficiently, you can opt for an Azure Data Factory pipeline.
An Azure Data Factory pipeline helps you transform data, run queries, and copy business data from one data store to another. Additionally, pipelines let you manage, deploy, and schedule activities as a group instead of individually, which simplifies your job.
By creating ADF pipelines, you can also keep sensitive business data secure, since access can be restricted to your team members.
If you are new to this cloud-based technology and need a guide to creating your first ADF pipeline, this write-up can help. It walks you through how to create an ADF pipeline in 7 steps.
What Is a Pipeline in Azure Data Factory?
An Azure Data Factory pipeline is a logical grouping of activities that together execute a particular task. For instance, a pipeline might include a set of activities that clean log data before running a mapping data flow.
Like ADF, AWS Glue is an excellent cloud technology you can integrate into your business for smooth data integration. To decide which is the better fit for your venture, check our guide Azure Data Factory vs AWS Glue: in-depth comparison tutorial.
What are the Different Types of Pipelines in Azure?
Primarily, there are two fundamental types of pipelines in Azure:
- ADF pipelines: ADF pipelines automate data integration tasks. With these pipelines, you can conveniently copy data, transform it, and run queries.
- Azure Pipelines: You can use Azure Pipelines to automate tasks like building, testing, and deploying software, websites, or applications.
Not to mention, ADF pipelines and activities are the building blocks for creating end-to-end, data-driven workflows. Here is a detailed Azure Data Factory tutorial you can refer to: 3 essential activities in Azure Data Factory.
How Do I Create a Pipeline in Azure Data Factory?
Through these 7 simple steps, you can easily create your ADF pipeline and tackle data transformation tasks without errors!
Step 1: Prerequisites
Before getting started with this Azure Data Factory pipeline-creation tutorial, make sure you have these three things:
- Azure subscription
- Azure storage account
- Azure SQL database
Step 2: Prepare a SQL Table & Blob Storage
The next step involves preparing blob storage and a SQL table.
Create a Blob Storage
First, open Notepad, copy the text below, and save it to your disk as emp.txt.
FirstName,LastName
Peter,Johnson
Joey,Johnson
Then, create a container in your blob storage; you can name it adfdemo. Next, create a folder named input inside that container, then upload your saved emp.txt file into the input folder. To carry out these tasks, you can use the Azure portal or tools such as Azure Storage Explorer.
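If you prefer to script this part instead of using the portal, here is a minimal sketch using the azure-storage-blob Python package. The connection string is a placeholder you would copy from your storage account's 'Access keys' blade; the container and folder names match the adfdemo/input example above.

```python
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

# Placeholder: copy this from your storage account's Access keys blade.
conn_str = "<your-storage-account-connection-string>"

service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("adfdemo")

# Create the adfdemo container; ignore the error if it already exists.
try:
    container.create_container()
except ResourceExistsError:
    pass

# Upload emp.txt into the virtual "input" folder.
with open("emp.txt", "rb") as data:
    container.upload_blob(name="input/emp.txt", data=data, overwrite=True)
```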
Create a SQL Table
Next, create the dbo.emp table in your database using the following SQL script:
CREATE TABLE dbo.emp
(
ID int IDENTITY(1,1) NOT NULL,
FirstName varchar(50),
LastName varchar(50)
)
GO
CREATE CLUSTERED INDEX IX_emp_ID ON dbo.emp (ID);
To complete these tasks, allow Azure to access your SQL server. Make sure the 'Allow Azure services and resources to access this server' setting is ON for your SQL server. To check it, go to your logical SQL server > 'Overview' > 'Set server firewall' and set the 'Allow Azure services and resources to access this server' option to ON.
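If you would rather run the table-creation script from code than from a query editor, here is a minimal sketch using pyodbc. The server, database, and credential values are placeholders, and your own client IP must also be allowed through the server firewall for the connection to succeed. Note that the GO batch separator is dropped because it is an SSMS convention rather than T-SQL.

```python
import pyodbc

# Placeholders: substitute your Azure SQL server, database, and credentials.
conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;"
    "DATABASE=<your-database>;"
    "UID=<your-username>;PWD=<your-password>"
)

# Same script as above, without the GO batch separator.
create_table_sql = """
CREATE TABLE dbo.emp
(
    ID int IDENTITY(1,1) NOT NULL,
    FirstName varchar(50),
    LastName varchar(50)
);
CREATE CLUSTERED INDEX IX_emp_ID ON dbo.emp (ID);
"""

with pyodbc.connect(conn_str, autocommit=True) as conn:
    conn.execute(create_table_sql)
```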
Step 3: Create a Data Factory
Now, you need to create a Data Factory and launch its UI to start creating Azure Data Factory pipelines. To do that, follow the steps below (a scripted alternative using the Python SDK is sketched after the list):
- Open either Google Chrome or Microsoft Edge (the ADF UI is currently supported only in these two browsers).
- Then, log in to the Microsoft Azure portal and, from the left menu, click 'Create a resource' > 'Integration' > 'Data Factory.'
- On the 'Create Data Factory' page, under the 'Basics' tab, select your Azure subscription.
- To specify a resource group, proceed with one of the following options:
Select an existing resource group from the dropdown list.
Or, click 'Create new' and enter a name for a new resource group.
- After that, select a location for your new Data Factory from the 'Region' dropdown list. Only the locations that Data Factory supports appear in the list; the data stores it uses (such as Azure Storage and SQL Database) can be in other regions.
- Then, enter a name, for instance ADFTutorialDataFactory, in the 'Name' field.
NOTE: The name you enter for your Azure Data Factory must be globally unique. So, if you receive an error message about the name value, replace it with a unique name of your choice.
- Then, select 'V2' from the 'Version' dropdown list.
- At the top of the screen, next to 'Basics,' select the 'Git configuration' tab and check the 'Configure Git later' box.
- This completes the 'Create Data Factory' page. Select 'Review + create,' then 'Create,' and you will receive a notification once deployment finishes. Click 'Go to resource' to open your Data Factory page.
- Finally, select 'Open' on the 'Open Azure Data Factory Studio' tile to launch the ADF UI in a new tab.
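If you prefer to create the factory from code, here is a minimal sketch using the azure-mgmt-datafactory Python SDK, assuming you are already signed in (for example via az login or environment credentials). The subscription ID, resource group, and region are placeholders; the factory name must still be globally unique.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

# Placeholders: substitute your own subscription and resource group.
subscription_id = "<your-subscription-id>"
resource_group = "<your-resource-group>"
factory_name = "ADFTutorialDataFactory"  # must be globally unique

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) the data factory in a supported region.
factory = adf_client.factories.create_or_update(
    resource_group, factory_name, Factory(location="East US")
)
print(factory.provisioning_state)
```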
Step 4: Create an ADF Pipeline
In your newly created Data Factory, you need to build a pipeline with a copy activity. The copy activity transfers data from the blob source to the SQL database. Constructing an ADF pipeline involves:
- Linked service creation
- Input and output datasets creation
- Pipeline creation
- First, log in to the Azure Data Factory portal from your web browser (Microsoft Edge or Chrome) and go to the home page.
- Once the home page opens, locate 'Orchestrate' and click on it.
- In the 'General' panel under 'Properties,' enter CopyPipeline for Name. Then collapse the panel by clicking the Properties icon in the top-right corner.
- In the 'Activities' toolbox, expand the 'Move and Transform' category. Drag the 'Copy Data' activity from the toolbox onto the pipeline designer surface and name it CopyFromBlobToSql.
Configure a Source
With the copy activity in place, you need to configure its source. Follow these steps (a scripted version of the source setup is sketched after the list):
- Click the 'Source' tab, which you can find beside the 'General' tab, and add a new source dataset by clicking the '+ New' button.
- In the 'New Dataset' dialog box, select 'Azure Blob Storage' and click 'Continue.'
- In the 'Select Format' dialog box, choose the format type of your data, then click 'Continue.'
- In the 'Set Properties' dialog box, enter the name SourceBlobDataset and check the 'First row as header' box. Then go to the 'Linked service' dropdown and click '+ New.'
- In the 'New Linked Service' dialog box, enter AzureStorageLinkedService as the name, then choose your storage account from the 'Storage account name' list. Test the connection, then click 'Create' to deploy the linked service.
- Finally, browse the 'File path,' choose the emp.txt file from the adfdemo/input folder, and click 'OK.' You will be redirected to the pipeline page; confirm that SourceBlobDataset is selected in the 'Source' tab.
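The same source configuration can also be scripted with the SDK. The sketch below continues from the adf_client, resource_group, and factory_name defined earlier; the storage connection string is a placeholder, and the linked service and dataset names match the ones used in the portal steps.

```python
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation,
    AzureStorageLinkedService,
    DatasetResource,
    DelimitedTextDataset,
    LinkedServiceReference,
    LinkedServiceResource,
    SecureString,
)

# Placeholder: the same connection string used for the blob upload earlier.
storage_conn_str = "<your-storage-account-connection-string>"

# Linked service pointing at the storage account.
adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AzureStorageLinkedService",
    LinkedServiceResource(
        properties=AzureStorageLinkedService(
            connection_string=SecureString(value=storage_conn_str)
        )
    ),
)

# Delimited-text dataset for adfdemo/input/emp.txt with the first row as header.
adf_client.datasets.create_or_update(
    resource_group, factory_name, "SourceBlobDataset",
    DatasetResource(
        properties=DelimitedTextDataset(
            linked_service_name=LinkedServiceReference(
                type="LinkedServiceReference",
                reference_name="AzureStorageLinkedService",
            ),
            location=AzureBlobStorageLocation(
                container="adfdemo", folder_path="input", file_name="emp.txt"
            ),
            first_row_as_header=True,
        )
    ),
)
```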
Configure Sink
- To create a sink dataset, navigate to the 'Sink' tab (beside the 'Source' tab) and click the '+ New' option.
- In the 'New Dataset' dialog box, type SQL in the search field to filter the connectors, then pick 'Azure SQL Database' and click 'Continue.'
- In the 'Set Properties' dialog box, type OutputSqlDataset in the 'Name' field. After that, click '+ New' in the 'Linked service' dropdown list.
- In the 'New Linked Service (Azure SQL Database)' dialog box, proceed with the following steps:
- In the 'Name' field, enter AzureSqlDatabaseLinkedService.
- Under 'Server name,' select your SQL server.
- Under 'Database name,' select your database.
- In 'User name,' enter your user name.
- In 'Password,' enter your password.
- Click 'Test connection' to verify the connection.
- Finally, click 'Create' to deploy the linked service.
You will be redirected to the 'Set Properties' dialog box. There, choose [dbo].[emp] from the 'Table' dropdown menu and click 'OK.' Go back to the 'pipeline' tab and confirm that OutputSqlDataset is selected as the Sink Dataset.
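For completeness, here is a hedged sketch of the sink dataset and copy pipeline created with the SDK, continuing from the objects above. The SQL connection string is a placeholder; DelimitedTextSource and AzureSqlSink are the copy-activity source and sink types that correspond to the datasets used in this walkthrough.

```python
from azure.mgmt.datafactory.models import (
    AzureSqlDatabaseLinkedService,
    AzureSqlSink,
    AzureSqlTableDataset,
    CopyActivity,
    DatasetReference,
    DatasetResource,
    DelimitedTextSource,
    LinkedServiceReference,
    LinkedServiceResource,
    PipelineResource,
    SecureString,
)

# Placeholder: ADO.NET-style connection string for your Azure SQL database.
sql_conn_str = (
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;User ID=<your-username>;Password=<your-password>;"
)

# Linked service for the Azure SQL database.
adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AzureSqlDatabaseLinkedService",
    LinkedServiceResource(
        properties=AzureSqlDatabaseLinkedService(
            connection_string=SecureString(value=sql_conn_str)
        )
    ),
)

# Sink dataset pointing at the dbo.emp table.
adf_client.datasets.create_or_update(
    resource_group, factory_name, "OutputSqlDataset",
    DatasetResource(
        properties=AzureSqlTableDataset(
            linked_service_name=LinkedServiceReference(
                type="LinkedServiceReference",
                reference_name="AzureSqlDatabaseLinkedService",
            ),
            table_name="dbo.emp",
        )
    ),
)

# Pipeline with a single copy activity from the blob dataset to the SQL dataset.
copy_activity = CopyActivity(
    name="CopyFromBlobToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OutputSqlDataset")],
    source=DelimitedTextSource(),
    sink=AzureSqlSink(),
)
adf_client.pipelines.create_or_update(
    resource_group, factory_name, "CopyPipeline",
    PipelineResource(activities=[copy_activity]),
)
```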
Step 5: Validate the ADF Pipeline
Once you create your Azure Data Factory pipeline, you must validate it. To do so:
- On the toolbar, click the 'Validate' option.
- To view the JSON definition associated with the pipeline, click the 'Code' option on the upper-right side of the screen.
Step 6: Debug Your ADF Pipeline and Publish It
Now it's almost time to publish your ADF pipeline. But before publishing your pipeline, datasets, and linked services to the Data Factory, you must test the pipeline and debug it if necessary. To do so, take the following steps:
- Click the 'Debug' option on the toolbar. You can then follow the status of the pipeline run in the 'Output' tab.
- Once the debug run succeeds, go to the toolbar again and click the 'Publish all' button.
- Wait until you receive the message 'Successfully published.'
Step 7: Trigger Your ADF Pipeline Manually
This is the last step of the Azure Data Factory tutorial for pipeline creation, where you trigger the pipeline manually. Follow the steps below (a scripted version of this trigger-and-monitor flow is sketched after the list):
- Go to the toolbar and click 'Trigger' > 'Trigger Now.' After that, click 'OK' on the 'Pipeline run' page.
- Open the 'Monitor' tab on the left side of the screen, and you will see the pipeline run started by your manual trigger.
- Click the 'CopyPipeline' link under the 'PIPELINE NAME' column to see the 'Activity runs' page. Click the Details link (eyeglasses icon) under the 'ACTIVITY NAME' column to learn more about the activity run. To return to the 'Pipeline runs' page, click 'All pipeline runs' at the top; to refresh the view, select 'Refresh.'
- Finally, verify that two more rows have been added to the emp table.
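Here is the corresponding sketch for triggering and monitoring the same run from the SDK, continuing from the adf_client created earlier; it mirrors the 'Trigger Now' and 'Monitor' steps above.

```python
import time
from datetime import datetime, timedelta

from azure.mgmt.datafactory.models import RunFilterParameters

# Trigger the pipeline (equivalent to 'Trigger' > 'Trigger Now').
run_response = adf_client.pipelines.create_run(
    resource_group, factory_name, "CopyPipeline", parameters={}
)

# Give the run a moment, then check its status (equivalent to the Monitor tab).
time.sleep(30)
pipeline_run = adf_client.pipeline_runs.get(
    resource_group, factory_name, run_response.run_id
)
print("Pipeline run status:", pipeline_run.status)

# List the activity runs (equivalent to the 'Activity runs' page).
filter_params = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(hours=1),
    last_updated_before=datetime.utcnow() + timedelta(hours=1),
)
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    resource_group, factory_name, run_response.run_id, filter_params
)
for activity_run in activity_runs.value:
    print(activity_run.activity_name, activity_run.status, activity_run.output)
```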
This is how you can create your ADF pipeline and handle your data transformation tasks hassle-free. If you want to learn more about how ADF works, its key components, and its benefits, you can check our detailed guide on Azure Data Factory.
Team Up with Inferenz for Excellent ADF Solutions
When created correctly, Azure Data Factory pipelines can bring your business multiple benefits. If you find it difficult to handle the Azure Data Factory pipeline creation task on your own, the Inferenz experts can handle it for you!
Besides helping you create your ADF pipelines, we can provide you with the best cloud solutions and automate tasks that are challenging to manage manually. For a detailed discussion, feel free to call us anytime!