Loading on premise data to Azure SQL Database in Talend

Loading on premise data to Azure SQL Database

Talend Open Studio for Data Integration Version: 6.3.1
Java Compiler: 1.7
OS: Windows 8

This article will walk through the following steps to load data from on premise Oracle instance to Sql Server instance running on Microsoft Azure.

1. Find out the JDBC details for Sql Server on Azure
2. Download jdbc driver to configure in Talend
3. Create Open JDBC connection using external jar
4. Update the component properties for salesforce and connect to it to target
5. Complete the mapping and execute the job

This article will not cover the installation and setup of Talend Open Studio. The assumption is talend already installed and is working correctly. For details on how to install and configure Talend Open Studio see this post.

In the previous post we saw how to load data from Salesforce to oracle database using SalesforceInput, tMap, tOracleOutput, tLogRow. Today we will try to connect a sql server instance running on Microsoft Azure and loads a huge chunk of data from on-premise to cloud instance. If you do not have an account then sign it up here. Once it is successfully created you can see $ 150 credited to your account. This amount can be used for the resources that you are planning to use and has a validity of one month :).

As you can see below I have created two instances of SQL Server located at South India and South UK. The sole purpose of two different geographic location was to test speed and latency.

Get the database connection details to use inside Talend.

I had tried using the default driver comes with Talend and found that it is extremely slow and loads just 20 records per second. Below screenshots are from default sql server connection.

Retrieve schema will show you schema and objects details.

Since it was damn slow I decided to use the latest JDBC driver provided by Microsoft.  To use this driver we have to create generic jdbc connection. Update the azure instance parameters as shown below. Notice I have added the external jar “sqljdbc42.jar”. The only limitation with this driver is that you cannot retrieve the schema like we did above.

At the moment I have a customer table on both the database and they have one to one mapping with a simple transformation to uppercase the customer name.

After execution you can see the number of rows processed per second. This time this was super fast :). It took around around 12 minutes to process 710K records. I am sure this can be further tuned if I could have used a LAN instead of WiFi. There are many other parameters that can be taken in to consideration to bring it down from 12 to 6 min.

I was also monitoring the usage of DTU consumption, CPU percentage and few other parameters that was relevant to my POC. And you can notice it was hitting around 8 to 9% of  total DTU allocated as part of S2 Standard (50 DTU). If you want to know more about DTU refer this link.

Thats all for today.

Let me know if you have any questions.

About Bhabani 86 Articles
Bhabani has 12 plus years of experience in Data warehousing and Analytics projects that has span across multiple domains like Travel, Banking and Financial, Betting and Gaming Industries. Solution areas he focuses on designing the data warehouse and integrating it with cloud platforms like AWS or GCP. He is also a Elite level contributor at OTN forum more than 9 years. He loves to do experiment and POC on different integration tools and services. Some of his favorite skills are Redshift, Big Query, Python, Apache Airflow, Kafka, HDFS, Map Reduce ,HIVE, Habse, Sqoop, Drill, Impala.


  1. Thanks. It is very helpful.

    Had a question though.

    Where is Talend installed in this case? On Premise or on Azure Cloud/ Azure VM?

    • Hi Venkatesh,
      Welcome to Dw Team!!
      Its a place where you can ask for help as well as contribute to the questions raised by others.

      Regarding your question, Its installed on on-premise machines. However in an enterprise architecture this will be normally installed on a cloud machine as part of IAAS. And this can vary from organization to organization based on their architecture.

  2. Thanks This is very helpful.

    I tried to configure database connection in Talend. I followed the same steps but I get error as Network error IO exception. is there any specific settings to be done on the cloud db.

Comments are closed.