File Operations on Amazon S3 using Talend Open Studio

Environment:
Talend Open Studio for Data Integration Version: 6.3.1
Java Compiler: 1.7
OS: Windows 8

 

In the previous post we learned what Amazon RDS is, how to see a running instance on the cloud, and how to load data from a local instance to a cloud instance. In this post I will demonstrate what Amazon S3 is and how to perform some file operations on the cloud. By file operations I mean uploading a local file to Amazon S3 and downloading it back to a local directory with the help of TOS components.

What is Amazon S3: Amazon Simple Storage Service (S3) is storage for the Internet. It is designed to make web-scale computing easier for developers.

Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of websites. The service aims to maximize benefits of scale and to pass those benefits on to developers.

Now log in to the Amazon console and click on Services, then click on S3 in the Storage section.

Just as we have folders in a local directory, in S3 we have buckets. For this exercise I have created a bucket called talendbucketons3.
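A note on folders inside a bucket: S3 buckets are actually flat, and what the console shows as folders are just key prefixes separated by "/". A small Python sketch (with a hypothetical helper name, for illustration only) of how an object key inside a "folder" is built:

```python
def build_s3_key(*parts):
    """Join 'folder' names and a file name into an S3 object key.

    S3 has no real directories: a folder shown in the console is just
    a '/'-separated prefix of the object key.
    """
    return "/".join(p.strip("/") for p in parts if p)

# A file at the bucket root vs. inside a "folder":
root_key = build_s3_key("employee_demo.txt")
nested_key = build_s3_key("reports", "2017", "employee_demo.txt")
```

In a Talend component such as tS3Put or tS3Get, the full key (prefix included) is what you would supply as the object name, while the bucket name stays unchanged.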

If you open the above bucket you can see two files that were uploaded previously.

Now, to configure S3 in Talend Open Studio we need some credentials from the Amazon console. Go to Security Credentials under the profile section and then click on “Continue to Security Credentials”.

If you are signing in for the first time, do not forget to download and save the access key ID and secret access key to your local folder. I will note down both of them and use them in the Talend S3 components.
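The downloaded key file is typically a small text file with one key=value pair per line (as in the classic rootkey.csv format; the exact file name and layout here are assumptions). Rather than hard-coding the keys, a minimal sketch of loading them from that file:

```python
import os
import tempfile

def load_aws_credentials(path):
    """Parse a rootkey.csv-style file with lines like
    AWSAccessKeyId=... and AWSSecretKey=... into a dict."""
    creds = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if "=" in line:
                key, _, value = line.partition("=")
                creds[key.strip()] = value.strip()
    return creds

# Example: write a sample credentials file and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("AWSAccessKeyId=AKIAEXAMPLE\nAWSSecretKey=abc123secret\n")
    sample = f.name
creds = load_aws_credentials(sample)
os.remove(sample)
```

The two values loaded this way are what go into the Access Key and Secret Key fields of tS3Connection.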

Now it's time to create a job inside the repository explorer. Log in to Talend, right-click on Job Designs, and create a new job named job_Put_Get_File_From_Amazon_S3_Bucket. Then drag in tS3Connection, tS3Put, tS3Get, and tS3Close. You can either search for them in the Palette or start typing directly in the design workspace; a drop-down will pop up immediately.

Let's configure the properties of each component and save the job. Also, don't forget to create a text file employee.txt with some sample text in the local folder.

Now press F6 to build and execute the job. If you go back to the Amazon console and browse the bucket under S3, you will see a file called employee_demo.txt. Similarly, if you check the local folder, you will see that the same file has been downloaded successfully.
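The job's data flow is connect, put, get, close. A toy Python sketch of that round trip, with a plain dict standing in for the S3 bucket (this models the flow only; the actual Talend components use the AWS SDK underneath):

```python
import os
import tempfile

def s3_put(bucket, key, local_path):
    """Upload: read the local file and store its bytes under the key (like tS3Put)."""
    with open(local_path, "rb") as f:
        bucket[key] = f.read()

def s3_get(bucket, key, local_path):
    """Download: write the stored bytes back to a local file (like tS3Get)."""
    with open(local_path, "wb") as f:
        f.write(bucket[key])

bucket = {}  # stands in for talendbucketons3
workdir = tempfile.mkdtemp()

# Create the local employee.txt with some sample text.
src = os.path.join(workdir, "employee.txt")
with open(src, "w") as f:
    f.write("some sample text")

# Put it to the bucket under a new key, then get it back to a new local file.
s3_put(bucket, "employee_demo.txt", src)
dst = os.path.join(workdir, "employee_downloaded.txt")
s3_get(bucket, "employee_demo.txt", dst)

with open(dst) as f:
    round_trip = f.read()
```

After the round trip, the bucket holds employee_demo.txt and the downloaded copy matches the original, which mirrors what you see in the console and the local folder after the job runs.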

This is the output from execute job console.

Starting job job_Put_Get_File_From_Amazon_S3_Bucket at 17:47 26/03/2017.

[statistics] connecting to socket on port 3366
[statistics] connected
[statistics] disconnected
Job job_Put_Get_File_From_Amazon_S3_Bucket ended at 17:47 26/03/2017. [exit code=0]

That's it for today. Let me know if you face any issues.

Thank you!!!

 

About Bhabani 86 Articles
Bhabani has 10-plus years of experience in data warehousing and analytics projects spanning multiple domains such as Travel, Banking and Financial, and Betting and Gaming. He focuses on designing data warehouses and integrating them with cloud platforms like AWS and GCP. He has also been an Elite-level contributor on the OTN forum for more than 9 years. He loves to experiment and build POCs with different integration tools and services. Some of his favorite skills are Redshift, BigQuery, Python, Apache Airflow, Kafka, HDFS, MapReduce, Hive, HBase, Sqoop, Drill, and Impala.

2 Comments

  1. Hi Bhabani,

    Could you please let me know what the Key and File fields mean in tS3Get_1? Also, if I want to pull the text from a file placed in a folder in a bucket, what path should I mention, and where in the component, to access the data? Here you have the files directly in the bucket; what if the files were inside a folder in the bucket? In that case, what connection details should I have mentioned?
