Implementation Of RDBMS Via Sqoop In A Small Hadoop Cluster

Chia, Li Yen (2015) Implementation Of RDBMS Via Sqoop In A Small Hadoop Cluster. Project Report. UTeM, Melaka, Malaysia. (Submitted)

Text (24 Pages): Implementation Of RDBMS Via Sqoop In A Small Hadoop Cluster 24 Pages.pdf - Submitted Version (260kB)
Abstract

Hadoop is an open-source project for the distributed storage and processing of large data sets on commodity hardware. Hadoop works well with both structured and unstructured data. Hadoop is not a database; at its core it provides a distributed file system (HDFS) that lets users store large amounts of data across a cluster of machines while handling data redundancy. On top of HDFS, Hadoop provides an API for processing the stored data, known as MapReduce. The basic idea is that, since the data is stored on many nodes, it is better to process the data in a distributed way, with each node processing the data stored on it, rather than spending time moving it over the network. Sqoop (SQL-to-Hadoop) is used to extract data from non-Hadoop data stores, transform it into a form usable by Hadoop, and then load it into HDFS. This process is called ETL, for Extract, Transform, and Load. Established companies that want to adopt Hadoop for data storage can use Sqoop to bring their existing data into Hadoop. Sqoop can also export data from Hadoop back to non-Hadoop data stores, so it provides bi-directional data transfer between Hadoop and non-Hadoop data stores. This project is a study of the implementation of an RDBMS via Sqoop in a small Hadoop cluster. The existing data in the RDBMS is imported into the Hadoop cluster using the Sqoop component, and new data is inserted into HDFS. After the import is carried out, an application is designed and built to show how the data from the standalone databases integrates with the new data inserted via the interface.
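As a sketch of the bi-directional transfer the abstract describes, the commands below show a typical Sqoop import from an RDBMS into HDFS and a corresponding export back out. The connection URL, database name, table names, user, and HDFS paths are hypothetical placeholders, not taken from the report.

```shell
# Import an existing RDBMS table into HDFS (the "old data" migration step).
# All identifiers below (dbserver, sales, customers, dbuser, paths) are illustrative.
sqoop import \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username dbuser -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  --num-mappers 4

# Export processed HDFS data back to a non-Hadoop data store
# (the reverse direction of the bi-directional transfer).
sqoop export \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username dbuser -P \
  --table customers_export \
  --export-dir /user/hadoop/customers
```

The `--num-mappers` flag controls how many parallel map tasks split the transfer, which is the main tuning knob on a small cluster such as the one studied here.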

Item Type: Final Year Project (Project Report)
Uncontrolled Keywords: Electronic data processing, Cloud computing, Apache Hadoop
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA76 Computer software
Divisions: Library > Final Year Project > FTMK
Depositing User: Nor Aini Md. Jali
Date Deposited: 21 Nov 2016 00:38
Last Modified: 21 Nov 2016 00:38
URI: http://digitalcollection.utem.edu.my/id/eprint/17644
