Practical Issues using Distributed Computing Environments – Apache Hadoop

Authors

  • Toma Cristian, Bucharest Academy of Economic Studies, Romania

Keywords:

Apache Hadoop, HTC – High Throughput Computing/HPC – High Performance Computing, distributed computing, map-reduce, distributed file system

Abstract

The paper presents practical results obtained with standard sequential programming, the RPC/RMI mechanism, and the Apache Hadoop distributed computing platform for a problem that demands computation time and power and that might be applied to e-mail text searching. The first section introduces distributed computing technologies and middleware. The second and third sections give a few details about the RPC/RMI and Apache Hadoop approaches. The fourth section presents the computation results for a classic problem, word counting in large text files, comparing the standard, remote procedure call, and map-reduce approaches. The paper closes with the main advantages of distributed systems and computing environments.

Author Biography

Toma Cristian, Bucharest Academy of Economic Studies, Romania

Faculty of Cybernetics, Statistics and Economic Informatics

Department of IT&C Technologies

Published

2013-03-30

How to Cite

Cristian, T. (2013). Practical Issues using Distributed Computing Environments – Apache Hadoop. Journal of Mobile, Embedded and Distributed Systems, 5(1), 18-28. Retrieved from http://jmeds.eu/index.php/jmeds/article/view/92