Table of Contents
List of Figures
Literature Review
History of Hadoop Technology
Applications of Hadoop
Main Components of Hadoop
MapReduce.
Map Step.
Reduce Step.
Hadoop Distributed File System (HDFS)
Advantages and Disadvantages of using Hadoop
Advantages.
Disadvantages.
Competitors to the Hadoop Technology
Conclusion
References
List of Figures
Figure 1: MapReduce Programming Model
Figure 2: HDFS architecture
Figure 3: HDFS Operation Process
Literature Review
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in distributed computing environments. As a cluster computing framework released under a free license, Apache Hadoop supports data-intensive distributed applications. It was inspired by the Google File System and Google’s MapReduce system (Eadline, 2013). According to Eadline (2013), Hadoop was designed to provide fast and reliable analysis of both complex and clustered data. Consequently, enterprises deployed Hadoop alongside their existing IT systems, which allowed them to combine old and new data in a single robust framework. Major industry players that have used Hadoop include IBM, Yahoo, and Google (Lavalle, Lesser, Shockley, Hopkins & Kruschwitz, 2011).
History of Hadoop Technology
Hadoop was created by Doug Cutting, who is also the creator of Apache Lucene, a widely used text search library. Hadoop originated in Apache Nutch, an open-source web search engine that was itself part of the Lucene project. According to Cutting, the name Hadoop is not an acronym but a made-up name. Likewise, the ‘contrib’ module and other subprojects were given names that are largely unrelated to their functions (Krishnan, 2013).
According to Lynch (2008), building a web search engine from scratch was an ambitious goal, given the software required to crawl and index websites. Developing such a system was costly, but Doug Cutting and Mike Cafarella believed it was worth the cost, since success would open up and ultimately democratize search engine algorithms. Nutch was started in 2002, and a working crawler and search system soon emerged.
However, the developers came to realize that the Nutch architecture could not scale to the billions of pages available on the web. In 2003, a paper describing the architecture of the Google File System (GFS) was published. A system of this kind could supply the storage needed for the very large files generated by crawling and indexing the web.