Four Articles about Big Data.
PART II — Big Data Technologies — Hadoop System Technology

Hadoop is the technology that enabled data scalability in Big Data.
It is a free, open-source software platform written in Java for distributed computing on clusters and for processing large volumes of data, with built-in fault tolerance.
If Windows is the operating system of microcomputers, Hadoop is the operating system of Big Data.
To create Hadoop, Doug Cutting, its inventor, relied on two Google papers published in 2003 and 2004: one on a distributed file system (GFS) and another on a programming method called MapReduce.
He is Chief Architect at Cloudera, the most significant Hadoop distributor in the world.
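To make the MapReduce idea concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It shows the three conceptual steps of a word count: map each document to (word, 1) pairs, group the pairs by word, and reduce each group to a total.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # Assumption for illustration: everything runs in one process.
    # On a cluster, each document (or data block) would be mapped on a
    # different machine and the grouped keys reduced in parallel.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["big data needs hadoop", "hadoop stores big data"]
    print(word_count(docs))
```

Because the map step handles each document independently and the reduce step handles each word independently, both can be spread across many machines, which is what lets this style of programming scale.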
CURIOSITIES
- Doug Cutting named the system he was developing after his son's toy elephant, "Hadoop."
- Hadoop hides much of the complexity of high-performance computing. It can be installed on conventional machines, uses the parallel processing capacity of a computer network, tolerates faults, and requires few administrators and developers.
- Hadoop is open source, was developed in Java, and was inspired by Google's GFS and MapReduce.
- Hadoop became a subproject of Apache Lucene in 2006, and in 2008 it became a top-level Apache Software Foundation project.
- Yahoo has long been the most significant contributor to the Hadoop project.
- In 2010, Facebook announced Hadoop installations on 2,900 servers holding 30 PB of data.
1 — Hadoop Installations

Hadoop is well suited to working with huge amounts of data.
Some notable Hadoop installations are described below.
1 — Yahoo
Yahoo has over 120,000 servers running Hadoop and 800 PB of data storage. Few companies in the world have comparable infrastructure.
Doug Cutting started Hadoop in a project within Yahoo.
2 — Facebook
One of the largest Hadoop clusters in the world belongs to Facebook. It has over 4,000 machines and holds hundreds of millions of gigabytes (hundreds of petabytes).
Developers use Hive, which provides an SQL-like query language (HiveQL), to search data on Hadoop servers.
3 — Hortonworks
According to Hortonworks, which provides a Hadoop platform, just one of its leading customers runs 4,500 servers and handles 200 PB of data, with over one billion files and blocks stored on the Hadoop platform.
Cloudera and Hortonworks have since merged into one company under the name Cloudera.
CURIOSITIES
- Google, Yahoo, and Facebook are historically linked to Hadoop. They grew from the ground up with this technology, developing new Big Data knowledge for the free software market.
- Hadoop has become available to small and medium businesses through Big Data cloud computing services.
- The petabyte (PB) is the most widely used unit of measure in Big Data (1 PB = 1 000 000 000 000 000 bytes = 10^15 bytes = 1000 terabytes).
- Hadoop is a technology that can process data at this scale.
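The petabyte arithmetic above can be checked with a few lines of Python. This is a small sketch that assumes decimal (SI) units, matching the definition 1 PB = 10^15 bytes; the `convert` helper is a hypothetical name used only for illustration. The sample values are the storage figures quoted in this article.

```python
# Decimal (SI) units, matching the definition 1 PB = 10^15 bytes.
BYTES_PER_UNIT = {"GB": 10**9, "TB": 10**12, "PB": 10**15}


def convert(value, from_unit, to_unit):
    """Convert a storage size between decimal units via its byte count."""
    return value * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


if __name__ == "__main__":
    # Figures quoted in the article: 30 PB, 200 PB, and 800 PB.
    for petabytes in (30, 200, 800):
        terabytes = convert(petabytes, "PB", "TB")
        gigabytes = convert(petabytes, "PB", "GB")
        print(f"{petabytes} PB = {terabytes:,.0f} TB = {gigabytes:,.0f} GB")
```

Seen this way, "hundreds of millions of gigabytes" and "hundreds of petabytes" describe the same order of magnitude, since 1 PB is one million GB.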
2 — Hadoop Distributions
Big Data, Hadoop, and Data Science are connected technologies. Many professionals working in these areas at Google, Yahoo, and other Silicon Valley companies such as LinkedIn have left to create new Hadoop-based companies.
Three Hadoop distributors have stood out:
- Cloudera
- MapR
- Hortonworks
They allow you to download Hadoop for free and install it on your computer for testing and study.
Amazon Elastic MapReduce, Microsoft Azure HDInsight, Google Cloud, IBM, SAP (Altiscale), and Dell, among others, offer Hadoop products, services, and support built on one of these Hadoop distributors.
Another important company is Databricks, which developed and popularized Spark, an in-memory processing engine that addresses the limitations of batch processing with MapReduce.
Cloudera and Hortonworks have merged.
CURIOSITIES
- Three engineers from Google, Yahoo, and Facebook (Christophe Bisciglia, Amr Awadallah, and Jeff Hammerbacher) teamed up with Mike Olson, a former Oracle executive, to create Cloudera in 2008.
- Cloudera is one of the fastest-growing companies in North America (Deloitte's 2017 Technology Fast 500).
- Hortonworks was founded in 2011 by 24 engineers from the original Hadoop team at Yahoo. This founding team has accumulated more experience with Hadoop than any other organization in the world.
- MapR was founded in 2009 with US $9 million in financing from Lightspeed Venture Partners and New Enterprise Associates. Its founders came from Google, Lightspeed Venture Partners, and EMC Corporation.
More information about this article
This article has been selected from the book "Big Data for Executives and Market Professionals — Second Edition".
