Hadoop, Mahout, MapReduce, etc.
1. Hadoop, Mahout, MapReduce, etc.
This page has an introduction to Hadoop, Mahout, MapReduce and related technologies.
2. Apache Hadoop
Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Wikipedia, 2020-04-29.
The Hadoop name and elephant logo came from a toy elephant that belonged to the son of Doug Cutting, one of Hadoop's creators.
3. Apache Mahout
Apache Mahout(TM) is a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end, or can be extended to other distributed backends. Apache Mahout web site, 2020-04-29.
The Apache Mahout web site is at https://mahout.apache.org/.
4. Elephant rider
The term "Mahout" comes from the Hindi word for an elephant rider or trainer.
A mahout is an elephant rider, trainer, or keeper. Usually, a mahout starts as a boy in the family profession when he is assigned an elephant early in its life. They remain bonded to each other throughout their lives. Wikipedia, 2020-04-29.
The word mahout derives from the Hindi words mahaut (महौत) and mahavat (महावत), and originally from the Sanskrit mahamatra (महामात्र). Wikipedia, 2020-04-29.
Part of the motivation for the name "Mahout" (elephant rider) was that it was software to help manage and run "Hadoop" (elephant) to solve problems using massively parallel architectures involving code and data.
5. MapReduce
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. Wikipedia, 2020-04-29.
6. Map and reduce
A MapReduce program is composed of a map procedure, which performs filtering and sorting (such as sorting students by first name into queues, one queue for each name), and a reduce method, which performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). Wikipedia, 2020-04-29.
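The student example in the quotation above can be sketched on a single machine in Python. This is only an illustration under stated assumptions: the names map_student, group_by_key, reduce_queue, and name_frequencies are hypothetical, the student list is made up, and nothing here uses Hadoop or any real MapReduce API.

```python
from collections import defaultdict

def map_student(student):
    """Map step: emit a (first_name, 1) pair for one student record."""
    first_name = student.split()[0]
    return (first_name, 1)

def group_by_key(pairs):
    """Grouping step: place the values for each key into its own queue."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return dict(queues)

def reduce_queue(name, values):
    """Reduce step: count the entries in one queue."""
    return (name, sum(values))

def name_frequencies(students):
    """Run map, group, and reduce in sequence and return name frequencies."""
    pairs = [map_student(s) for s in students]
    queues = group_by_key(pairs)
    return dict(reduce_queue(name, values) for name, values in queues.items())

# Made-up sample data for illustration only.
students = ["Ann Lee", "Bob Ray", "Ann Cho", "Cy Dunn", "Bob Hill", "Ann Park"]
print(name_frequencies(students))  # {'Ann': 3, 'Bob': 2, 'Cy': 1}
```

In a real cluster the map calls and the reduce calls would run on different machines, and the grouping step would be handled by the framework rather than by user code.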
7. Google
The name MapReduce originally referred to the proprietary Google technology, but has since been genericized. Wikipedia, 2020-04-29.
8. Overview
The ideas of map and reduce come from functional programming.
MapReduce is also based on the idea of moving the code to the data (in a functional programming sense) rather than moving the data to the code (in an imperative programming sense).
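The functional programming roots can be seen in miniature with Python's built-in map and functools.reduce. This is a small sketch with a made-up word list, not a distributed program: map applies one function to every item independently, and reduce folds the results into a single value.

```python
from functools import reduce

# Made-up sample data for illustration only.
words = ["hadoop", "mahout", "mapreduce"]

# map: apply len to each word independently (no item depends on another).
lengths = list(map(len, words))

# reduce: combine the mapped values into one summary value.
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths)  # [6, 6, 9]
print(total)    # 21
```

Because each map call is independent of the others, the work can in principle be split across many machines, which is the property that MapReduce exploits.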
Here is a common starting example from Wikipedia, 2020-04-29, which counts the occurrences of each word in a set of documents.
Think of the documents as spread across a large network of distributed and connected computers.
9. Map
function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit (w, 1)
(Wikipedia, 2020-04-29)
10. Reduce
function reduce(String word, Iterator partialCounts):
    // word: a word
    // partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
        sum += pc
    emit (word, sum)
(Wikipedia, 2020-04-29)
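The map and reduce pseudocode above can be tried out on one machine with a short Python sketch. The assumptions are these: map_fn and reduce_fn mirror the pseudocode, run_word_count is a hypothetical driver that stands in for the grouping a real framework would do between the two phases, words are split on whitespace, and the two sample documents are made up.

```python
from collections import defaultdict

def map_fn(name, document):
    """Map: yield a (word, 1) pair for each word in the document contents."""
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, partial_counts):
    """Reduce: sum the partial counts collected for one word."""
    return (word, sum(partial_counts))

def run_word_count(documents):
    """Hypothetical single-machine driver: map, group by word, then reduce."""
    grouped = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            grouped[word].append(count)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())

# Made-up sample data for illustration only.
documents = {"doc1": "the elephant and the rider", "doc2": "the rider"}
counts = run_word_count(documents)
print(counts)  # {'the': 3, 'elephant': 1, 'and': 1, 'rider': 2}
```

On a cluster, each document would be mapped on the machine that stores it, and the pairs for each word would be sent to whichever machine reduces that word.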
11. End of page