Question

Cloud Computing, Hadoop, MapReduce

This module will provide an introduction to the Utility Cloud. It will also describe a relatively new technique for scaling the processing of massive data in the cloud.

• Describe the principles of the utility cloud.
• Use the MapReduce approach to write pseudocode that manipulates large data on the cloud.
• Explain the elements of the MapReduce architecture.

Apache Hadoop

1. Discuss where the MapReduce paradigm fits into the Utility Cloud.
2. Discuss the characteristics of distributed algorithms that are best supported by the MapReduce paradigm versus those that are not. Provide some examples.

1. Suppose that our Hadoop system contains a large terabyte-sized file with π written out to 10¹² places. HDFS divides the file into many shards. Write MapReduce pseudocode that counts the occurrences of each three-digit sequence in the decimal portion of π. For example, given π written as:

3.1415926535897932384626433832795028841971693993751
05820974944592307816406286208998628034825342117067
98214808651328230664709384460955058223172535940812
84811174502841027019385211055596446229489549303819
64428810975665933446128475648233786783165271201909
14564856692346034861045432664821339360726024914127
37245870066063155881748815209209628292540917153643
67892590360011330530548820466521384146951941511609
43305727036575959195309218611738193261179310511854
80744623799627495673518857527248912279381830119491

The result file should consist of <key, value> pairs that look like the following:

<141, 2>
<415, 2>
<159, 1>
<592, 2>
<926, 1>


Write the Map and Reduce pseudocode. Do not concern yourselves with three-digit sequences that span shards, where for example one digit is at the end of one shard and two digits are at the beginning of the next shard.

a) Map algorithm
b) Reduce algorithm

MapReduce, Hadoop & HDFS

This module will provide an in-depth description of the Hadoop Data File System (HDFS) as well as discuss some design patterns for advanced MapReduce techniques.

• Describe the underpinnings of HDFS.
• Use some basic HDFS command line calls.
• Write HDFS code with data compression and decompression.
• Write MapReduce code using advanced design patterns.
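One widely used advanced design pattern is in-mapper combining, where the mapper aggregates counts in local memory and emits one record per key instead of one per occurrence, cutting shuffle traffic. A minimal Python sketch of the idea (illustrative only; a real Hadoop mapper would extend `org.apache.hadoop.mapreduce.Mapper` and emit via its context object):

```python
from collections import Counter

def mapper_with_combining(lines):
    """Word-count mapper using in-mapper combining.

    Instead of emitting (word, 1) for every occurrence, aggregate
    counts locally and emit each key once with its partial sum.
    This reduces the number of intermediate records that must be
    shuffled to the reducers.
    """
    local_counts = Counter()
    for line in lines:
        local_counts.update(line.split())
    # Emit each key exactly once with its mapper-local count.
    for word, count in local_counts.items():
        yield word, count

# Example: one small input split.
pairs = dict(mapper_with_combining(["to be or not to be"]))
```

The trade-off is mapper memory: the local dictionary must fit in the task's heap, so the pattern suits data with many repeated keys per split.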

1. Discuss modifications to the HDFS architecture that would enable better performance for operations that require random access to the data. Are these modifications worth it?

Solution Preview

• Map algorithm
Map(key: null, value: shard) {
    // Slide a window of three consecutive digits across the shard.
    for i = 0 to length(shard) - 3 {
        target = shard[i .. i+2]
        write(target, 1)
    }
}

• Reduce algorithm
Reduce(key: target, values: counts) {
    // Sum the 1s emitted by the mappers for this three-digit key.
    sum = 0
    for each count in counts {
        sum += count
    }
    write(target, sum)
}...
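As a sanity check, the Map and Reduce steps above can be simulated in plain Python on a single small shard (a sketch only; the function names and the hand-rolled shuffle are illustrative, not part of the Hadoop API):

```python
from collections import defaultdict

def map_digits(shard):
    """Map phase: emit (three_digit_window, 1) for every window in the shard."""
    for i in range(len(shard) - 2):
        yield shard[i:i + 3], 1

def reduce_counts(key, values):
    """Reduce phase: sum the 1s emitted for a given three-digit key."""
    return key, sum(values)

# Simulate the shuffle: group mapper output by key.
digits = "14159265358979323846"  # decimal portion of pi, one tiny "shard"
grouped = defaultdict(list)
for key, one in map_digits(digits):
    grouped[key].append(one)

result = dict(reduce_counts(k, v) for k, v in grouped.items())
# e.g. result["141"] holds the count for the sequence "141" in this shard
```

On a real cluster the grouping is performed by Hadoop's shuffle-and-sort phase, and each reducer receives all values for its assigned keys.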

