Overview

This is a roughly three-hour class that introduces Hadoop to HPC users with a background in numerical simulation. We will walk through a brief overview of:

The Hadoop File System (HDFS)
MapReduce
Pig
Spark

Most examples will be written in Python.

VM Instructions

This course will feature hands-on work with a 1-node Hadoop cluster running in a virtual machine (VM) on your laptop. The VMs are created with Vagrant. Before the course, make sure the VM is up and running by following these steps:

Install VirtualBox on your laptop
Download the virtual machine image you want to use:
- Full-size VM with GUI (requires a peak of ~8GB free disk space)
- Smaller, text-only VM (requires a peak of ~6GB free disk space)
Start VirtualBox
Choose "Import Appliance" and select the downloaded image; VirtualBox will uncompress the image, which will take a few minutes.
Start the new virtual machine.
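
Before downloading, it can help to confirm that the laptop has enough free disk space for the chosen image. The following is a minimal sketch, assuming the ~8GB and ~6GB peaks quoted above are decimal gigabytes and that the download and the imported VM live on the same filesystem as the current directory; the names `PEAK_GB`, `enough_space`, and `check_path` are made up for this illustration.

```python
import shutil

GB = 10**9  # decimal gigabytes; the course figures are approximate

# Peak free space quoted in the list above, in GB (keys are illustrative).
PEAK_GB = {"gui": 8, "text": 6}


def enough_space(free_bytes, vm_kind):
    """Return True if free_bytes covers the quoted peak for vm_kind."""
    return free_bytes >= PEAK_GB[vm_kind] * GB


def check_path(path, vm_kind):
    """Check free space on the filesystem that holds `path`."""
    free = shutil.disk_usage(path).free
    return enough_space(free, vm_kind), free / GB


if __name__ == "__main__":
    for kind in PEAK_GB:
        ok, free_gb = check_path(".", kind)
        status = "ok" if ok else "too little space"
        print(f"{kind}: {status} ({free_gb:.1f} GB free)")
```

If VirtualBox stores its machines on a different disk than the download, run the check against that location instead of `"."`.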

The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working. For the text VM, you will have to log in at the console; the username/password is vagrant/vagrant. For either machine, you can also ssh into the VM from a terminal on your laptop:

ssh vagrant@192.168.33.10

or ssh into your laptop from the VM with:

ssh [username]@192.168.33.1
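
If ssh hangs or is refused, a quick reachability test can tell you whether the VM's network interface is up at all. This is a minimal sketch, assuming the VM's SSH server listens on the default port 22 at the address shown above; `port_is_open` is a hypothetical helper, not part of the course materials.

```python
import socket


def port_is_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From the laptop, `port_is_open("192.168.33.10")` should return `True` once the VM has finished booting. A `False` result points at the VM or its host-only network rather than at your ssh credentials.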

Then make sure everything is working:

From a terminal, start up the Hadoop cluster by typing
```
~/bin/init.sh
```
You may have to answer "yes" a few times to start up some servers.
Go to one of the example directories by typing
```
cd ~/examples/wordcount/streaming
```
Then start the example by typing
```
make
```

You've now run what may be your first Hadoop job!
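
To give a feel for what a streaming word count does, here is a minimal sketch in Python. It is not the code in `~/examples/wordcount/streaming`, whose contents are not shown here; the function names are invented for illustration. It assumes the usual Hadoop Streaming convention: the mapper and reducer exchange tab-separated `key<TAB>value` lines over standard input and output, and the framework sorts the mapper output by key before the reducer sees it.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of each run of identical keys.

    Assumes the input is already sorted by key, as Hadoop guarantees.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in a single process."""
    return list(reduce_lines(sorted(map_lines(lines))))


# Optional command-line use: 'map' or 'reduce' as the only argument,
# reading from stdin and writing to stdout like a streaming step would.
if __name__ == "__main__" and len(sys.argv) == 2:
    steps = {"map": map_lines, "reduce": reduce_lines}
    if sys.argv[1] in steps:
        for out in steps[sys.argv[1]](sys.stdin):
            print(out)
```

For example, `run_locally(["the cat", "The hat the"])` returns `["cat\t1", "hat\t1", "the\t3"]`. In a real job the sort between the two steps is done by Hadoop across the cluster, which is why the reducer only needs to compare neighbouring lines.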

Hadoop for HPCers
