Overview

This is a roughly three-hour class that introduces Hadoop to HPC users with a background in numerical simulation. We will walk briefly through:

The Hadoop Distributed File System (HDFS)
MapReduce
Pig
Spark

Most examples will be written in Python.

VM Instructions

This course features hands-on work with a 1-node Hadoop cluster running in a virtual machine (VM) on your laptop. The VMs are created with Vagrant. Before the course, make sure the VM is up and running:

Install VirtualBox on your laptop
Download the virtual machine image you want to use:
- Full-size VM with GUI (requires a peak of ~8 GB free disk space)
- Smaller, text-only VM (requires a peak of ~6 GB free disk space)
Start VirtualBox
Choose "Import Appliance" and select the downloaded image; VirtualBox will uncompress the image, which takes a few minutes.
Start the new virtual machine.
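
Since the import step needs several gigabytes of free space, it can help to check the disk before downloading. The following is a minimal sketch, not part of the course materials: it assumes the figures quoted above are decimal gigabytes, and the names `REQUIRED_GB` and `has_enough_space` are made up for illustration.

```python
import shutil

# Peak free-space figures quoted in the list above, in (assumed decimal) gigabytes.
REQUIRED_GB = {"gui": 8, "text": 6}


def has_enough_space(free_bytes, image="gui"):
    """Return True if free_bytes covers the peak requirement for the image."""
    return free_bytes >= REQUIRED_GB[image] * 10**9


if __name__ == "__main__":
    # Check the drive holding the current directory; point this at the
    # drive where VirtualBox stores its machines if that is different.
    free = shutil.disk_usage(".").free
    for image in REQUIRED_GB:
        status = "ok" if has_enough_space(free, image) else "too little space"
        print(f"{image}: {free / 10**9:.1f} GB free, {status}")
```

Both figures are approximate, so treat a result close to the limit as a warning rather than a guarantee.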

The GUI VM starts up with a full desktop environment; you can open a terminal and begin working. For the text VM, you will have to log in at the console; the username/password is vagrant/vagrant. For either machine, you can also ssh into the VM from a terminal on your laptop:

ssh vagrant@192.168.33.10

or ssh to the laptop from the VM (using your laptop username) with

ssh [username]@192.168.33.1
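
If the ssh command hangs, a quick way to tell whether the VM is reachable at all is to test the TCP port directly. This is a small sketch rather than anything shipped with the course: the helper name `port_open` is hypothetical, and port 22 is assumed because it is the standard SSH port, not because the text states it.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # 192.168.33.10 is the VM address used in the ssh command above;
    # port 22 is the standard SSH port (assumed, not stated in the text).
    print("VM reachable over SSH port:", port_open("192.168.33.10", 22))
```

A result of `False` usually means the VM is not running yet or its host-only network is not set up.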

Then make sure everything is working:

From a terminal, start up the Hadoop cluster by typing
```
~/bin/init.sh
```
You may have to answer "yes" a few times to start up some servers.
Go to one of the example directories by typing
```
cd ~/examples/wordcount/streaming
```
Then start the example by typing
```
make
```

You've now run what may be your first Hadoop job!
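
To give a sense of what a streaming word count does, here is a minimal sketch of the mapper and reducer logic in Python. It is not the code in `~/examples/wordcount/streaming`, which may differ; the function names and the `map`/`reduce` command-line argument are assumptions made for illustration. With Hadoop Streaming, the mapper and reducer read lines from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def parse_pairs(lines):
    """Parse tab-separated "word<TAB>count" lines back into (word, int) pairs."""
    for line in lines:
        word, count = line.rstrip("\n").split("\t", 1)
        yield word, int(count)


def reduce_pairs(sorted_pairs):
    """Reducer: sum the counts of consecutive pairs that share a word.

    The input must be sorted by word, so equal words are adjacent.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Hypothetical usage: "wordcount.py map" or "wordcount.py reduce".
    if len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
        if sys.argv[1] == "map":
            pairs = map_lines(sys.stdin)
        else:
            pairs = reduce_pairs(parse_pairs(sys.stdin))
        for word, count in pairs:
            print(f"{word}\t{count}")
```

Saved under a hypothetical name such as `wordcount.py`, the same logic can be tried without Hadoop by piping a text file through the map step, then `sort`, then the reduce step; the `sort` stands in for the shuffle that Hadoop performs between the two phases.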

If you'd like, you can also build the virtual machine image yourself: download Vagrant and the Vagrantfile for the GUI or text image, then run "vagrant up".

Hadoop for HPCers
