Hadoop for HPCers
Overview
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation. We will walk through a brief overview of:
- The Hadoop Distributed File System (HDFS)
- MapReduce
- Pig
- Spark
Most examples will be written in Python.
VM Instructions
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop. The VMs are created with Vagrant. Before the course, ensure this is up and running:
- Install VirtualBox on your laptop and start it. (Note! At time of writing, the newest version, 4.3.14, is broken on at least Mac and Windows; you'll want to install 4.3.12 from "older builds".)
- Under Settings or Preferences, go to Network, then Host-only networks, and add/create two host-only networks.
- Then download the virtual machine image you want to use:
- In VirtualBox, choose "Import Appliance" and select the downloaded image; this will uncompress the image, which will take some minutes.
- Start the new virtual machine.
If you get any warnings about shared folders not existing, that's fine.
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working. For the text VM, you will have to log in at the console; the username/password is vagrant/vagrant. For either machine, you can also ssh into the VM from a terminal on your laptop:
ssh vagrant@192.168.33.10
or
ssh -p 2222 vagrant@localhost
You can also ssh to the laptop from the VM with
ssh [username]@192.168.33.1
(If that particular address pair doesn't work, open a terminal window within the VM and type "ifconfig" to find a line like "inet addr: 192.168...." or "inet addr: 10...."; that's the VM's IP address.)
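If ssh fails, it can help to check from the laptop whether either address is accepting connections at all. The following is a minimal sketch, assuming Python 3 is available on the laptop; the helper names can_connect and check_endpoints are made up for illustration, and the host/port pairs are simply the ones from the ssh commands above (port 22 being the ssh default).

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Host/port pairs taken from the ssh commands in this section.
VM_SSH_ENDPOINTS = [("192.168.33.10", 22), ("localhost", 2222)]


def check_endpoints(endpoints=VM_SSH_ENDPOINTS):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {endpoint: can_connect(*endpoint) for endpoint in endpoints}
```

Calling check_endpoints() returns a dictionary keyed by (host, port); an entry that is True only means something is listening there, not that the login itself will succeed.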
Then make sure everything is working:
- From a terminal, start up the Hadoop cluster by typing
~/bin/init.sh
You may have to answer "yes" a few times to start up some servers.
- Go to one of the example directories by typing
cd ~/examples/wordcount/streaming
- Then start the example by typing
make
You've now run your (maybe) first Hadoop job!
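For a sense of what a streaming word count involves, here is a rough sketch of the general pattern in Python 3. It is not the code shipped in ~/examples/wordcount/streaming, and the function names mapper, reducer, and run_locally are illustrative only. With Hadoop streaming, the mapper and reducer would normally be separate scripts reading stdin and writing tab-separated records to stdout; here they are generators so the whole flow can be simulated in one process.

```python
import itertools


def mapper(lines):
    """Emit one tab-separated (word, 1) record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; records must arrive sorted by key."""
    parsed = (record.rsplit("\t", 1) for record in sorted_records)
    for word, group in itertools.groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in a single process (no Hadoop involved)."""
    return list(reducer(sorted(mapper(lines))))
```

For example, run_locally(["the cat", "The hat"]) returns ["cat\t1", "hat\t1", "the\t2"]. The sorted() call stands in for the shuffle-and-sort step that Hadoop performs between the map and reduce phases.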
If you'd like, you can also create the virtual machine image yourself by downloading Vagrant and the Vagrantfile for the GUI or text image and running "vagrant up". If you run "vagrant up" for the GUI VM, you will have to run "vagrant reload" once installation has completed, to restart with all the software installed.
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.
Updated Examples
If you downloaded the image before Wednesday morning, you may want to download the updated examples from within the VM: https://support.scinet.utoronto.ca/~ljdursi/Hadoop/examples.tgz
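One possible way to fetch and unpack the tarball is sketched below, assuming Python 3 is available inside the VM. The function name fetch_and_extract and the destination directory in the usage note are hypothetical, and the layout of the archive's contents is not known here, so check what was unpacked before overwriting anything in ~/examples.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL as given in this section.
EXAMPLES_URL = "https://support.scinet.utoronto.ca/~ljdursi/Hadoop/examples.tgz"


def fetch_and_extract(url, dest_dir):
    """Download a gzipped tarball from url and unpack it under dest_dir."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "examples.tgz"
    urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(dest))
    # Return the top-level names so the caller can see what was unpacked.
    return sorted(p.name for p in dest.iterdir())
```

For instance, fetch_and_extract(EXAMPLES_URL, "~/examples-updated") would unpack into a separate (hypothetical) directory, leaving the original examples untouched.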