Hadoop for HPCers

From oldwiki.scinet.utoronto.ca
Revision as of 13:28, 3 September 2014 by Ljdursi (talk | contribs)

Overview

This ~3-hour class introduces Hadoop to HPC users with a background in numerical simulation. We will give a brief overview of:

  • The Hadoop File System (HDFS)
  • MapReduce
  • Pig
  • Spark

Most examples will be written in Python.

VM Instructions

This course will feature hands-on work with a single-node Hadoop cluster running in a virtual machine (VM) on your laptop. The VMs are created with Vagrant. Before the course, make sure the VM is up and running.

If you get any warnings about shared folders not existing, that's fine.

The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working. For the text VM, you will have to log in at the console; the username/password is vagrant/vagrant. For either machine, you can also ssh into the VM from a terminal on your laptop:

ssh vagrant@192.168.33.10

or

ssh -p 2222 vagrant@localhost

You can also ssh to the laptop from the VM with

ssh [username]@192.168.33.1

(If that particular address pair doesn't work, type "ifconfig" in a terminal window within the VM and look for a line like "inet addr: 192.168...." or "inet addr: 10...."; that is the VM's IP address.)
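If it is unclear which of the addresses accepts connections, a quick reachability check can narrow things down. The following is a minimal sketch in Python, assuming the host/port pairs from the ssh commands above (port 22 is the ssh default); the helper name port_open is illustrative and not part of the course material.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat all as "not reachable".
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Host/port pairs taken from the ssh commands in this section.
    for host, port in [("192.168.33.10", 22), ("localhost", 2222)]:
        state = "reachable" if port_open(host, port) else "not reachable"
        print("%s:%d is %s" % (host, port, state))
```

A reachable port only shows that something is listening there; logging in with ssh is still the real test.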

Then make sure everything is working:

  • From a terminal, start up the Hadoop cluster by typing
    ~/bin/init.sh
    You may have to answer "yes" a few times to start up some servers.
  • Go to one of the example directories by typing
    cd ~/examples/wordcount/streaming
  • Then start the example by typing
    make

You've now run your (maybe) first Hadoop job!
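For a sense of what a streaming word count involves, here is a minimal sketch in Python. It is not the code shipped in ~/examples/wordcount/streaming; the function names mapper and reducer are illustrative, and the local sorted() call stands in for the shuffle-and-sort step that Hadoop performs between map and reduce. In an actual Hadoop streaming job, the mapper and reducer would typically be separate scripts that read lines from standard input and write tab-separated key/value lines to standard output.

```python
from itertools import groupby


def mapper(lines):
    """Emit one "word<TAB>1" record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word.lower(), 1)


def reducer(sorted_records):
    """Sum the counts for each word.

    The input must already be sorted by key, so that all records for
    one word arrive together.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


if __name__ == "__main__":
    # Local simulation of map -> sort -> reduce on a tiny in-memory input.
    text = ["the quick brown fox", "jumps over the lazy dog"]
    for record in reducer(sorted(mapper(text))):
        print(record)
```

Keeping the map and reduce steps as plain generator functions makes it easy to test the logic on a laptop before handing it to the cluster.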

If you'd like, you can also create the virtual machine image yourself by downloading Vagrant and the Vagrantfile for the GUI or text image and running "vagrant up". If you build the GUI VM this way, you will have to run "vagrant reload" once installation has completed, so that the VM restarts with all the software installed.

If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.

Updated Examples

If you've downloaded the image before Wednesday morning, from within the VM you may want to download the updated examples from https://support.scinet.utoronto.ca/~ljdursi/Hadoop/examples.tgz

Slides

You can download the slides from here.