Hadoop for HPCers

From oldwiki.scinet.utoronto.ca
Revision as of 08:18, 29 August 2014 by Ljdursi

Overview

This is a roughly three-hour class that introduces Hadoop to HPC users with a background in numerical simulation. We will walk through a brief overview of:

  • The Hadoop Distributed File System (HDFS)
  • MapReduce
  • Pig
  • Spark

Most examples will be written in Python.

VM Instructions

This course features hands-on work with a single-node Hadoop cluster running in a virtual machine (VM) on your laptop. The VMs are created with Vagrant. Before the course, make sure the VM is up and running.

The GUI VM starts a console with a full desktop environment; you can open a terminal and begin working. For the text VM, you will have to log in at the console; the username/password is vagrant/vagrant. For either machine, you can also ssh into the VM from a terminal on your laptop:

ssh vagrant@192.168.33.10

or into your laptop from the VM (replacing [username] with your account name on the laptop) with

ssh [username]@192.168.33.1

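If ssh hangs or fails, it can help to first check from your laptop whether the VM is reachable at all. The following is a minimal Python sketch that tests whether a TCP connection to the VM's SSH port succeeds. It assumes sshd inside the VM listens on the default port 22; the port_open helper is hypothetical and not part of the course materials.

```python
import socket

# Address of the VM on the host-only network, as given above.
VM_HOST = "192.168.33.10"
SSH_PORT = 22  # assumption: sshd in the VM listens on the default port


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is a
        # subclass of OSError) and unreachable networks.
        return False


if __name__ == "__main__":
    if port_open(VM_HOST, SSH_PORT):
        print(f"VM reachable; try: ssh vagrant@{VM_HOST}")
    else:
        print(f"Cannot reach {VM_HOST}:{SSH_PORT}; is the VM running?")
```

A successful connection only shows that something is listening on the port; ssh itself still has to authenticate with the vagrant/vagrant credentials.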

Then make sure everything is working:

  • From a terminal, start the Hadoop cluster by typing
    ~/bin/init.sh
    You may have to answer "yes" a few times to start up some servers.
  • Go to one of the example directories by typing
    cd ~/examples/wordcount/streaming
  • Then start the example by typing
    make

You've now run your (maybe) first Hadoop job!
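The contents of the wordcount/streaming directory are not reproduced here, so the following is only a hypothetical sketch of what a Hadoop Streaming word count in Python can look like. Hadoop Streaming runs arbitrary executables as mapper and reducer: each reads lines on stdin and writes tab-separated key/value lines to stdout, and Hadoop sorts the mapper output by key before the reducer sees it. The file name wc.py and its map/reduce command-line arguments are assumptions for illustration.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts for each word.

    Input must already be sorted by key, which Hadoop guarantees
    between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(text):
    """Emulate map -> sort (the shuffle) -> reduce without Hadoop."""
    mapped = sorted(mapper(text.splitlines()))
    return list(reducer(mapped))


if __name__ == "__main__":
    # Hypothetical CLI: 'python wc.py map' or 'python wc.py reduce'.
    stage = sys.argv[1] if len(sys.argv) > 1 else None
    if stage == "map":
        for record in mapper(sys.stdin):
            print(record)
    elif stage == "reduce":
        for record in reducer(sys.stdin):
            print(record)
    else:
        # No stage given: run a small in-memory demo instead.
        for record in run_locally("the cat\nthe hat"):
            print(record)
```

Because the mapper and reducer only talk through stdin and stdout, you can check the logic without Hadoop by letting the Unix sort command stand in for the shuffle: cat input.txt | python wc.py map | sort | python wc.py reduce (where input.txt is any text file).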