AWS for deep learning

Setting up the key (first time only)

Get the penaltykick.pem file from John.
Copy it to the ~/.ssh folder in your home directory.
Make it readable only by you: chmod 400 ~/.ssh/penaltykick.pem
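The same steps can be wrapped in a small shell function. This is a sketch that assumes the .pem file has already been downloaded somewhere on your machine. `pk_install_key` and the `PK_SSH_DIR` override are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: copy the lab key into ~/.ssh and restrict its permissions.
# PK_SSH_DIR is an illustrative override; it defaults to ~/.ssh.
pk_install_key() {
  local src="$1"
  local ssh_dir="${PK_SSH_DIR:-$HOME/.ssh}"
  # Create the folder if needed and keep it private to you.
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  # Remove an old read-only copy first so re-running the helper works.
  rm -f "$ssh_dir/penaltykick.pem"
  cp "$src" "$ssh_dir/penaltykick.pem"
  # ssh rejects private keys that other users can read; 400 = owner read-only.
  chmod 400 "$ssh_dir/penaltykick.pem"
}

# Usage (the download path is only an example):
# pk_install_key ~/Downloads/penaltykick.pem
```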

Log in to the lab's Amazon Console with MFA, then navigate to EC2 > {X} Running Instances.

Starting an existing instance

Click on PK{X}.
Actions --> Instance State --> Start --> Yes, Start (the instance takes a few minutes to boot).
Copy the Public DNS (under the Description tab) to your clipboard.
Open a terminal on your computer (from your home directory) and connect:

ssh -i ~/.ssh/penaltykick.pem ubuntu@{DNS address}

You're in!
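If you have the AWS CLI installed and configured with credentials for the lab account, the console steps above can be scripted roughly as follows. This is a sketch, not part of the lab's documented workflow: `pk_start` is a hypothetical name, you must look up the instance ID (it starts with `i-`) of your PK{X} instance yourself, and the lab's MFA policy may require temporary CLI credentials (for example from `aws sts get-session-token`).

```shell
# Hypothetical helper: start a stopped instance, wait for it, print its public DNS.
# Assumes the AWS CLI is installed and configured for the lab account.
pk_start() {
  local instance_id="$1"
  aws ec2 start-instances --instance-ids "$instance_id" >/dev/null || return 1
  # Block until EC2 reports the instance as running.
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1
  # Same value as the Public DNS shown in the console.
  aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicDnsName' --output text
}

# Usage (the instance ID is a placeholder):
# ssh -i ~/.ssh/penaltykick.pem ubuntu@"$(pk_start i-0123456789abcdef0)"
```

The "running" state can be reached before the OS has finished booting, so the first ssh attempt may need a retry.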

Starting a new type of instance

See Amazon's EC2 Instance Types page for the instance families Amazon provides and what the letters mean (e.g., m = general purpose, c = compute optimized, r = memory optimized, p = GPU/accelerated computing).
For our experiments, use a p3.2xlarge for GPU experiments and a c5.2xlarge for CPU experiments.
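To compare the two instance types from the command line instead of a web search, something like the following sketch should work with a configured AWS CLI. `pk_compare_types` is a hypothetical name; the queried fields come from the EC2 DescribeInstanceTypes output, and the GPU fields are null for c5.2xlarge.

```shell
# Hypothetical helper: print vCPUs, memory (MiB), and GPU info for both types.
# Assumes the AWS CLI is installed and configured for the lab account.
pk_compare_types() {
  aws ec2 describe-instance-types \
    --instance-types p3.2xlarge c5.2xlarge \
    --query 'InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus, MemoryInfo.SizeInMiB, GpuInfo.Gpus[0].Name, GpuInfo.Gpus[0].Count]' \
    --output table
}
```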
Click Launch Instance --> AWS Marketplace.
Search for "Amazon Deep Learning", select "Deep Learning AMI (Ubuntu)", then click Continue.
Choose your instance type, then click Configure Instance Details.
Keep the defaults (or change them as needed), add "penaltykick" as a tag, then go to Configure Security Group.
Choose "Select an existing security group", then select the Deep Learning AMI Ubuntu group.
Choose your existing key pair (the one matching penaltykick.pem), then launch.
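The launch steps above roughly correspond to one `aws ec2 run-instances` call. This is a sketch under several assumptions: the AWS CLI is configured for the lab account, the key pair is named "penaltykick", and the tag uses the key "Name" (the steps above only say to add "penaltykick" as a tag). `pk_launch` is hypothetical, and the AMI and security-group IDs are placeholders you would look up in the console.

```shell
# Hypothetical helper mirroring the console launch steps.
# Assumptions: key pair named "penaltykick", tag key "Name"; IDs are placeholders.
pk_launch() {
  local kind="$1" ami_id="$2" sg_id="$3" itype
  case "$kind" in
    gpu) itype="p3.2xlarge" ;;  # GPU experiments
    cpu) itype="c5.2xlarge" ;;  # CPU experiments
    *) echo "usage: pk_launch gpu|cpu <ami-id> <security-group-id>" >&2; return 1 ;;
  esac
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type "$itype" \
    --key-name penaltykick \
    --security-group-ids "$sg_id" \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=penaltykick}]' \
    --query 'Instances[0].InstanceId' --output text
}

# Usage: pk_launch gpu ami-xxxxxxxx sg-xxxxxxxx   (prints the new instance ID)
```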

Mounting Data Drive

In the AWS Console, go to Storage --> EFS.
Click on penalty kick data, then copy its "DNS name".
Back in the terminal on the instance, run ls ~/data to check whether this directory exists; if it does not, create it with mkdir ~/data.
Run the following, replacing <DNS> with the DNS name you copied:

sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 <DNS>:/ ~/data

This should be enough. If ls -al in your home directory shows that data/ is not owned by ubuntu, you may also need to run

sudo chown ubuntu:ubuntu ~/data
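The mounting steps can be combined into one function to run on the instance. This is a sketch: `pk_mount` is a hypothetical name, the mount options are copied from the command above, and the EFS DNS name is the one you copy from the console.

```shell
# Hypothetical helper: create ~/data if needed, mount EFS there, fix ownership.
# Args: <EFS DNS name> [mount point, default ~/data]
pk_mount() {
  local efs_dns="$1" mnt="${2:-$HOME/data}" owner
  # Create the mount point only if it does not exist yet.
  mkdir -p "$mnt"
  # Same NFS options as the mount command above.
  sudo mount -t nfs4 \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    "${efs_dns}:/" "$mnt" || return 1
  # The third column of ls -ld is the owner; hand it to ubuntu if needed.
  owner="$(ls -ld "$mnt" | awk '{print $3}')"
  if [ "$owner" != "ubuntu" ]; then
    sudo chown ubuntu:ubuntu "$mnt"
  fi
}
```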

Cloning Code

You'll have to add to the instance any code you need that isn't available as part of the installed frameworks. For GitHub code (such as ours), do the following, using ~/code as the code directory:

mkdir ~/code (if it doesn't exist)
cd ~/code
Clone the repo:
```
git clone https://github.com/<user>/<repo>.git
```
Cloning a public repo over HTTPS needs no authentication (so there is no need to set up SSH keys for GitHub, etc.). Older git:// URLs no longer work, because GitHub disabled the unauthenticated Git protocol in 2022.
cd into the repo and check out whatever branch you want with git checkout <branch>.
Activate the Anaconda environment for your framework. E.g., for TensorFlow on Python 3.6:
```
source activate tensorflow_p36
```
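A sketch of the clone-and-checkout steps as one function. `pk_clone` and the `PK_CODE_DIR` override are hypothetical. It uses an HTTPS URL because GitHub no longer serves git:// URLs. The function body runs in a subshell so your working directory is left unchanged, which is also why the environment activation is left out: it has to happen in your own interactive shell.

```shell
# Hypothetical helper: clone a public GitHub repo into ~/code and
# optionally check out a branch. The ( ... ) body runs in a subshell.
pk_clone() (
  user="$1"; repo="$2"; branch="$3"
  code_dir="${PK_CODE_DIR:-$HOME/code}"
  mkdir -p "$code_dir" && cd "$code_dir" || exit 1
  # HTTPS works for public repos without any GitHub credentials.
  git clone "https://github.com/${user}/${repo}.git" || exit 1
  if [ -n "$branch" ]; then
    cd "$repo" && git checkout "$branch"
  fi
)

# Usage: pk_clone <user> <repo> <branch>
# then, in your own shell: source activate tensorflow_p36
```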

Running Code

Add the directory containing the git repository (~/code) to the Python path:

export PYTHONPATH=$PYTHONPATH:~/code
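The export only lasts for the current login. One way to make it persist (an optional addition, not part of the original setup) is to append the line to ~/.bashrc exactly once; `pk_persist_pythonpath` is a hypothetical helper.

```shell
# Hypothetical helper: add the PYTHONPATH line to a shell startup file once.
# Arg: [startup file, default ~/.bashrc]
pk_persist_pythonpath() {
  local rc="${1:-$HOME/.bashrc}"
  # Single quotes keep $PYTHONPATH and ~ unexpanded until login time.
  local line='export PYTHONPATH=$PYTHONPATH:~/code'
  # -x matches whole lines, -F matches literally; append only if missing.
  if ! grep -qxF "$line" "$rc" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc"
  fi
}
```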

Install Edward if needed:

pip install edward

Navigate to ~/code/tf_gbds.
Run whatever commands you need as you would in a normal terminal, saving all results to ~/data.
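Commands started directly in the ssh session stop when the session ends. A common workaround (a suggestion, not part of the original setup) is to start long runs with nohup and write the log to ~/data. `pk_run_bg` is a hypothetical helper, and the script name in the usage line is a placeholder.

```shell
# Hypothetical helper: run a command that ignores hangups (e.g. ssh disconnects),
# send its stdout and stderr to a log file, and print its process ID.
pk_run_bg() {
  local log="$1"; shift
  nohup "$@" > "$log" 2>&1 &
  echo "$!"
}

# Usage: pk_run_bg ~/data/run1.log python your_script.py
```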

Launching TensorBoard and Forwarding the Port

After you mount the data drive and activate the TensorFlow environment, launch TensorBoard on the instance:

tensorboard --logdir='/home/ubuntu/data'

TensorBoard's default port is 6006. To use another port, add --port <port> to the command above and forward that port instead of 6006.

From a terminal on your local machine, forward the port TensorBoard is running on:

ssh -i ~/.ssh/penaltykick.pem -N -f -L localhost:<target port on your machine>:localhost:6006 ubuntu@<public DNS>

You should now be able to open TensorBoard in your local browser at http://localhost:<target port on your machine>.

If the target port is already in use, the forwarding will fail. Either forward to another port, or find the processes using that port with sudo lsof -i :<target port> and end each of them with kill <PID>.
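The port check and the forwarding command can be wrapped together. This is a sketch: `pk_tunnel` is hypothetical, it calls plain `lsof` (which may not see other users' processes without sudo), and the remote port defaults to TensorBoard's 6006.

```shell
# Hypothetical helper: forward a remote TensorBoard port to your machine.
# Args: <public DNS> [local port, default 6006] [remote port, default 6006]
pk_tunnel() {
  local dns="$1" local_port="${2:-6006}" remote_port="${3:-6006}"
  # lsof -i exits 0 when it lists at least one process using the port.
  if lsof -i :"$local_port" >/dev/null 2>&1; then
    echo "local port $local_port is in use; pick another one" >&2
    return 1
  fi
  # -N: run no remote command; -f: go to the background after authenticating.
  ssh -i "$HOME/.ssh/penaltykick.pem" -N -f \
    -L "localhost:${local_port}:localhost:${remote_port}" "ubuntu@${dns}"
}

# Usage: pk_tunnel <public DNS> 8888, then open http://localhost:8888
```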
