AWS for deep learning
John Pearson edited this page Nov 27, 2017 · 13 revisions
- Get the penaltykick.pem file from John.
- Copy it to the ~/.ssh folder in your own home directory.
- Restrict its permissions so that only you can read it:
chmod 400 ~/.ssh/penaltykick.pem
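The key setup above can be wrapped in a small helper. This is only a sketch: the function name install_key and the example source path are hypothetical and do not come from the lab's own tooling.

```shell
#!/usr/bin/env bash

# install_key SRC [DEST_DIR]
# Hypothetical helper: copy a private key into DEST_DIR (default: ~/.ssh)
# and make it read-only for the owner, which ssh expects for key files.
install_key() {
    local src="$1"
    local dest_dir="${2:-$HOME/.ssh}"
    local dest
    dest="$dest_dir/$(basename "$src")"

    mkdir -p "$dest_dir" || return 1
    # -f lets a re-run replace an existing read-only copy of the key
    cp -f "$src" "$dest" || return 1
    chmod 400 "$dest"
}

# Example (the download location is an assumption):
# install_key ~/Downloads/penaltykick.pem
```

If ssh later complains that the key's permissions are too open, re-running the helper resets them to 400.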
Log in to Amazon Console for the lab with MFA. Navigate to EC2 > {X} Running Instances
- click on PK{X}
- Actions --> Instance State --> Start --> Yes, Start (this will take a few minutes to boot up)
- copy (under the Descriptions tab) the Public DNS to your clipboard.
- open a terminal on your computer (from home directory).
ssh -i ~/.ssh/penaltykick.pem ubuntu@{DNS address}
You're in!
- Please google Amazon EC2 Instance types to see the various types of instances Amazon provides and what they mean (e.g., m = general purpose, r = memory optimized, c = compute optimized, p = GPU accelerated, etc.).
- For our experiments, run a p3.2xlarge for a GPU-type experiment and a c5.2xlarge for a CPU-type experiment.
- Launch Instance --> AWS Marketplace
- Search for "Amazon Deep Learning", then select "Deep learning AMI (Ubuntu)", then hit Continue.
- Choose your desired instance type, then Configure Instance Details.
- Keep all defaults or change as desired, then add "penaltykick" as a tag. Then configure security group.
- Select existing security group, then select Deep Learning AMI ubuntu.
- Point to your existing key pair, then launch.
- Amazon --> Storage --> EFS
- click on penalty kick data, then copy the "DNS name"
- Go back to the terminal. Run ls ~/data to see if this directory exists; if not, create it with mkdir ~/data.
- Execute the following, replacing <DNS> with the DNS name copied to your clipboard.
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 <DNS>:/ ~/data
This should be enough. If you run ls -al inside the home directory and find that data/ is not owned by ubuntu, you may also need to run
sudo chown ubuntu:ubuntu data
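The mount steps above can be collected into one script. The sketch below is an assumption-laden convenience, not the lab's official procedure: the function names efs_mount_cmd and mount_efs are hypothetical, and the mount options are copied from the command above.

```shell
#!/usr/bin/env bash

# Mount options copied from the mount command in the instructions.
EFS_OPTS="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"

# efs_mount_cmd DNS [DIR]
# Hypothetical helper: print the mount command for review without running it.
efs_mount_cmd() {
    local dns="$1"
    local dir="${2:-$HOME/data}"
    printf 'sudo mount -t nfs4 -o %s %s:/ %s\n' "$EFS_OPTS" "$dns" "$dir"
}

# mount_efs DNS [DIR]
# Hypothetical helper: create the mount point if needed, skip the mount if
# the directory is already a mount point, and fix ownership if necessary.
mount_efs() {
    local dns="$1"
    local dir="${2:-$HOME/data}"

    mkdir -p "$dir" || return 1
    if mountpoint -q "$dir"; then
        echo "$dir is already mounted"
        return 0
    fi
    sudo mount -t nfs4 -o "$EFS_OPTS" "${dns}:/" "$dir" || return 1

    # Only change ownership when the directory is not owned by ubuntu
    if [ "$(stat -c '%U' "$dir")" != "ubuntu" ]; then
        sudo chown ubuntu:ubuntu "$dir"
    fi
}

# Example: preview the command, then mount (replace <DNS> with the copied name):
# efs_mount_cmd <DNS>
# mount_efs <DNS>
```

Printing the command first makes it easy to confirm that the DNS name was pasted correctly before anything is mounted.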
You'll have to add to the instance whatever code you need that isn't available as part of one of the installed frameworks. If this is GitHub code (i.e., ours), you'll need to do the following (using the code directory as an example):
- mkdir ~/code (if it doesn't exist)
- cd ~/code
- Clone the repo:
This HTTPS syntax allows us to clone any public repo without the need for authentication (so no need to set up ssh keys for GitHub, etc.). GitHub no longer serves the unauthenticated git:// protocol, so use https:// instead.
git clone https://github.com/<user>/<repo>.git
- git checkout to whatever branch you want
- Activate the Anaconda environment corresponding to your desired framework. E.g., for TensorFlow on Python 3.6:
source activate tensorflow_p36
- Add the directory of git repository to python path
export PYTHONPATH=$PYTHONPATH:~/code
- install Edward if needed
pip install edward
- Navigate to ~/code/tf_gbds
- Run whatever commands you want as you would in a normal terminal (save all results to ~/data).
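If you repeat the PYTHONPATH step in the same shell, the export line above appends ~/code again each time. A minimal sketch of a guard against that follows; the function name add_to_pythonpath is hypothetical.

```shell
#!/usr/bin/env bash

# add_to_pythonpath DIR
# Hypothetical helper: append DIR to PYTHONPATH only if it is not already
# listed, so repeated calls do not create duplicate entries.
add_to_pythonpath() {
    local dir="$1"
    case ":${PYTHONPATH:-}:" in
        *":$dir:"*)
            ;;  # already present, nothing to do
        *)
            # Add a ':' separator only when PYTHONPATH is non-empty
            export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$dir"
            ;;
    esac
}

# Example, matching the directory used in the steps above:
# add_to_pythonpath "$HOME/code"
```

The function has to be defined in (or sourced into) the shell where you run your experiments, since an exported variable only affects that shell and its child processes.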
- After you mount the data drive and activate the TensorFlow environment, launch TensorBoard on the instance:
tensorboard --logdir='/home/ubuntu/data'
The default port for TensorBoard is 6006. You can specify another port with the --port option.
- Forward the port where tensorboard is running to your local machine
ssh -i ~/.ssh/penaltykick.pem -N -f -L localhost:<target port on your machine>:localhost:6006 ubuntu@<public DNS>
You should be able to see tensorboard forwarded to the target port.
- If the target port is already in use, the forwarding will fail. You can either forward to another port or find the processes using that port with
sudo lsof -i :<target port>
and end them with kill <PID>.
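The forwarding and port-check steps above can be sketched as two small helpers to run on your local machine. The function names tunnel_cmd and port_in_use are hypothetical; the ssh options are the ones used in the command above.

```shell
#!/usr/bin/env bash

# tunnel_cmd LOCAL_PORT HOST [REMOTE_PORT]
# Hypothetical helper: print the ssh port-forwarding command for review.
# REMOTE_PORT defaults to 6006, the TensorBoard default.
tunnel_cmd() {
    local local_port="$1"
    local host="$2"
    local remote_port="${3:-6006}"
    printf 'ssh -i ~/.ssh/penaltykick.pem -N -f -L localhost:%s:localhost:%s ubuntu@%s\n' \
        "$local_port" "$remote_port" "$host"
}

# port_in_use PORT
# Succeeds if lsof reports a process listening on the local TCP port.
# Without sudo, lsof may only see processes owned by the current user.
port_in_use() {
    lsof -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Example: check the port first, then print the command to run
# (replace <public DNS> with the instance's address):
# if port_in_use 16006; then
#     echo "port 16006 is busy, pick another one"
# else
#     tunnel_cmd 16006 <public DNS>
# fi
```

Once the tunnel is running, TensorBoard should be reachable in a browser at localhost on the local port you chose.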