Erick Matsen (@ematsen, matsen.group)
Now that you have some experience with shell, we'll dive a little more deeply into what remote and cloud computing are, including the modern container-based approach to high-performance computing. However, we will start by going over some of the questions from last time.
After this course, you should be able to:
- Be able to run shell commands from Python
- Understand local versus remote execution
- Understand the basics of what cloud computing is and what it offers
- Understand what a linux container is
- Know how to copy, move, and delete files on the command line
- This directory has the class materials for the course. The file
quickref.md
is a quick reference file with basic commands we've used so far. - You should be able to execute all commands in this lesson on Rhino or on your local computer.
- Homework 7 will be available next week.
- Homework 8 (capstone) will be available the week of December 2.
- Don't forget to
git pull
in the directory containing yourtfcb_2019
repository to get the latest goodies.
In the scripting component from last time, we used wildcards like so:
ls input_*.bam | parallel file
The *
just represents any set of characters.
Thus this ls
call just lists anything that starts with input_
and ends with .bam
.
Now it's your turn.
You can combine wildcards, using them for example for both directory and file names.
Try using a wildcard to list all of the .md
files in ../lecture09/directories/purchase
.
There is also a Python glob
module which implements wildcards.
Here's an example you can copy and paste into the shell:
python -c "import glob; print(glob.glob('ps*'))"
Here we give the glob.glob
command a string to expand, which in this case is like using ls ps*
.
glob.glob
returns a Python list.
There are other symbols you can use if you want more precise control over your wildcard matches but *
is a fine start.
Hidden directories and files
The shell has a special way of hiding files that you don't want to see: any file or directory starting with a period (.
) is hidden when you ls
.
It's also not matched with *
.
Try doing
echo ~/*
which compactly lists all of the non-hidden files in your home directory (which is ~
).
Then try
echo ~/.*
which shows all of the hidden files.
You can also use
ls -a ~
which will list all of the files in ~
.
Removing files happens with
rm FILENAMES
where FILENAMES
can be one or more names.
Removing directories happens with
rmdir DIRECTORYNAMES
where DIRECTORYNAMES
is one or more directory names.
However, these directories need to be empty.
It is more common to use the less safe
rm -r DIRECTORYNAMES
which recursively removes DIRECTORYNAMES
and all of their contents.
Unlike "putting something in the trash" on a desktop OS, once linux removes a file, it's gone!
However, there's a safety net on the Fred Hutch linux system, which are the "snapshots."
Every hour the Fred Hutch servers back up your whole home directory into a snapshot folder hosted in ~/.snapshot
.
Try
ls ~/.snapshot
to see all of these files.
Now it's your turn to pretend you made a mistake.
I will assume that you are in tfcb_2019/lectures/lecture17
.
Try removing some non-essential file, say doing
rm ../lecture09/vader.txt
and then restoring it from the snapshot directory.
Making directories is done with
mkdir DIRECTORY
which makes a directory called DIRECTORY
.
Copying files uses the cp
command, like so:
cp SOURCE DESTINATION
where SOURCE
is an extant file, and DESTINATION
is a place the file can go.
If DESTINATION
is a file name, it will copy that file to have that new name.
If DESTINATION
is a directory, it will copy the file with the same name into that directory.
You can also use the form
cp SOURCES DIRECTORY
where SOURCES
is a collection of files and DIRECTORY
is a directory.
Moving/renaming files is just the same, except that you use the mv
command rather than the cp
command.
Now it's your turn.
- Copy this
README.md
file into..
- Rename it
../README.copy.md
- Make a directory called
copies
- Copy
../README.copy.md
and../lecture09/vader.txt
intocopies
- Remove the
copies
directory
The file
command, when run on ../lecture09/slides/images/betty-crocker.jpg
, returns
../lecture09/slides/images/betty-crocker.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, baseline, precision 8, 460x600, components 3
which as you can see, is the file name, then a colon, then a description of the file.
Make a Python script that runs file
on each file in ../lecture09/
, extracts just the descriptions (hint: see the Python split()
function), sorts them, and prints them.
You can use the ps-count.py
script in this directory as a template (in particular the invocation of subprocess.check_output
).