-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdir_mgmt.qmd
126 lines (83 loc) · 6.58 KB
/
dir_mgmt.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
---
title: "Directory & File Management"
engine: knitr
---
## Overview
One thing we have not discussed that is incredibly important to any scripted language ([{{< fa brands python >}} Python]{.py} and [{{< fa brands r-project >}} R]{.r} _very_ much included) is computer directory and file management. Code often interacts with external data files that must be imported at the start of a workflow and often preserves a record of its operation as an exported product at the end of the workflow. These concepts are more related to fundamental understanding of how your computer stores information than they are strictly coding language considerations but they are still worth discussing here.
## Crucial Vocabulary
There are a few vocabulary terms we need to introduce before we can dive into the code side of file and directory management. Fortunately, these are terms that apply to computers generally so we do not have to deal with [{{< fa brands python >}} Python]{.py} and [{{< fa brands r-project >}} R]{.r} using different names for the same concept.
- Directory -- A folder on your computer (typically containing other folders and/or files)
particular file/folder
- Working Directory -- The folder your code is "looking at" for a given project
- By default this is the folder in which the code file itself can be found though you _can_ set it elsewhere (though this is sometimes risky)
- Root -- The single folder that contains _everything_ else on your computer
- Absolute Path -- The names of each directory <u>beginning at the root</u> and ending at a - Relative Path -- The names of each directory <u>beginning at the your working directory</u> and ending at a particular file/folder
## Library Loading
We'll need to quickly load any needed libraries before getting into the "actual" coding.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
[{{< fa brands r-project >}} R]{.r} does not require any libraries to perform these operations!
```{r r-libs}
#| echo: false
# Load needed libraries
```
## [{{< fa brands python >}} Python]{.py}
We'll need the `os` library and `glob` library in [{{< fa brands python >}} Python]{.py} to do these operations.
```{python py-libs}
# Load needed library
import os
import glob
```
:::
## The Working Directory
As defined above, the working directory is the primary folder with which your code can easily interact. This includes folders within your working directory but _does not_ include folders "above"/outside of it. It is a good practice to use different folders for different projects such that each project has a different working directory and you don't run the risk of scripts/data from one project accidentally interacting with those form another project. For [{{< fa brands r-project >}} R]{.r} users, the RStudio "R Project" functionality guarantees that you have a different working directory for each R project.
Both languages provide a quick way of checking what your current working directory is to ensure you're not accidentally in the wrong one. Note that either approach will return the _absolute path_ to your working directory which will differ among users/computers.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
Base [{{< fa brands r-project >}} R]{.r} includes the `getwd` function to display your current working directory.
```{r r-wd}
# Check current working directory
getwd()
```
## [{{< fa brands python >}} Python]{.py}
[{{< fa brands python >}} Python]{.py} has a `getcwd` function (from the `os` library) to display your current working directory.
```{python py-wd}
# Check current working directory
os.getcwd()
```
:::
## Operating Systems & File Paths
One minor--but often frustrating--hurdle for collaborative coding occurs when group members use different operating systems (i.e., macOS, Windows, etc.). When defining file paths (absolute or relative), different operating systems use either a slash (`/`) or a backslash (`\`) to separate directories in that path. Unfortunately for us, your OS will only be able to interpret file paths that use the type of slash it uses. This means that any code that reads in external data or exports its outputs needs to account for this OS-level difference **_every_ time a file path is defined**. However, the developers of [{{< fa brands python >}} Python]{.py} and [{{< fa brands r-project >}} R]{.r} know this pain themselves and have given us straightforward tools to handle this.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
[{{< fa brands r-project >}} R]{.r} includes a `file.path` function that automatically detects your computer's OS and inserts the correct type of slash between each directory name.
```{r r-os-paths}
# Build a faux file path
file.path("path", "to", "my", "data")
```
## [{{< fa brands python >}} Python]{.py}
The `os` library in [{{< fa brands python >}} Python]{.py} includes a `join` function that automatically detects your computer's OS and inserts the correct type of slash between each directory name.
```{python py-os-paths}
# Build a faux file path
os.path.join("path", "to", "my", "data")
```
Note that the needed function is in the `path` module of the `os` library.
:::
Using these tools allows us to code collaboratively even when not all group members use the same operating system! Note that nothing will solve the issue of using absolute paths because group members will always have different paths from the root to a particular directory. Because of this <u>it is best to **_always_** use relative paths for maximum reproducibility</u>.
## Finding Files
Once you've confirmed your working directory and figured out how to account for OS idiosyncrasies, its time to actually look at the files you have available! In the `r fontawesome::fa(name = "terminal", fill = "#EB6534", a11y = "sem")` [command line]{.cli}, the `ls` function lists all files in a particular folder. Fortunately both [{{< fa brands python >}} Python]{.py} and [{{< fa brands r-project >}} R]{.r} contain tools for doing this as well.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
We can use the [{{< fa brands r-project >}} R]{.r} `dir` function to identify all files in a particular folder (note this ignores _folders_ inside of the specified folder).
```{r r-ls}
# List files in the "data" folder for this website
dir(path = file.path("data"))
```
## [{{< fa brands python >}} Python]{.py}
We can use the `glob` function (from the library of the same name) to identify all files in a particular folder.
```{python py-ls}
# List files in the "data" folder for this website
glob.glob(pathname = os.path.join("data", "*"))
```
Note that we need to use the wildcard asterisk (`*`) to identify everything in the "data" folder.
:::