<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>My file organization and backup workflow</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<ul class="link-list">
<li class="link-list">
<img class="invertable" src="back.svg" alt="Back icon">
<a class="icon-link" href="index.html">Back</a>
</li>
</ul>
<h1 class="text">My file organization and backup workflow<br><i>2024.05.28</i></h1>
<p class="text">
I have been using <a href="https://rclone.org/">Rclone</a> to back up my
files for a few years now, and have developed a workflow of sorts around
it. I'll discuss how I organize my files and how that allows me to
automatically create verifiable and "portable" backups with Rclone.
<br><br>
My setup definitely won't fit most people's use cases, but I hope this
will give you some inspiration on how to incorporate Rclone into your
own workflow. It's an extremely useful tool that can automate a
surprising number of cloud storage and backup-related tasks.
</p>
<h2 class="text">General rules</h2>
<p class="text">
To make the most of Rclone and its capacity for automation, I try to
stick to the following general rules when organizing my files:
<br><br>
<b class="dot-item">1.</b>
<br><br>
All files will be categorized under one of six top-level folders:
<code>Audio</code>, <code>Documents</code>, <code>Literature</code>,
<code>Pictures</code>, <code>Software</code> and <code>Videos</code>.
These may vary widely depending on your files, but in general it's
important to categorize every file around a set of generic folders.
<br><br>
The top-level folders can also be verbs if that makes more sense to you,
e.g. <code>Listen</code>, <code>Write</code>, <code>Read</code>,
<code>Watch</code>, <code>Program</code> and <code>Study</code>.
<br><br>
<b class="dot-item">2.</b>
<br><br>
All files will be placed in subfolders under the top-level folders, e.g.
<code>Audio/Podcasts</code>, <code>Audio/Music</code>,
<code>Videos/Movies</code>, <code>Videos/TV Shows</code>.
<br><br>
Files should never be placed directly under a top-level folder, as
there is always a subcategory they can be placed under. Keeping files
at depth 2 or deeper in the directory tree will also help with
automation later on.
<br><br>
<b class="dot-item">3.</b>
<br><br>
Files placed under the previously mentioned subfolders need to follow a
consistent file naming convention and folder structure, but <b>only
within their respective subfolders</b>.
<br><br>
For example, files placed in <code>Audio/Music</code> could be
categorized further into <code>"Album Artist/Album Name"</code>
subfolders, whereas files placed in <code>Videos/Movies</code> wouldn't
need subfolders at all, instead being named <code>"Movie Title
(Year)"</code>.
<br><br>
The most important part is that each subfolder's files should be placed
at <b>a consistent folder depth</b>. For example, files in
<code>Audio/Music</code> will all be placed at depth 4:
<code>Audio/Music/Album Artist/Album Name/## Track Title.mp3</code>,
whereas files placed in <code>Videos/Movies</code> will all be placed
at depth 2: <code>Videos/Movies/Movie Title (Year).mkv</code>.
<br><br>
I wrote a simple
<a href="https://github.com/patoporh/bash-scripts/blob/main/file-depths">Bash script</a>
to help keep track of this. The script prints each depth at which files
are found in a given directory.
</p>
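<p class="text">
A minimal version of that idea could look something like the sketch below.
It assumes GNU <code>find</code> and is only an illustration, not the
actual script linked above:
</p>
<pre>
#!/bin/bash
# Count regular files at each depth under a directory.
# find's %d counts the file itself as a level, so subtract 1 to get
# the number of directories above the file, as used in this article.
find "${1:-.}" -type f -printf '%d\n' |
    awk '{ print $1 - 1 }' | sort -n | uniq -c</pre>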
<h2 class="text">Cataloging file depths</h2>
<p class="text">
When files are organized following the aforementioned rules, some
interesting automation becomes possible. Let's say that each of the
subfolders and their common file depths are cataloged in a TSV file, like
so:
</p>
<pre>
2 Audio/Music
0 Literature/Books
0 Videos/Movies
2 Videos/TV Shows</pre>
<p class="text">
The first field specifies a consistent depth for files <b>relative</b>
to the given subfolder. This means that <code>Audio/Music</code>'s common
subfolders <code>Album Artist/Album Name</code> put files at depth
2, whereas files placed directly under <code>Literature/Books</code> are at
depth 0.
<br><br>
With this TSV file, quite a lot of automation becomes possible. The
following sections contain some examples, starting with the sketch below.
</p>
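<p class="text">
For instance, a helper script can loop over the catalog and run a command
per subfolder. A minimal sketch, assuming the catalog is saved as
<code>depths.tsv</code> (a hypothetical name):
</p>
<pre>
#!/bin/bash
# Read "depth&lt;TAB&gt;subfolder" pairs from the catalog and report
# where each subfolder's files are expected to live.
while IFS=$'\t' read -r depth subfolder; do
    echo "$subfolder: files at relative depth $depth"
done &lt; depths.tsv</pre>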
<h2 class="text">Creating checksums</h2>
<p class="text">
In order to create backups, more specifically <b>verifiable</b> backups, we
need to have checksums for everything. This is something backup tools like
<a href="https://www.borgbackup.org/">Borg</a> and
<a href="https://restic.net/">Restic</a> will always create for you.
However, something I dislike about these tools is the way the resulting
backups are browsed and managed: each has its own commands for listing,
verifying, mounting and restoring the files in a backup.
<br><br>
Having become quite familiar with Rclone over the past few years, I wanted
to achieve something similar entirely within Rclone. It turns out this is
possible when using Rclone with the
<a href="https://rclone.org/flags/#sync"><code>--backup-dir</code></a>
option.
<br><br>
To create backups with Rclone, checksum files need to be created first.
Unlike traditional backup tools, Rclone won't automatically create these
for you. This is where the previously mentioned TSV file comes in handy:
since the depth specified for each subfolder means that no files should
sit above it, we can place checksum files at that depth and have them
cover everything beneath.
<br><br>
In order to automate this, I wrote a Bash script called
<a href="https://github.com/patoporh/bash-scripts/blob/main/new-md5"><code>new-md5</code></a>,
which can recursively create MD5 files at specified depths.
Instead of creating one large MD5 file directly under <code>Audio/Music</code>,
<code>new-md5</code> can create a separate MD5 file for each <i>album</i>
at depth 2. The benefit of this over large MD5 files is that renaming and
moving directories around remains easy.
<br><br>
Given that every subfolder is specified in the TSV file with its correct
file depth, automating checksum creation becomes a trivial task with
<code>new-md5</code> or a similar helper script, as sketched below. See
<a href="https://github.com/patoporh/bash-scripts/blob/main/new-md5"><code>new-md5</code></a>'s
documentation for some use cases.
</p>
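<p class="text">
The core idea can be sketched as follows. This is an illustration of the
approach rather than <code>new-md5</code> itself; the
<code>.verify.md5</code> name matches the verification example later in
this article:
</p>
<pre>
#!/bin/bash
# Write one .verify.md5 per directory at the given relative file depth,
# covering every file beneath it. Usage: ./make-md5.sh "Audio/Music" 2
# (make-md5.sh is a hypothetical name for this sketch.)
root=$1 depth=$2
find "$root" -mindepth "$depth" -maxdepth "$depth" -type d -print0 |
while IFS= read -r -d '' dir; do
    (cd "$dir" &amp;&amp;
     find . -type f ! -name '.verify.md5' -exec md5sum {} + &gt; .verify.md5)
done</pre>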
<h2 class="text">Backups</h2>
<p class="text">
Since every directory now has MD5 files, backing up can be done in a more
"portable" fashion than some traditional backup tools allow. By this I
mean that the files can simply be copied or synced as-is to new drives
without needing to generate new checksums for the resulting backup. Since
<code>new-md5</code> doesn't update hashes in the generated checksum files
by default, there is no danger of files being changed without notice
between multiple backups.
<br><br>
Rclone ties this all together. All files can be placed in a remote named
<code>default</code>, whether they live in the cloud or on a disk,
encrypted or not. Creating backups can then be automated by creating
remotes named <code>backup_*</code> and having a helper script cycle
through each of them with
<code>rclone listremotes | grep "^backup_"</code>,
creating a backup with Rclone's <code>--backup-dir</code>, optionally along
with <code>--suffix</code> and <code>--suffix-keep-extension</code>. A
sketch of such a script closes out this section.
<br><br>
Since <code>--backup-dir</code> shouldn't be in the same path as the
top-level folders, it's a good idea to give every remote's root a couple
of meta-folders: one for the files the given remote contains and one for
deleted files, which <code>--backup-dir</code> manages for you. I tend to
name these simply <code>0</code> and <code>1</code> to keep paths short.
<br><br>
With this setup, a backup can be created with the following command:
<br>
<code>rclone sync default:/<b>0</b> backup_remote:/<b>0</b> --backup-dir backup_remote:/<b><u>1</u></b></code>
<br><br>
Verifying the backups and the default remote is also trivial with Rclone.
All MD5 files in a remote can be found with
<br>
<code>rclone lsf remote_name:/0 -R --files-only --include '*.md5'</code>
<br>
and subsequently checked in a loop with
<br>
<code>rclone md5sum remote_name:/0/path/to/md5 -C remote_name:/0/path/to/md5/.verify.md5</code>.
<br><br>
Although this might not be the most space-efficient method of creating
backups, I've personally been very happy with it. Since everything
revolves around Rclone's remotes, this method is extremely malleable.
Everything can be done locally on hard drives, but can easily be scaled up to
include any cloud storage that Rclone supports, not to mention
<a href="https://rclone.org/crypt/">encryption</a>,
<a href="https://rclone.org/compress/">compression</a>
or combining remotes with the
<a href="https://rclone.org/union/">union remote</a>.
<br><br>
Following the
<a href="https://www.veeam.com/blog/321-backup-rule.html">3-2-1 rule</a>
when creating backups this way is also easy. The default remote can be
a local drive, with one of the backup remotes pointing to another local drive
and another to cloud storage.
</p>
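<p class="text">
To tie the whole workflow together, here is a sketch of the kind of helper
script described above. It syncs the <code>default</code> remote to every
<code>backup_*</code> remote, then verifies each backup against its MD5
files. The <code>0</code>/<code>1</code> meta-folders follow the convention
above, and the date suffix is just one way to use <code>--suffix</code>:
</p>
<pre>
#!/bin/bash
# Sync the default remote to each backup remote, moving deleted or
# overwritten files into the remote's 1-folder with a date suffix.
for remote in $(rclone listremotes | grep '^backup_'); do
    rclone sync "default:/0" "${remote}/0" --backup-dir "${remote}/1" \
        --suffix ".$(date +%F)" --suffix-keep-extension

    # Verify: check every MD5 file found on the backup remote.
    rclone lsf "${remote}/0" -R --files-only --include '*.md5' |
    while IFS= read -r md5; do
        dir=$(dirname "$md5")
        rclone md5sum "${remote}/0/$dir" -C "${remote}/0/$md5"
    done
done</pre>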
</body>
</html>