Interactively and manually organize ebook files quickly. This is a Python port of interactive-organizer.sh from ebook-tools written in shell by na--.
With interactive_organizer.py, you can interactively and manually organize ebook files quickly. It allows you to manually check the files that were automatically renamed by organized_ebooks directly in the terminal. Many useful operations can be performed through the terminal for each ebook that you want to check:
- read the file content (text conversion) from the terminal by leveraging the
less
command, - read the book's metadata from the corresponding
.meta
file, - provide another ISBN that will be used to fetch metadata from online sources and use these metadata to rename the file,
- and so on!
This is a Python port of interactive-organizer.sh from ebook-tools written in shell by na--.
Personally, this is the shell script from na--
that I had the most fun porting* to Python since it shows the powerful and interesting
things that you can do through the terminal and all via a Python script: you can even invoke a bash shell directly from the Python script, do your
thing with the bash shell like create a folder over here and a file over there and then go back to the Python script as if nothing happened!
What I also like is how na--
had the great idea of highlighting the differences in the old
filename compared to the new one for a given renamed ebook. Similar words between both filenames are colored green and those missing
in the new filename are colored red. Hence you can quickly see if the file was renamed correctly.
* all na--
shell scripts were a lot of fun to port but this one really is special for its interactivity
⭐ Other related Python projects based on ebook-tools
:
- convert-to-txt: convert documents (pdf, djvu, epub, word) to txt
- find-isbns: find ISBNs from ebooks (pdf, djvu, epub) or any string given as input to the script
- ocr: run OCR on documents (pdf, djvu, and images)
- split-ebooks-into-folders: split the supplied ebook files into folders with consecutive names
- organize-ebooks: automatically organize folders with potentially huge amounts of unorganized ebooks. It leverages the first three previous Python scripts.
You can ignore this section and go straight to pulling the Docker image which contains all the
required dependencies and the Python package interactive_organizer
already installed. This section is more for showing how I setup my system
when porting the shell script interactive-organizer.sh et al.
to Python.
This is the environment on which the Python package interactive_organizer was developed and tested:
Platform: macOS
Python: version 3.7
organize_ebooks: Python package whose library has lots of useful functions for developing ebook organizers/managers. It is a Python port of organize-ebooks.sh from ebook-tools written in shell by na--.
textutil or catdoc: for converting doc to txt
NOTE: On macOS, you don't need
catdoc
since it has the built-intextutil
command-line tool that converts any txt, html, rtf, rtfd, doc, docx, wordml, odt, or webarchive fileDjVuLibre: it includes
djvutxt
for converting djvu to txt⚠️ To access the djvu command line utilities and their documentation, you must set the shell variable
PATH
andMANPATH
appropriately. This can be achieved by invoking a convenient shell script hidden inside the application bundle:$ eval `/Applications/DjView.app/Contents/setpath.sh`
Ref.: ReadMe from DjVuLibre
You need to softlink
djvutxt
in/user/local/bin
(or add it in$PATH
)
poppler: it includes
pdftotext
for converting pdf to txt
ℹ️ epub can be converted to txt by using unzip -c {input_file}
Optionally:
-
Versions 2.84 and above are preferred because of their ability to manually specify from which specific online source we want to fetch metadata. For earlier versions you have to set
ISBN_METADATA_FETCH_ORDER
andORGANIZE_WITHOUT_ISBN_SOURCES
to empty strings.for fetching metadata from online sources
for getting an ebook's metadata with
ebook-meta
in order to search it for ISBNsfor converting {pdf, djvu, epub, msword} to txt (for ISBN searching) by using calibre's ebook-convert
⚠️ ebook-convert
is slower than the other conversion tools (textutil
,catdoc
,pdftotext
,djvutxt
)
Optionally poppler, catdoc and DjVuLibre can be installed for faster than calibre's conversion of
.pdf
,.doc
and.djvu
files respectively to.txt
.Optionally the Goodreads and WorldCat xISBN calibre plugins can be installed for better metadata fetching.
⭐
If you only install calibre among these dependencies, you can still have a functioning program that will enable you to manually organize your ebook collections with the script
interactive_organizer
:
- fetching metadata from online sources (very helpful feature for this particular script) will work: by default calibre comes with Amazon and Google sources among others
- conversion to txt will work: calibre's own
ebook-convert
tool will be used. However, accuracy and performance will be affected as explained in the list of dependencies above.
ℹ️
It is recommended to install the Python package interactive_organizer with Docker because the Docker
container has all the many dependencies already installed along with the Python package interactive_organizer
.
Pull the Docker image from hub.docker.com:
docker pull raul23/organize:latest
Run the Docker container:
docker run -it -v /host/input/folder:/books_to_check raul23/organize:latest
ℹ️
/host/input/folder
is a directory within your OS that can contain all the ebooks to be manually organized and is mounted as/books_to_check
within the Docker container.You can use the
-v
option mulitple times to mount several host output folders within the container, e.g.:docker run -it -v /host/input/folder:/books_to_check -v /host/output/folder:/output-folder raul23/organize:latest
raul23/organize:latest
is the name of the image upon which the Docker container will be created.
Now that you are within the Docker container, you can run the Python script
interactive_organizer
with the desired options:user:~$ interactive_organizer /books_to_check/
ℹ️
- This basic command instructs the script
interactive_organizer
to start manually checking the ebooks within/books_to_check/
. - When you log in as
user
(non-root) within the Docker container, your working directory is/ebook-tools
.
- This basic command instructs the script
ℹ️ The Docker image also comes with the Python package organize_ebooks. You can find more information about the content of the Docker image at github.com/raul23/organize-ebook
You can ignore this section and go straight to pulling the Docker image which contains all the required dependencies and the Python packageinteractive_organizer
already installed. This section is for installing the bleeding-edge version of the Python packageinteractive_organizer
after you have installed yourself the many dependencies.
After you have installed the dependencies, you can then install the development (bleeding-edge) version of the package interactive_organizer:
pip install git+https://github.com/raul23/interactive-organizer#egg=interactive-organizer
NOTE: the development version has the latest features
Test installation
Test your installation by importing
interactive_organizer
and printing its version:python -c "import interactive_organizer; print(interactive_organizer.__version__)"
You can also test that you have access to the
interactive_organizer.py
script by showing the program's version:interactive_organizer --version
To uninstall the development version of the package interactive_organizer along with the dependency organize_ebooks:
pip uninstall interactive_organizer organize_ebooks
To display organize_ebooks.py's list of options and their descriptions:
$ interactive_organizer -h usage: interactive_organizer [OPTIONS] {folder_to_organize} -o {output_folder} [{output_folder}] Interactively and manually organize ebook files quickly. This script is based on the great ebook-tools written in shell by na-- (See https://github.com/na--/ebook-tools). General options: -h, --help Show this help message and exit. -v, --version Show program's version number and exit. -q, --quiet Enable quiet mode, i.e. nothing will be printed. --verbose Print various debugging information, e.g. print traceback when there is an exception. -d, --dry-run If this is enabled, no file rename/move/symlink/etc. operations will actually be executed. -s, --symlink-only Instead of moving the ebook files, create symbolic links to them. --log-level {debug,info,warning,error} Set logging level. (default: info) --log-format {console,only_msg,simple} Set logging formatter. (default: only_msg) Convert-to-txt options: --djvu {djvutxt,ebook-convert} Set the conversion method for djvu documents. (default: djvutxt) --epub {epubtxt,ebook-convert} Set the conversion method for epub documents. (default: ebook-convert) --msword {catdoc,textutil,ebook-convert} Set the conversion method for epub documents. (default: textutil) --pdf {pdftotext,ebook-convert} Set the conversion method for pdf documents. (default: pdftotext) Interactive options: --qm, --quick-mode This mode is useful when `organize_ebooks` was called with `--keep-metadata`. Ebooks that contain all of the tokens from the old file name in the new one are directly moved to the default output folder. --token-min-length LENGTH When files and file metadata are parsed, they are split into words and ones shorter than this value are ignored. By default, single and two character number and words are ignored. (default: 3) --tokens-to-ignore TOKENS A regular expression that is matched against the filename/author/title tokens and matching tokens are ignored. The default regular expression includes common words that probably hinder online metadata searching like book, novel, series, volume and others, as well as probable publication years like (so 1999 is ignored while 2033 is not). (default: ebook|book|novel|series|^ed(ition)?$|^vol(ume)?$|(19[0-9]|20[0-2])[0-9]) -m, ---metadata-fetch-order METADATA_SOURCE [METADATA_SOURCE ...] This option allows you to specify the online metadata sources and order in which the subcommands will try searching in them for books by their ISBN. The actual search is done by calibre's `fetch-ebook-metadata` command-line application, so any custom calibre metadata plugins can also be used. To see the currently available options, run `fetch-ebook-metadata --help` and check the description for the `--allowed-plugin` option. If you use Calibre versions that are older than 2.84, it's required to manually set this option to an empty string. (default: ['Goodreads', 'Google', 'Amazon.com', 'ISBNDB', 'WorldCat xISBN', 'OZON.ru']) --owis, --organize-without-isbn-sources METADATA_SOURCE [METADATA_SOURCE ...] This option allows you to specify the online metadata sources in which the script will try searching for books by non-ISBN metadata (i.e. author and title). The actual search is done by calibre's `fetch-ebook-metadata` command-line application, so any custom calibre metadata plugins can also be used. To see the currently available options, run `fetch-ebook-metadata --help` and check the description for the `--allowed-plugin` option. Because Calibre versions older than 2.84 don't support the `--allowed-plugin` option, if you want to use such an old Calibre version you should manually set `organize_without_isbn_sources` to an empty string. (default: ['Goodreads', 'Google','Amazon.com']) Input/Output options: folder_to_organize Folder containing the ebook files that need to be organized. -o, --output-folders [PATH [PATH ...]] The different output folders to which you can quickly move ebook files. The first specified folder is the default. -c, --custom-move-base-dir PATH A base directory in whose sub-folders files can more easily be moved during the interactive session because of tab autocompletion. (default: ) -r, --restore-original-base-dir PATH If you want to enable the option of restoring files to their original folders (or at least with the same folder structure), set this as the base path. (default: ) --oft, --output-filename-template TEMPLATE This specifies how the filenames of the organized files will look. It is a bash string that is evaluated so it can be very flexible (and also potentially unsafe). (default: ${d[AUTHORS]// & /, } - ${d[SERIES]:+ [${d[SERIES]}] - }${d[TITLE]/:/ -}${d[PUBLISHED]:+ (${d[PUBLISHED]%-*})}${d[ISBN]:+[${d[ISBN]}]}.${d[EXT]}) --ome, --output-metadata-extension EXTENSION This is the extension of the additional metadata file that is saved next to each newly renamed file. (default: meta)
Let's say that we want to manually organize some books that were labeled as uncertain by the script organize_ebooks that automatically organized a collection of ebooks. Here is a basic command that will allow you to manually inspect these uncertain books:
$ interactive_organizer ~/test/test_organize/commons-books/uncertain/
We will go through each of the options in the main menu using this simple command as a starter.
ℹ️
Ebooks that the script organize_ebooks
was able to identify from non-ISBN metadata (e.g. title)
fetched from online sources (e.g. Goodreads) are saved in a folder specified by the option output-folder-uncertain. An ebook considered as "uncertain" means that the
script is not highly confident about the filenames given to these books compared to those books whose ISBNs
could be retrieved directly from their contents and successfully used to retrieve metadata from online sources.
⭐
When you are in a submenu (e.g. moving file with the m option) and want to go back to the main menu, you can do it by pressing the keysCtrl
andC
.
The user will be warned if an ebook file doesn't have an associated metadata file (as created by the script organize_ebooks):
You can still do all of the operations in the main menu except the c option obviously:
Press the key M
to move the current ebook file to another folder:
The script warns you that the 'custom' folder is empty because the basic command that was used to run
the interactive_organizer
script didn't use the option custom-move-base-dir
(by default it is set to empty). Thus,
the new path starts from the current working directory.
The file in this example will be saved relative to the current working directory but you can also give a full path.
By using tab, the script autocompletes the path that you enter so that it is easier for you to navigate through your filesystem:
Pressing the key O
or Enter
will open the given document in an external program which is the default one used
by the OS for this particular file type. On Linux, this default program is called upon by xdg-open
and on macOS, it is done
by open
.
Thus you can check the content of the PDF to make sure that the file was correctly renamed.
For each ebook that the script organize_ebooks
(keep-metadata
option) renames and moves to another folder, a metadata file is created with data fetched from online
sources via calibre's fetch-ebook-metadata
.
This metadata file can be read from within the Python script by pressing the key C
from the main menu:
To open a shell from within the Python script, press the key T
or `
(backtick):
To exit from the shell, press the keys Ctrl
and D
and you will get back to the main menu of the Python script:
You can skip the current ebook file by pressing the key S
. If another file is found in the input folder, it
will be shown in the main menu as the next file to be checked by the user:
This is the part of the menu where you will interact a lot with the script. When pressing the key I
or Backspace
,
you are asked to enter search terms or a new filename within single quotes:
ℹ️ The old file path will be added into the new metadata file since the old metadata file is removed.
The search terms will be used to fetch new metadata from online sources via calibre's fetch-ebook-metadata
. These fetched
metadata will be used to rename the given ebook file. The file will be saved within the same input folder.
You could also enter a new ISBN as the search term and it will be used to rename the file:
The script then fetches metadata based on the provided ISBN and displays the metadata that it found:
It then asks if you want to use these metadata to rename the ebook file and the associated metadata file:
To read the given document from the terminal, press the key L
which will instruct the script to convert
the file (e.g. pdf, djvu, epub) to text and show the content in the terminal through the program less
.
less
will let you move easily through the content (page up and page down) and hence you can quickly take
a peek at the content of the file to check if it was correctly named by the automatic script organize_ebooks
.
Here is the text content of the epub document from the example as shown by less
:
And here is a sample text content from a PDF file when viewing it with less
:
ℹ️ You can then press Q
to exit from less
and get back to the main menu of the Python script.
Press ?
to show the metadata of the given document via calibre's ebook-meta
:
Press the key E
to modify the regex used for ignoring tokens in the old filename when comparing the old
and new filenames:
If the new filename for a given ebook file is not missing word from the old filename, the ebook
file can be quickly moved to a user specified folder by using the flag --qm
which enables quick mode:
interactive_organizer ~/test/test_organize/commons-books/uncertain/ -o output1/ --qm
-o
since it is the first
of the output folders that will be used as the location where the ebook file will be saved.
We can provide a list of output folders that we can use to move ebook and metadata files between them with the command-line option -o:
interactive_organizer ~/test/test_organize/commons-books/uncertain/ -o output0 output1 output2/ output3/
In the main menu, we can see at the beginning of possible actions four options related to the output folders.
The first output folder specified in the option -o
is the default one and given the number 0 as label. The other
output folders in the example command line are labeled with 1, 2 and 3.
If we type 0
, the given ebook file along with the associated metadata file are moved to the output0/
folder:
When using the menu option m, we can provide a custom base folder to the script via
the -c
command line option:
interactive_organizer ~/test/test_organize/commons-books/uncertain/ -c custom/
The script asks to enter the file path where the ebook file will be moved. We can see that this
file path starts from the custom base folder we provided to the script (custom/
).
We can check that the file was moved to the custom base directory by calling a bash shell from the Python script (see the menu option t):
We can restore a given ebook file to the original path (or at least the same folder structure) by providing a base path with the command-line -r:
interactive_organizer ~/test/test_organize/commons-books/uncertain/ -r restore/
The script then asks if we want to modify the file path that will be used for saving the restored ebook file:
In this example, the file will be restored to the original folder structure by saving it relative to the specifed base path (restore/
).
By using tab, the script autocompletes the path that we enter so that it is easier to navigate through the filesystem:
To interactively organize your ebooks, you can start with the following basic command:
interactive_organizer ~/uncertain/
The only required parameter is the input folder that you want to manually check. This input folder can be for instance the folder containing uncertain ebooks as renamed by the script organize_ebooks.
These ebooks and their associated metadata files can be checked to validate the book's filenames. In the main menu and for a given ebook file analyzed, you will see the old filename colored as such:
- red for those words missing in the new filename
- green for those words also found in the new filename
- blue for the date
Hence, you can quickly inspect how good of a job the script organize_ebooks
did with the
renaming of an ebook: if there are many green words and very few or no red words, then you can
move the file to another folder of your choice.
If no missing words were detected, the file can be quickly moved to an output folder if the
flag --qm
(quick mode enabled) was used.