From 7d28df00d45d2d52224e49e9d19f3549def248a0 Mon Sep 17 00:00:00 2001
From: Casey Scarborough
Date: Sat, 2 Sep 2023 23:49:06 -0400
Subject: [PATCH] Update README.md

---
 README.md | 582 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 324 insertions(+), 258 deletions(-)

diff --git a/README.md b/README.md
index cf8333f..e1feb23 100644
--- a/README.md
+++ b/README.md
@@ -1,26 +1,24 @@
+# Yomichan Logo Yomichan Audio Server
-# Local Audio Server for Yomichan
-
-> **Announcement**: A new audio collection was released on 2023/06/11! See how to upgrade to the new collection [here](#other) ("Updating to the new audio collection")
-
-This [Anki add-on](https://ankiweb.net/shared/info/1045800357)
-runs a local server that Yomichan fetches audio files from,
+This is a self-hosted audio server for Yomichan to fetch audio files from,
 using a database containing over 250,000 unique expressions.
 With this setup, you are able to create Anki cards nearly instantaneously,
-get word audio without a working internet connection,
-and increase the quality and coverage of word audio.
+get word audio without a working internet connection (if hosted internally
+in your own network), and increase the quality and coverage of word audio.
-Core maintainer: [**@Aquafina-water-bottle**](https://www.github.com/Aquafina-water-bottle)
+This project was forked from [themoeway/local-audio-yomichan](https://github.com/themoeway/local-audio-yomichan) and has been modified by me to remove the Anki plugin-related files and slightly refactored to run standalone in Docker. All credit goes to the original creator [**@Aquafina-water-bottle**](https://www.github.com/Aquafina-water-bottle) and the others who worked on [local-audio-yomichan](https://github.com/themoeway/local-audio-yomichan).
-
-P.S. Feel free to check out my other resources if you're interested!
-
+The purpose of this project is to host the audio server externally
+(outside of localhost). If you don't want to mess with Docker, or
+don't want to host this on your own server, NAS, Kubernetes cluster,
+etc. then you should stick with the original project.
 ## Reasons for and against this setup
-Advantages: (click here)
+
+  Advantages: (click here)
-1. Most audio is gotten in **almost instantly**. Without the local audio server,
+1. Most audio is fetched **almost instantly**. Without the audio server,
    fetching the audio can take anywhere from one second to a full minute (on particularly bad days).
@@ -39,20 +37,16 @@ P.S. Feel free to check out
+### Run the Container using Docker
- To determine if the database was properly generated,
- navigate to `Tools` → `Local Audio Server` → `Get number of entries per source`.
- The expected result is the image to the right:
+```bash
+# Basic usage for running locally:
+docker run \
+  -p 5050:5050 \
+  -v /path/to/user_files:/data \
+  caseyscarborough/yomichan-audio-server:latest
+
+# To run externally, you will need to configure
+# the environment. These environment variables are the
+# defaults and can be modified if necessary:
+docker run \
+  -p 5050:5050 \
+  -e BIND_ADDRESS=0.0.0.0 \
+  -e BIND_PORT=5050 \
+  -e EXTERNAL_URL="http://localhost:5050" \
+  -e DATA_DIRECTORY=/data \
+  -e CONFIG_DIRECTORY=/data \
+  -v /path/to/user_files:/data \
+  caseyscarborough/yomichan-audio-server:latest
+```
- If there are missing sources, or you see "Database is empty", that means that
- the audio files were either misplaced, or Anki was restarted before moving
- the audio files into the proper location.
+### Run the Container in Kubernetes
+
+You can run this in Kubernetes using a setup similar to the following. In this setup I am hosting the audio files on an NFS share and mounting it into the `/user_files` directory in the container. The database files are in a persistent volume mounted at `/data`.
+
+
+
+  deployment.yaml
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: yomichan-audio-server
+  namespace: yomichan-audio-server
+  labels:
+    app: yomichan-audio-server
+spec:
+  replicas: 1
+  revisionHistoryLimit: 0
+  strategy:
+    type: Recreate
+  selector:
+    matchLabels:
+      app: yomichan-audio-server
+  template:
+    metadata:
+      labels:
+        app: yomichan-audio-server
+    spec:
+      serviceAccountName: default
+      containers:
+        - name: yomichan-audio-server
+          image: "caseyscarborough/yomichan-audio-server:latest"
+          imagePullPolicy: Always
+          env:
+            - name: TZ
+              value: America/New_York
+            - name: PUID
+              value: "1000"
+            - name: PGID
+              value: "1000"
+            - name: BIND_ADDRESS
+              value: "0.0.0.0"
+            - name: BIND_PORT
+              value: "5050"
+            - name: EXTERNAL_URL
+              value: "https://yourdomain.sh"
+            - name: DATA_DIRECTORY
+              value: /data
+            # Host the config file separately
+            # because we're going to mount it
+            # with a configmap.
+            - name: CONFIG_DIRECTORY
+              value: "/config"
+          ports:
+            - name: http
+              containerPort: 5050
+              protocol: TCP
+          volumeMounts:
+            - name: pvc
+              mountPath: /data
+            - name: downloads
+              mountPath: /user_files
+            - name: config
+              mountPath: /config
+      volumes:
+        - name: config
+          configMap:
+            name: yomichan-audio-server-cm
+            items:
+              - key: config.json
+                path: config.json
+        - name: pvc
+          persistentVolumeClaim:
+            claimName: yomichan-audio-server-pvc
+        # You can use this to host the files on an NFS share
+        - name: downloads
+          nfs:
+            path: /path/to/user_files
+            server: 192.168.1.100
+```
- Ensure that within step 4, your file structure matches the expected file structure,
- and then try regenerating the database
- by navigating to `Tools` → `Local Audio Server` → `Regenerate database`.
+
-* Ensure you haven't copied any files from the torrent outside of `user_files`.
   If you have (or suspect you may have):
   * Temporarily move the `user_files` folder outside of the add-on folder (to avoid re-downloading the audio files torrent again).
   * Delete the add-on.
   * Start again from step 3.
+
+  configmap.yaml
+
+The following is my custom configuration file. I've removed the `shinmeikai8_files` source that is included in the torrent and added the Shinmeikai 8 and NHK '98 files from AJATT Tools [here](https://github.com/Ajatt-Tools/?q=mp3&type=all&language=&sort=). Don't try adding the NHK '16 files from AJATT Tools, because they won't work properly without some additional setup: the pronunciations in the index file are in katakana and use the handakuten on "g" sounds to denote nasality. Just stick to the original `nhk16_files` for the NHK '16 source.
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: yomichan-audio-server-cm
+  namespace: yomichan-audio-server
+data:
+  config.json: |
+    {
+      "sources": [
+        {
+          "type": "nhk",
+          "id": "nhk16",
+          "path": "/user_files/nhk16_files",
+          "display": "NHK16 %s"
+        },
+        {
+          "type": "ajt_jp",
+          "id": "nhk98",
+          "path": "/user_files/nhk_1998_pronunciations_index_mp3",
+          "display": "NHK98 %s"
+        },
+        {
+          "type": "ajt_jp",
+          "id": "shinmekai8",
+          "path": "/user_files/shinmeikai_8_pronunciations_index_mp3",
+          "display": "SMK8 %s"
+        },
+        {
+          "type": "forvo",
+          "id": "forvo",
+          "path": "/user_files/forvo_files",
+          "display": "Forvo (%s)"
+        },
+        {
+          "type": "jpod",
+          "id": "jpod",
+          "path": "/user_files/jpod_files",
+          "display": "Jpod101"
+        },
+        {
+          "type": "jpod",
+          "id": "jpod_alternate",
+          "path": "/user_files/jpod_alternate_files",
+          "display": "JPod101 Alt"
+        }
+      ]
+    }
+```
+
+
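The `config.json` embedded in the ConfigMap above has a simple shape: a top-level `sources` list whose entries each carry `type`, `id`, `path`, and `display`. As a rough sketch only (this is not part of the project; the helper name `check_sources` is hypothetical, and treating all four keys as required is an assumption drawn solely from the example above), a few lines of Python can catch a malformed file before it is mounted:

```python
import json

# Keys carried by every source entry in the example config above.
# Treating all four as required is an assumption, not documented behavior.
REQUIRED_KEYS = ("type", "id", "path", "display")


def check_sources(config_text: str) -> list:
    """Return human-readable problems found in a config.json string."""
    config = json.loads(config_text)
    sources = config.get("sources") if isinstance(config, dict) else None
    if not isinstance(sources, list):
        return ["top-level 'sources' must be a list"]

    problems = []
    seen_ids = set()
    for index, source in enumerate(sources):
        if not isinstance(source, dict):
            problems.append(f"source #{index} is not an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in source:
                problems.append(f"source #{index} is missing '{key}'")
        source_id = source.get("id")
        if source_id is not None:
            # Duplicate ids would make sources indistinguishable by id.
            if source_id in seen_ids:
                problems.append(f"duplicate source id '{source_id}'")
            seen_ids.add(source_id)
    return problems


example = """
{
  "sources": [
    {"type": "nhk", "id": "nhk16", "path": "/user_files/nhk16_files", "display": "NHK16 %s"},
    {"type": "jpod", "id": "jpod", "path": "/user_files/jpod_files"}
  ]
}
"""
print(check_sources(example))  # prints ["source #1 is missing 'display'"]
```

An empty list means the file at least matches the shape of the example; it says nothing about whether the paths exist inside the container.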
+
+
+  service.yaml
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: yomichan-audio-server
+  namespace: yomichan-audio-server
+  labels:
+    app: yomichan-audio-server
+spec:
+  type: ClusterIP
+  ports:
+    - port: 5050
+      name: http
+      targetPort: http
+      protocol: TCP
+  selector:
+    app: yomichan-audio-server
+```
+
+
+
+  pvc.yaml
+
+I am using Longhorn for persistent storage, but you can use whatever you like.
+If this is for your `/data` directory (where the database goes), you should
+probably avoid NFS, since SQLite sometimes has issues with NFS due to file locking.
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: yomichan-audio-server-pvc
+  namespace: yomichan-audio-server
+spec:
+  accessModes:
+    - ReadWriteOnce
+  storageClassName: longhorn
+  resources:
+    requests:
+      storage: 1Gi
+```
+
-* If nothing else works, you have questions, etc., feel free to contact
-  me on discord `Aquafina water bottle#3026`,
-  or [submit an issue](https://github.com/themoeway/local-audio-yomichan/issues).
-  I exist on the [TheMoeWay](https://learnjapanese.moe/join/) (see [this thread](https://discord.com/channels/617136488840429598/1074057444365443205)) and Refold (Japanese) servers.
+
+  ingress.yaml
+
+This is an example for nginx-ingress, but you can use any Ingress.
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: yomichan-audio-server-ingress
+  namespace: yomichan-audio-server
+  annotations:
+    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
+    nginx.ingress.kubernetes.io/proxy-body-size: "0"
+spec:
+  ingressClassName: nginx
+  tls:
+    - hosts:
+        - '*.mydomain.com'
+      secretName: cluster-wildcard-cert
+  rules:
+    - host: yomichan.mydomain.com
+      http:
+        paths:
+          - pathType: Prefix
+            path: /
+            backend:
+              service:
+                name: yomichan-audio-server
+                port:
+                  number: 5050
+```
+
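Once the Ingress above is in place, the server answers at the host named by `EXTERNAL_URL`, so Yomichan's custom audio URL has to point there instead of `localhost`. The upstream README gives the custom URL as `http://localhost:5050/?term={term}&reading={reading}`. Assuming this fork keeps the same `term` and `reading` query parameters (an assumption worth checking against your running server), the sketch below shows what a resolved request URL would look like. The helper name `build_audio_url` is hypothetical, and the host is the example one from the Ingress manifest.

```python
from urllib.parse import urlencode


def build_audio_url(external_url: str, term: str, reading: str) -> str:
    """Build the query URL for one term/reading pair (hypothetical helper)."""
    # urlencode percent-encodes the Japanese text as UTF-8 bytes.
    query = urlencode({"term": term, "reading": reading})
    # Strip a trailing slash so the result never contains "//?".
    return f"{external_url.rstrip('/')}/?{query}"


print(build_audio_url("https://yomichan.mydomain.com", "猫", "ねこ"))
# prints https://yomichan.mydomain.com/?term=%E7%8C%AB&reading=%E3%81%AD%E3%81%93
```

In Yomichan itself the `{term}` and `{reading}` placeholders stay in the URL template; the function only illustrates what the extension ends up requesting.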
 ## Configuring sources
@@ -273,76 +462,32 @@ These are additional instructions and tips if something doesn't work as expected
-## Config File
-
-If you want even more power, sources can be manually configured using a config file.
-On top of changing the priority of sources and removing sources, you can do the following:
-- Specify a path for each source folder. You can use this to store audio files in a different drive.
-- Add entirely new audio sources
-
-### Config Setup
-
-1. Within the same Add-ons window, select the add-on (`Local Audio Server for Yomichan`).
-1. Click `View files` to the right. Your file explorer should now be under `Anki2/addons21/1045800357`.
-1. Copy `default_config.json` into `user_files`, and rename it as `config.json`.
-Expected file structure (click here)
+## Build and Run from Source
-    1045800357
-    ├── db_utils.py
-    ├── server.py
-    ├── default_config.json
-    ├── ...
-    └── user_files
-        ├── config.json <-- Create this file!
-        ├── forvo_files
-        │   └── ...
-        ├── shinmeikai8_files
-        │   └── ...
-        ├── jpod_files
-        │   └── ...
-        └── nhk16_files
-            └── ...
-
-
+This has only been tested on Linux and macOS, but it should work on Windows too:
-### Config Usage Notes
-- Whenever you edit your config, make sure you restart Anki and regenerate the database. This will ensure your changes are fully applied.
-- Do NOT edit `default_config.json`, because this file will get overwritten on every add-on update.
-- If you want to change the priority of sources, ensure that your custom URL does NOT have the `sources` parameter. The URL `sources` parameter overrides the config's source priority!
-
-
-## Running without Anki
-If you wish to run the server without Anki, do the following:
 ```bash
-git clone https://github.com/themoeway/local-audio-yomichan.git
-cd local-audio-yomichan
-
-# You must fill `plugin/user_files` with the audio files, like with step 3 of the main instructions.
-# If you are on a *unix OS and you have already setup the Anki add-on, you can run the command below:
-ln -s ~/.local/share/Anki2/addons21/1045800357/user_files ./plugin/user_files
-
-# After filling in `plugin/user_files` with the audio files, you can now run the server.
-# Ensure you have python 3.9 or above.
-python3 run_server.py
+git clone https://github.com/caseyscarborough/yomichan-audio-server.git
+cd yomichan-audio-server
+docker build . -t yomichan-audio-server
+docker run ... yomichan-audio-server
 ```
-## Install from Source
-- For Windows users, the link script requires a bit of effort to run.
-  Instructions can be found at the top of the [`link.ps1`](./link.ps1) script.
+## Troubleshooting
+These are additional instructions and tips if something doesn't work as expected.
-- Linux and MacOS users can run:
-  ```bash
-  git clone https://github.com/themoeway/local-audio-yomichan.git
-  cd local-audio-yomichan
-  ./link.sh
-  ```
+* Ensure the local audio server is actually running.
+  You can do this by visiting [http://localhost:5050](http://localhost:5050).
+  If it says "Local Audio Server (version)", then the server is up and running!
+* Ensure you haven't copied any files from the torrent outside of `user_files`.
+* If all else fails, remove the `entries*` files from your `DATA_DIRECTORY` and restart the server.
 ## Credits & Acknowledgements
-A lot of people came together, one way or the other, to get this add-on to where it is today.
-Huge thanks to everyone who made it happen:
+
+A huge thanks to [@Aquafina-water-bottle](https://github.com/Aquafina-water-bottle) for creating the original project. This couldn't have been done without that project.
+
+The following is the list of credits and acknowledgements from the original project ([themoeway/local-audio-yomichan](https://github.com/themoeway/local-audio-yomichan)):
 * **Zetta#3033**: Creator of the original addon + gave advice for improving query speed
 * **kezi#0001**: Getting NHK16 audio
@@ -358,85 +503,6 @@ Huge thanks to everyone who made it happen:
 * **[@KamWithK](https://github.com/KamWithK)**: Creator of [Ankiconnect Android](https://github.com/KamWithK/AnkiconnectAndroid). This allows the local audio server to work on Android. Also gave advice for improving the database.
 * **[@DillonWall](https://github.com/DillonWall)**: Creator of [Generate Batch Audio](https://github.com/DillonWall/generate-batch-audio-anki-addon). This allows you to backfill existing cards with the local audio server, or anything else.
-
 ## License
-[MIT](https://github.com/themoeway/local-audio-yomichan/blob/master/LICENSE)
-
-
-## Other
-
-Notes on Forvo Audio Sourcing (click here)
-
-* The following is a slightly edited quote from person who got the Forvo audio:
-
-  > The files for now only includes audio files with an exact 1:1 mapping of a dictionary/Marv's JPDB frequency list term to the name of the file the user uploaded. Just because you don't get audio for an user it does not mean the user has no audio on Forvo. Just because you get audio it does not mean it actually matches the current word/reading. It is also not uncommon that people pronounce multiple readings in the same file.
-
-  The full quote can be found at the bottom of [the legacy instructions](https://github.com/themoeway/local-audio-yomichan/tree/old), under "Original Message for v09".
-
-
-
-Some Technical Information on the Audio (click here)
-
-* Opus audio has been encoded at 32k VBR.
-* MP3 audio is encoded with LAME `V3` preset.
-* The original audio can be found in the [build scripts repo](https://github.com/Aquafina-water-bottle/local-audio-yomichan-build-scripts#local-audio-yomichan-build-scripts)
-
-
-
-
-
-Transferring from the deprecated add-on (click here)
-
-* The expected display name of the addon is "Local Audio Server for Yomichan". If your addon has the name "Yomichan Local Audio Server", then you are using the deprecated version.
-
-  To transfer from the deprecated addon to this addon, do the following:
-  - Disable the old addon
-  - Download the new add-on (`1045800357`)
-  - Move the `user_files` folder from the old add-on's folder (likely `955441350`) to the new add-on's folder. Do not copy any other files from the old add-on
-  - Restart Anki
-
-  If that doesn't work for some reason, see the [troubleshooting section](#troubleshooting) (you might have to regenerate the database).
-
-
-
-
-Updating to the new audio collection (click here)
-
-* New collections of audio for the Local Audio Server has been finally released! These new collections improve on the old collections quite a bit:
-  * Forvo audio is very inconsistent in raw audio quality. To solve this, **we normalized all of the audio** (so the volume is mostly constant between all files) and stripped most silence from the ends of the audio files.
-  * We now offer two collections: `opus` and `mp3`. The `opus` provides the most optimal storage format, whereas `mp3` collection provides the most compatible format. Most notably, **if you are using AnkiMobile, you can now use all audio sources** by using the `mp3` collection!
-  * A new source has been added (thanks to [@tatsumoto-ren](https://github.com/tatsumoto-ren)): 新明解しんめいかい8 (internal id: `shinmeikai8`).
-  * JPod files were found to be mostly contained of literal duplicate files. To solve this, we changed the internal storage format to simply link the correct words to unique files, which ended up clearing some 30% of the JPod database.
-  * Using JMdict word variants data (JMdict Forms), we increased word coverage by mapping audio from variants to other variants with the same reading.
-
-  However, credit where credit is due: None of this would have been possible
-  (hell, none of this would've even started)
-  if it wasn't for [@Mansive](https://github.com/Mansive), [@tsweet64](https://github.com/tsweet64),
-  and their hard work on [these pre-processing scripts](https://github.com/Aquafina-water-bottle/local-audio-yomichan-build-scripts#local-audio-yomichan-build-scripts). Thanks once again for everything!
-
-  If you are interested in updating your audio, here's what you'll need to do:
-
-  1. Ensure the add-on is updated (`Tools` → `Add-ons` → `Check for Updates`)
-  2. Navigate to this add-on folder:
-     * Within the same Add-ons window, select the add-on (`Local Audio Server for Yomichan`).
-     * Click `View files` to the right. Your file explorer should now be under `Anki2/addons21/1045800357`.
-  3. Move the `user_files` folder somewhere findable (i.e. your desktop).
-     This will serve as a backup in case anything fails.
-  4. Download the desired audio from [step 1 of the standard instructions](#steps), extract the archive, and move the extracted `user_files` into the add-on folder.
-  5. Restart Anki, and then regenerate the Local Audio Database (`Tools` → `Local Audio Server` → `Regenerate database`)
-  6. Change your custom URL (JSON) value to the following:
-     ```
-     http://localhost:5050/?term={term}&reading={reading}
-     ```
-     This URL removes the `sources` parameter, so sources can be added without having to
-     change the URL in the future. However, please note that the **default source order has changed**
-     to `nhk16,shinmeikai8,forvo,jpod`, to optimize for Japanese correctness over literal audio quality.
-     If you want to change the order the sources (i.e. to restore the previous default order), see
-     [here](https://github.com/themoeway/local-audio-yomichan#configuring-sources).
-  7. If you are using AnkiConnectAndroid, make sure to update the app, [regenerate the Android database and send it to your device](https://github.com/KamWithK/AnkiconnectAndroid#additional-instructions-local-audio).
-     Don't forget to update the URL in Yomichan.
-  8. Enjoy your new audio!
-
-
+[MIT](https://github.com/caseyscarborough/yomichan-audio-server/blob/master/LICENSE)