qsv-cache now set to ~/.qsv-cache by default #1265

Merged: 5 commits, merged on Aug 30, 2023
14 changes: 12 additions & 2 deletions Cargo.lock

12 changes: 10 additions & 2 deletions Cargo.toml
```diff
@@ -180,6 +180,7 @@ serde = { version = "1.0.188", features = ["derive"] }
 serde_json = { version = "1", features = ["preserve_order"] }
 serde_stacker = { version = "0.1", optional = true }
 serde_urlencoded = { version = "0.7", optional = true }
+simple-home-dir = { version = "0.1", features = ["expand_tilde"], optional = true }
 smartstring = { version = "1", optional = true }
 snap = "1"
 strsim = { version = "0.10", optional = true }
@@ -261,8 +262,15 @@ fetch = [
 ]
 foreach = []
 generate = ["test-data-generation"]
-geocode = ["anyhow", "cached", "dynfmt", "geosuggest-core", "geosuggest-utils"]
-luau = ["mlua", "sanitise-file-name"]
+geocode = [
+    "anyhow",
+    "cached",
+    "dynfmt",
+    "geosuggest-core",
+    "geosuggest-utils",
+    "simple-home-dir",
+]
+luau = ["mlua", "sanitise-file-name", "simple-home-dir"]
 python = ["pyo3"]
 to = ["csvs_convert"]
 lite = []
```
2 changes: 1 addition & 1 deletion docs/PERFORMANCE.md
```diff
@@ -158,7 +158,7 @@ qsv employs several caching strategies to improve performance:
 * The `stats` command caches its results in both CSV and binary formats. It does this to avoid re-computing the same statistics when the same input file/parameters are used, but also, as statistics are used in several other commands (currently - `schema` and `tojsonl`, with [more commands using cached statistics in the future](https://github.com/jqnatividad/qsv/issues/898)).
 * The `apply geocode` command [memoizes](https://en.wikipedia.org/wiki/Memoization) otherwise expensive geocoding operations and will report its cache hit rate. `apply geocode` memoization, however, is not persistent across sessions.
 * The `fetch` and `fetchpost` commands also memoizes expensive REST API calls with its optional Redis support. It effectively has a persistent cache as the default time-to-live (TTL) before a Redis cache entry is expired is 28 days and Redis entries are persisted across restarts. Redis cache settings can be fine-tuned with the `QSV_REDIS_CONNSTR`, `QSV_REDIS_TTL_SECONDS`, `QSV_REDIS_TTL_REFRESH` and `QSV_FP_REDIS_CONNSTR` environment variables.
-* The `luau` command caches lookup tables on disk using the QSV_CACHE_DIR environment variable and the `--cache-dir` command-line option. The default cache directory is `qsv-cache` in the current working directory. The QSV_CACHE_DIR environment variable overrides the `--cache-dir` command-line option.
+* The `luau` command caches lookup tables on disk using the QSV_CACHE_DIR environment variable and the `--cache-dir` command-line option. The default cache directory is `~/.qsv-cache`. The QSV_CACHE_DIR environment variable overrides the `--cache-dir` command-line option.

 ## UTF-8 Encoding for Performance
 [Rust strings are utf-8 encoded](https://doc.rust-lang.org/std/string/struct.String.html). As a result, qsv **REQUIRES** UTF-8 encoded files.
```
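The `apply geocode` bullet describes in-memory memoization with a reported cache hit rate. A minimal sketch of that idea follows, assuming a `HashMap`-backed cache; it is not qsv's implementation (the `geocode` feature lists the `cached` crate among its dependencies). `Memo`, `get` and `hit_rate` are hypothetical names, and the closure stands in for an expensive geocoding call.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory memoizer that counts hits and misses.
/// The cache lives only as long as the struct, i.e. it is not persistent.
pub struct Memo<F: Fn(&str) -> String> {
    compute: F,
    cache: HashMap<String, String>,
    hits: u64,
    misses: u64,
}

impl<F: Fn(&str) -> String> Memo<F> {
    pub fn new(compute: F) -> Self {
        Self { compute, cache: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached value for `key`, computing and storing it on a miss.
    pub fn get(&mut self, key: &str) -> String {
        if let Some(value) = self.cache.get(key) {
            self.hits += 1;
            return value.clone();
        }
        self.misses += 1;
        let value = (self.compute)(key);
        self.cache.insert(key.to_string(), value.clone());
        value
    }

    /// Fraction of lookups answered from the cache (0.0 before any lookup).
    pub fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            self.hits as f64 / total as f64
        }
    }
}
```

Because the cache is owned by the process, it disappears on exit, which is consistent with the note that `apply geocode` memoization is not persistent across sessions.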
15 changes: 10 additions & 5 deletions dotenv.template.yaml
```diff
@@ -47,8 +47,9 @@ QSV_AUTOINDEX = False
 # The directory to use for caching various qsv files.
 # Used by the `geocode` command for downloaded geocoding resources.
 # Used by the `luau`` command for downloaded lookup_table resources using
-# the `luau` qsv_register_lookup() helper function.
-# QSV_CACHE_DIR = .
+# the `luau` qsv_register_lookup() helper function and the `geocode` command
+# for downloaded geocoding resources.
+# QSV_CACHE_DIR = ~/.qsv-cache

 # The CKAN Action API endpoint to use with the `luau` qsv_register_lookup()
 # helper function when using the "ckan://" scheme.
@@ -163,6 +164,10 @@ QSV_TIMEOUT = 30
 # (default: <qsv_variant>/<version> (<target>; https://github.com/jqnatividad/qsv)).
 # QSV_USER_AGENT = qsv/0.99.1 (x86_64-apple-darwin; https://github.com/jqnatividad/qsv)

-# the filename of the Geonames index file
-# this will be created in the QSV_CACHE_DIR directory
-QSV_GEOCODE_INDEX_FILENAME = qsv-geocode-index.bincode
+# the filename of the Geonames index file you wish to use for geocoding.
+# If not set, the `geocode` command will download the default index file for
+# that qsv version and save it in the QSV_CACHE_DIR directory for future use.
+# Set this only if you have prepared your own custom Geonames index file.
+# Note that you have to copy the custom index file to the QSV_CACHE_DIR directory
+# for it to be used by qsv.
+# QSV_GEOCODE_INDEX_FILENAME = my-qsv-geocode-index.bincode
```
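Read together with the `geocode_index_file` logic in `src/cmd/geocode.rs`, the template implies this lookup order: an explicit index-file argument is used as-is; otherwise `QSV_GEOCODE_INDEX_FILENAME` (or a default name) is looked up inside the cache directory. The sketch below restates that precedence as a pure function so it can be tested without touching the environment. `resolve_index_file` and `DEFAULT_INDEX_FILENAME` are hypothetical; the value of qsv's real `DEFAULT_GEOCODE_INDEX_FILENAME` is not shown in this diff, so a placeholder is used.

```rust
use std::path::{Path, PathBuf};

/// Placeholder for qsv's `DEFAULT_GEOCODE_INDEX_FILENAME` (real value not shown here).
const DEFAULT_INDEX_FILENAME: &str = "default-geocode-index.bincode";

/// Hypothetical helper resolving the Geonames index file:
/// 1. an explicit index-file argument wins and is used as-is;
/// 2. otherwise the `QSV_GEOCODE_INDEX_FILENAME` value, if any, inside `cache_dir`;
/// 3. otherwise the default filename inside `cache_dir`.
pub fn resolve_index_file(
    cache_dir: &Path,
    env_filename: Option<&str>,
    arg_index_file: Option<&str>,
) -> PathBuf {
    match arg_index_file {
        Some(explicit) => PathBuf::from(explicit),
        None => cache_dir.join(env_filename.unwrap_or(DEFAULT_INDEX_FILENAME)),
    }
}
```

A caller would pass `std::env::var("QSV_GEOCODE_INDEX_FILENAME").ok().as_deref()` as `env_filename`. Note that `Path::join` replaces the base path when given an absolute path, so in this sketch an absolute `QSV_GEOCODE_INDEX_FILENAME` would bypass the cache directory.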
33 changes: 21 additions & 12 deletions src/cmd/geocode.rs
```diff
@@ -145,7 +145,7 @@ geocode options:
     --cache-dir <dir>          The directory to use for caching the Geonames cities index.
                                If the directory does not exist, qsv will attempt to create it.
                                If the QSV_CACHE_DIR envvar is set, it will be used instead.
-                               [default: qsv-cache]
+                               [default: ~/.qsv-cache]

 Common options:
     -h, --help                 Display this message
@@ -173,6 +173,7 @@ use rayon::{
 };
 use regex::Regex;
 use serde::Deserialize;
+use simple_home_dir::expand_tilde;

 use crate::{
     clitypes::CliError,
@@ -288,26 +289,34 @@ async fn geocode_main(args: Args) -> CliResult<()> {
     };

     // setup cache directory
-    let geocode_cache_dir = if let Ok(cache_dir) = std::env::var("QSV_CACHE_DIR") {
+    let mut geocode_cache_dir = if let Ok(cache_dir) = std::env::var("QSV_CACHE_DIR") {
         // if QSV_CACHE_DIR env var is set, check if it exists. If it doesn't, create it.
-        if !Path::new(&cache_dir).exists() {
-            fs::create_dir_all(&cache_dir)?;
+        if cache_dir.starts_with('~') {
+            // QSV_CACHE_DIR starts with ~, expand it
+            expand_tilde(&cache_dir).unwrap()
+        } else {
+            PathBuf::from(cache_dir)
         }
-        cache_dir
     } else {
-        if !Path::new(&args.flag_cache_dir).exists() {
-            fs::create_dir_all(&args.flag_cache_dir)?;
+        // QSV_CACHE_DIR env var is not set, use args.flag_cache_dir
+        // first check if it starts with ~, expand it
+        if args.flag_cache_dir.starts_with('~') {
+            expand_tilde(&args.flag_cache_dir).unwrap()
+        } else {
+            PathBuf::from(&args.flag_cache_dir)
         }
-        args.flag_cache_dir.clone()
     };
-    info!("Using cache directory: {geocode_cache_dir}");
+    if !Path::new(&geocode_cache_dir).exists() {
+        fs::create_dir_all(&geocode_cache_dir)?;
+    }
+
+    info!("Using cache directory: {}", geocode_cache_dir.display());

     let geocode_index_filename = std::env::var("QSV_GEOCODE_INDEX_FILENAME")
         .unwrap_or_else(|_| DEFAULT_GEOCODE_INDEX_FILENAME.to_string());
     let geocode_index_file = args.arg_index_file.clone().unwrap_or_else(|| {
-        let mut path = PathBuf::from(geocode_cache_dir);
-        path.push(geocode_index_filename);
-        path.to_string_lossy().to_string()
+        geocode_cache_dir.push(geocode_index_filename);
+        geocode_cache_dir.to_string_lossy().to_string()
     });

     // setup languages
```
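After this change, `geocode` resolves its cache directory in two steps: `QSV_CACHE_DIR` takes precedence over `--cache-dir`, and a leading `~` is expanded with `simple_home_dir::expand_tilde`, whose result the diff unwraps. The following is a std-only approximation of that flow that returns an error instead of panicking. `expand_tilde_with`, `resolve_cache_dir` and `resolve_cache_dir_from_env` are hypothetical helpers, and taking the home directory from `HOME` (falling back to `USERPROFILE`) is an assumption, not necessarily what `simple_home_dir` does. Only a bare `~` or a `~/` prefix is expanded.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical std-only tilde expansion (not `simple_home_dir`'s implementation).
/// Expands a bare `~` or a `~/` prefix using `home`. Returns `None` for other
/// `~...` forms (e.g. `~otheruser/cache`) or when no home directory is known.
/// Paths that do not start with `~` are returned unchanged.
pub fn expand_tilde_with(path: &str, home: Option<&str>) -> Option<PathBuf> {
    if path == "~" {
        home.map(PathBuf::from)
    } else if let Some(rest) = path.strip_prefix("~/") {
        home.map(|h| PathBuf::from(h).join(rest))
    } else if path.starts_with('~') {
        None
    } else {
        Some(PathBuf::from(path))
    }
}

/// `env_value` (QSV_CACHE_DIR) overrides `flag_value` (--cache-dir).
pub fn resolve_cache_dir(
    env_value: Option<&str>,
    flag_value: &str,
    home: Option<&str>,
) -> Result<PathBuf, String> {
    let raw = env_value.unwrap_or(flag_value);
    expand_tilde_with(raw, home).ok_or_else(|| format!("cannot expand cache directory {raw:?}"))
}

/// Reads the real environment. Assumption: home comes from HOME, else USERPROFILE.
pub fn resolve_cache_dir_from_env(flag_value: &str) -> Result<PathBuf, String> {
    let home = env::var("HOME").or_else(|_| env::var("USERPROFILE")).ok();
    let env_value = env::var("QSV_CACHE_DIR").ok();
    resolve_cache_dir(env_value.as_deref(), flag_value, home.as_deref())
}
```

Returning a `Result` lets the caller turn an unexpandable setting into an ordinary CLI error with `?` rather than a panic from `.unwrap()`.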
31 changes: 20 additions & 11 deletions src/cmd/luau.rs
```diff
@@ -190,7 +190,7 @@ Luau options:
                                resources using the qsv_register_lookup() helper function.
                                If the directory does not exist, qsv will attempt to create it.
                                If the QSV_CACHE_DIR envvar is set, it will be used instead.
-                               [default: qsv-cache]
+                               [default: ~/.qsv-cache]

 Common options:
     -h, --help                 Display this message
@@ -223,6 +223,7 @@ use indicatif::{ProgressBar, ProgressDrawTarget, ProgressStyle};
 use log::{debug, info, log_enabled};
 use mlua::{Lua, LuaSerdeExt, Value};
 use serde::Deserialize;
+use simple_home_dir::expand_tilde;
 use strum_macros::IntoStaticStr;
 use tempfile;

@@ -515,20 +516,28 @@ pub fn run(argv: &[&str]) -> CliResult<()> {

     // check if qsv_registerlookup_used is set, if it is, setup the qsv_cache directory
     if qsv_register_lookup_used {
-        if let Ok(cache_path) = std::env::var("QSV_CACHE_DIR") {
+        let qsv_cache_dir = if let Ok(cache_path) = std::env::var("QSV_CACHE_DIR") {
             // if QSV_CACHE_DIR env var is set, check if it exists. If it doesn't, create it.
-            if !Path::new(&cache_path).exists() {
-                fs::create_dir_all(&cache_path)?;
+            if cache_path.starts_with('~') {
+                // expand the tilde
+                let expanded_dir = expand_tilde(&cache_path).unwrap();
+                expanded_dir.to_string_lossy().to_string()
+            } else {
+                cache_path
             }
-            info!("Using cache directory: {cache_path}");
-            globals.set("_QSV_CACHE_DIR", cache_path)?;
+        } else if args.flag_cache_dir.starts_with('~') {
+            // expand the tilde
+            let expanded_dir = expand_tilde(&args.flag_cache_dir).unwrap();
+            expanded_dir.to_string_lossy().to_string()
         } else {
-            if !Path::new(&args.flag_cache_dir).exists() {
-                fs::create_dir_all(&args.flag_cache_dir)?;
-            }
-            info!("Using cache directory: {}", args.flag_cache_dir);
-            globals.set("_QSV_CACHE_DIR", args.flag_cache_dir.clone())?;
+            args.flag_cache_dir.clone()
+        };
+        if !Path::new(&qsv_cache_dir).exists() {
+            fs::create_dir_all(&qsv_cache_dir)?;
         }
+
+        info!("Using cache directory: {qsv_cache_dir}");
+        globals.set("_QSV_CACHE_DIR", qsv_cache_dir)?;
     }

     debug!("Main processing");
```
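Once the path is resolved, both hunks check `Path::exists()` before calling `fs::create_dir_all`, and the `luau` hunk stores the directory as a `String` in the `_QSV_CACHE_DIR` Lua global. A minimal sketch of that final step follows; `ensure_cache_dir` is a hypothetical helper. It relies on `fs::create_dir_all` creating the directory and its parents only when they are missing (and returning `Ok` if the directory already exists), so the separate existence check is optional.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: make sure `dir` exists, then return it as a `String`
/// (the form the luau hunk hands to the `_QSV_CACHE_DIR` Lua global).
pub fn ensure_cache_dir(dir: &Path) -> io::Result<String> {
    // Creates `dir` and any missing parents; succeeds if `dir` already exists.
    fs::create_dir_all(dir)?;
    // Lossy conversion: non-UTF-8 bytes in the path become U+FFFD.
    Ok(dir.to_string_lossy().into_owned())
}
```

Calling it twice on the same path is harmless, which is why the `exists()` guard mainly serves to skip a syscall rather than to avoid an error.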