Bash Cache

Bash Cache provides a transparent mechanism for caching, or memoizing, long-running Bash functions. Although it can be used for scripting, its motivating purpose is to cache the results of expensive commands for display in your terminal prompt.

Originally part of ProfileGem and prompt.gem, this functionality has been pulled out into a standalone utility.

This library has also inspired bkt, a standalone binary for caching subprocess invocations. If bash-cache doesn't fit your use case, see if bkt does.

Installation

Simply source bash-cache.sh into your script or shell.

Usage

bc::cache FUNCTION TTL REFRESH [ENV_VARS ...]

To cache a function pass its name to bc::cache, along with the amount of time the cached results should persist. This function decorates an existing Bash function, wrapping it with a caching layer that temporarily retains the output and exit status of the backing function.

By default the cache is keyed off the function arguments (meaning some_func, some_func bar, and some_func baz are each cached separately).

Cached data is shared across processes by default; see below for ways to change this.

Some example usages can be seen in the prompt.gem project.

Fidelity

There are many existing command-caching utilities and patterns in the wild, but their cached behavior is typically incomplete (often only stdout is cached). bash-cache strives to provide high-fidelity caching, such that a cached result is as close as possible to indistinguishable from calling the function directly.

  • stdout and stderr are both cached, and output separately to stdout and stderr respectively
  • output is lossless; many implementations can't handle trailing whitespace or nul bytes
  • exit status code is preserved
  • positional arguments are respected; naive implementations may conflate foo bar baz (two args) and foo 'bar baz' (one arg with whitespace)
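
The first three points can be illustrated with a minimal sketch that stores stdout, stderr, and the exit status in separate files and replays each on its own stream. The helper names capture and replay are hypothetical and exist only for this illustration; bash-cache's own implementation may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not bash-cache's actual code.

# Runs a command, storing its stdout, stderr, and exit status under a directory.
capture() {
  local dir=$1; shift
  "$@" >"$dir/out" 2>"$dir/err"
  printf '%s\n' "$?" >"$dir/exit"
}

# Replays a captured result on the matching streams and returns its status.
replay() {
  local dir=$1 status
  cat "$dir/out"
  cat "$dir/err" >&2
  read -r status <"$dir/exit"
  return "$status"
}

# A sample function that writes to both streams and fails with status 3.
noisy() { printf 'to stdout\n'; printf 'to stderr\n' >&2; return 3; }

cache_dir=$(mktemp -d)
capture "$cache_dir" noisy
replay "$cache_dir" || echo "exit status: $?"
```

Writing the streams to files rather than capturing them with $(...) avoids the trailing-newline stripping that command substitution performs, which is one common source of lossy caching.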

Cache durations

Each cached result is associated with two durations: the TTL deadline and the refresh deadline. Durations can be specified in (s)econds, (m)inutes, (h)ours, and (d)ays, for example 30s, 1d, or 1h24m5s.

  • Once a cached result exceeds its TTL it is eligible for cleanup, and will shortly be removed. Note that until it is cleaned up the cached data may still be returned from the cache. 1m is a recommended TTL duration for functions that will be surfaced in a prompt.
  • If a cached result exceeds its refresh deadline it will be asynchronously updated when the function is invoked. The cached data will continue to be used until the refresh completes. 10s is a recommended refresh duration for functions that will be surfaced in a prompt.
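
As a rough sketch of how these durations fit together, the following converts a duration such as 1h24m5s to seconds and classifies a cached result by its age. Both to_seconds and cache_state are hypothetical names written for this illustration, and they are not how bash-cache parses or evaluates durations internally.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not bash-cache's actual code.

# Converts a duration like 30s, 1d, or 1h24m5s to a number of seconds.
to_seconds() {
  local spec=$1 total=0 n unit
  [[ -n $spec ]] || return 1
  while [[ $spec =~ ^([0-9]+)([smhd])(.*)$ ]]; do
    n=${BASH_REMATCH[1]} unit=${BASH_REMATCH[2]} spec=${BASH_REMATCH[3]}
    case $unit in
      s) total=$(( total + 10#$n )) ;;
      m) total=$(( total + 10#$n * 60 )) ;;
      h) total=$(( total + 10#$n * 3600 )) ;;
      d) total=$(( total + 10#$n * 86400 )) ;;
    esac
  done
  [[ -z $spec ]] || return 1   # leftover text means the duration was malformed
  printf '%s\n' "$total"
}

# Classifies a cached result given its age and both deadlines (all in seconds).
cache_state() {
  local age=$1 refresh=$2 ttl=$3
  if (( age > ttl )); then
    echo expired   # eligible for cleanup, though it may still be served until removed
  elif (( age > refresh )); then
    echo stale     # still served, but a background refresh is due
  else
    echo fresh
  fi
}

to_seconds 1h24m5s   # prints 5045
cache_state 30 "$(to_seconds 10s)" "$(to_seconds 1m)"   # prints stale
```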

Customizing the cache key

If your function depends on additional state, such as the current working directory, you'll want to ensure the cache is keyed off that state in addition to the function's arguments. To do so, pass any relevant environment variable names to bc::cache after the TTL and refresh durations.

  • PWD is often used in order to cache a function based on the current working directory.
  • $ is less common, but can be used to isolate a function's cache to the current process. Note you'll need to single-quote this argument ('$').
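
To make the idea of a cache key concrete, here is a minimal sketch that builds a key from the function name, its arguments, and the values of any named environment variables. The cache_key helper and its calling convention are assumptions made for this example; bash-cache derives its keys internally and may do so differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not bash-cache's actual code.

# Usage: cache_key FUNCTION "ENV_VAR_NAMES" [ARGS ...]
# ENV_VAR_NAMES is a single space-separated string (possibly empty).
cache_key() {
  local func=$1 env_names=$2; shift 2
  local key part name
  # %q quotes each argument, so argument boundaries survive in the key.
  printf -v key '%q ' "$func" "$@"
  for name in $env_names; do
    # ${!name} looks up the variable whose name is stored in $name.
    printf -v part '%s=%q ' "$name" "${!name}"
    key+=$part
  done
  printf '%s\n' "$key"
}

cache_key foo "" bar baz     # two arguments
cache_key foo "" 'bar baz'   # one argument containing a space; a different key
cache_key foo PWD bar        # the key also changes with the working directory
```

Joining arguments with plain spaces would give the first two calls the same key, which is the conflation described under Fidelity.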

Example usage

You can invoke bc::cache at any time, but you're encouraged to do so immediately following the function definition as a form of self-documentation, similar to Python's @decorator notation:

my_expensive_function() {
  ...
} && bc::cache my_expensive_function 1m 10s PWD

Notice in this example PWD is specified, meaning the cache will key off the current working directory in addition to any arguments to the function.

Performance

Cached data is stored on-disk, which means accessing the cache will typically be much slower than directly executing many simple commands. Generally speaking, operations which benefit from caching are accessing the disk themselves or doing network I/O. You should benchmark your functions with and without caching (see bc::benchmark) to ensure you see a meaningful improvement before deciding to cache a particular function.

Caching performance can differ drastically across machines. Notably, if the cache directory (under /tmp or TMPDIR by default) is on a tmpfs partition or a solid-state drive performance will be significantly better than caching to a spinning disk.

Calling the original function

The original function is renamed to bc::orig::[FUNCTION_NAME] (e.g. bc::orig::my_expensive_function). This can be used to bypass caching if needed.

Manually refreshing the cache

Two other functions, bc::warm::[FUNCTION_NAME] and bc::force::[FUNCTION_NAME], are provided to update the cache on demand. Both unconditionally execute and cache the backing function, but differ in how they are intended to be used.

bc::warm::[FUNCTION_NAME] refreshes a cached invocation asynchronously and silently, returning control to the caller immediately. This can be used to ensure invocations are freshly cached before the output is needed. For example, prompt.gem can warm cached functions displayed in the prompt before the prompt is actually constructed, allowing the functions to be refreshed concurrently rather than one at a time.

By contrast, bc::force::[FUNCTION_NAME] forces a synchronous cache refresh, blocking until the backing function completes and outputting the cached contents exactly like calling [FUNCTION_NAME].

Cleanup

A cleanup task is run regularly to remove stale cache data. However, no attempt is made to clean up the cache directory on exit, since by design the cache can be shared by multiple processes. By default, cached data is stored in a temp directory that the OS will clean up from time to time (generally on reboot), but if you override the cache directory via BC_CACHE_DIR you may want to clean up the directory yourself.

Note: cached data is cleaned up asynchronously, therefore data may persist longer than the specified TTL duration.
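
If you do manage a custom cache directory yourself, a periodic pass along the following lines is one option. This is only a sketch: cleanup_stale is a hypothetical helper, the age threshold is an arbitrary choice, and it relies on the -mmin and -delete options found in GNU and BSD find. It is not the cleanup task bash-cache itself runs.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup pass for illustration; not bash-cache's actual code.

# Deletes files under a directory that were last modified more than N minutes ago.
cleanup_stale() {
  local cache_dir=$1 max_age_minutes=$2
  find "$cache_dir" -type f -mmin "+${max_age_minutes}" -delete
}

demo_dir=$(mktemp -d)
touch "$demo_dir/recent"
touch -t 202001010000 "$demo_dir/old"   # backdate the modification time
cleanup_stale "$demo_dir" 1
ls "$demo_dir"   # only "recent" remains
```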

Locking

By design the caching provided by bash-cache is racy: concurrent invocations may or may not end up reusing the same cached value. For most cases (idempotent functions, to be precise) this should be sufficient.

For cases where concurrent calls to the backing function are problematic, use bc::locked_cache instead of bc::cache. This behaves identically to bc::cache but uses an advisory mutex lock to prevent concurrent invocations of the backing function.
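
For readers unfamiliar with advisory locking in the shell, the general idea can be sketched with mkdir, which either creates the directory or fails atomically. The with_lock helper is hypothetical and is not how bc::locked_cache is implemented; unlike a real mutex it gives up immediately instead of waiting, and it only protects callers that also go through with_lock.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not bash-cache's actual code.

# Runs a command while holding a lock directory; returns 75 if the lock is taken.
with_lock() {
  local lock_dir=$1; shift
  # mkdir is atomic: only one concurrent caller can create the directory.
  mkdir "$lock_dir" 2>/dev/null || return 75
  "$@"
  local status=$?
  rmdir "$lock_dir"   # release the lock
  return "$status"
}

lock="$(mktemp -d)/demo.lock"
with_lock "$lock" echo "ran while holding the lock"
# A nested attempt on the same lock is refused while the outer call holds it.
with_lock "$lock" with_lock "$lock" echo "never printed" || echo "lock was busy"
```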

Note that needing mutual-exclusion is a strong signal that you should be using a more powerful language than Bash, and that the locking bash-cache provides is advisory and best-effort only.

Other Functions

bc::benchmark

Benchmarks a function without caching enabled, and with a cold and warm cache. This allows you to see the overhead introduced by Bash Cache and decide if it's beneficial for your function.

This function runs in a subshell against a clean cache directory, and works for any function - you do not need to have previously called bc::cache.

bc::benchmark_memoize provides the same basic benchmarking for bc::memoize.

bc::copy_function

This helper function copies an existing function to a new name. This can be used to decorate or replace a function by first copying the function and then defining a new function with the original name. This is how bc::cache overwrites the function being decorated.

If desired you can stop caching a particular function by copying the bc::orig::... function back to its original name:

bc::copy_function bc::orig::my_expensive_function my_expensive_function
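
The copy-then-redefine pattern described above can be sketched in plain Bash using declare -f, which prints a function's definition. The copy_function helper below is a hypothetical re-implementation written for this example, not the source of bc::copy_function, and the function names are made up.

```bash
#!/usr/bin/env bash
# Hypothetical re-implementation for illustration; not bash-cache's actual code.

# Copies the function named $1 to a new function named $2.
copy_function() {
  local src=$1 dest=$2 body
  declare -F "$src" >/dev/null || return 1   # the source must be a defined function
  body=$(declare -f "$src" | tail -n +2)      # drop the header line, keep "{ ... }"
  eval "${dest}() ${body}"
}

greet() { echo "hello $1"; }

# Keep the original under a new name, then redefine greet as a decorator.
copy_function greet original_greet
greet() { echo "[decorated] $(original_greet "$1")"; }

greet world   # prints "[decorated] hello world"
```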

bc::on and bc::off

Enables or disables caching process-wide. If bc::off is called all cached functions will delegate immediately to the original function they decorate and will not attempt to use cached data or cache new data. Call bc::on to re-enable caching.

Configuration

Use an isolated cache directory

By default bash-cache stores cached output in a user-specific directory under /tmp or the path specified by TMPDIR. To use a different path as the cache root set BC_CACHE_DIR before sourcing bash-cache.sh. This is useful if you're using Bash Cache across multiple scripts, as you could otherwise run into namespace collisions (e.g. two scripts caching different functions with the same name).

Copyright and License

Copyright 2012-2020 Michael Diamond

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.