Releases: jqnatividad/qsv

0.24.0

06 Dec 13:57

Added

  • Add logging by @mhuang74 in #116
  • Environment variables for logging: QSV_LOG_LEVEL and QSV_LOG_DIR. See Logging for more details.
  • sentiment analysis apply operation by @jqnatividad in #121
  • whatlang language detection apply operation by @jqnatividad in #122
  • aarch64-apple-darwin prebuilt binary (Apple Silicon AKA M1)
  • --envlist convenience option to list all environment variables with the QSV_ prefix

Changed

  • changed the MAX_JOBS heuristic's logical processor divisor from 4 to 3
  • selfupdate is no longer an optional feature
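The MAX_JOBS heuristic can be sketched as below. This is a hypothetical Python illustration, not qsv's Rust code: `max_jobs` is an invented name, the QSV_MAX_JOBS override comes from the 0.22.0 notes, and the floor of one job is an assumption.

```python
import os
from typing import Optional


def max_jobs(logical_cpus: int, env_value: Optional[str] = None, divisor: int = 3) -> int:
    # An explicit, positive QSV_MAX_JOBS value takes precedence over the heuristic.
    if env_value is not None and env_value.strip().isdecimal() and int(env_value) > 0:
        return int(env_value)
    # Heuristic: logical processors divided by `divisor` (3 as of 0.24.0, previously 4),
    # floored at one job (the floor is an assumption of this sketch).
    return max(1, logical_cpus // divisor)


if __name__ == "__main__":
    print(max_jobs(os.cpu_count() or 1, os.environ.get("QSV_MAX_JOBS")))
```

With 12 logical processors this gives 4 jobs, versus 3 under the old divisor of 4.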

New Contributors

Full Changelog: 0.23.0...0.24.0

0.23.0

29 Nov 15:29

MAJOR NEW FEATURES

  • added --update option. This allows qsv to check for new release binaries published on GitHub and update itself to them.

NOTE: the selfupdate feature is not enabled in the published binaries below due to unresolved OpenSSL compilation issues on certain target platforms. Once these issues are resolved, the next release will have selfupdate enabled by default.

  • added --envlist option to show all environment variables with the QSV_ prefix.
  • apply, generate, lua, foreach and selfupdate are now optional features. apply and generate are marked optional since
    they have large dependency trees; lua and foreach are very powerful commands that can be abused to issue system commands.
    Users now have the option to exclude these features from their local builds. Published binaries on GitHub still have --all-features enabled.
  • added QSV_COMMENTS environment variable (contributed by @jbertovic). This allows qsv to ignore lines in the CSV (including headers) that start with the set character.
  • select now catches the condition where its input is empty.
    (e.g. cat /dev/null | qsv select 1 now shows the error "Input is empty." instead of "Selector index 1 is out of bounds. Index must be >= 1 and <= 0.")
  • added --pad <arg> option to the split command to zero-pad the number in generated filenames to <arg> places.
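A minimal Python sketch of the QSV_COMMENTS and --pad behaviors described above. The function names are hypothetical, and the filename scheme (each chunk named after the index of its first row, e.g. 0.csv, 1000.csv) is an assumption about split, not something these notes state.

```python
import csv
import io
from typing import Iterable, Iterator, Optional


def skip_comment_lines(lines: Iterable[str], comment_char: Optional[str]) -> Iterator[str]:
    # Drop every line, the header included, that starts with the comment character.
    for line in lines:
        if comment_char and line.startswith(comment_char):
            continue
        yield line


def padded_chunk_name(first_row: int, pad: int) -> str:
    # Zero-pad the row index in the output filename to `pad` digits.
    return f"{first_row:0{pad}}.csv"


if __name__ == "__main__":
    data = "# generated by an export tool\nname,qty\n# draft row\napple,3\n"
    print(list(csv.reader(skip_comment_lines(io.StringIO(data), "#"))))
    # [['name', 'qty'], ['apple', '3']]
    print([padded_chunk_name(i, 4) for i in (0, 1000, 2000)])
    # ['0000.csv', '1000.csv', '2000.csv']
```

Filtering whole lines before parsing is a simplification; it would misfire on a quoted field that spans several lines.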

See CHANGELOG for details.

0.22.2

22 Nov 14:41
  • 0.22.1 was inadvertently published with the wrong Cargo.toml version; this release corrects it.

0.22.1

22 Nov 14:22
  • added lua and foreach feature flags. These commands are very powerful and can easily be abused or lead to "foot-shooting" scenarios.
    They are now enabled only when these features are selected at install/build time.
  • censor and censor_check now accept custom profanities to screen for, passed with the --comparand option.
  • smaller stripped binaries for x86_64-unknown-linux-gnu, i686-unknown-linux-gnu, x86_64-apple-darwin targets
  • expanded apply help text
  • added more tests (currencytonum, censor, censor_check)

See CHANGELOG for details.

0.22.0

15 Nov 20:42

MAJOR NEW FEATURES

  • generate command: generates test data by profiling a CSV using Markov decision process machine learning.
  • added --no-headers option to the rename command (see discussion #81)
  • New environment variables:
    • QSV_DEFAULT_DELIMITER - single ASCII character to use as delimiter. Overrides --delimiter option.
      Defaults to "," (comma) for CSV files and "\t" (tab) for TSV files when not set. Note that this also sets the delimiter for qsv's output. Adapted from an xsv PR by @camerondavison.
    • QSV_NO_HEADERS - when set, the first row will NOT be interpreted as headers. Supersedes QSV_TOGGLE_HEADERS.
    • QSV_MAX_JOBS - number of jobs to use for parallelized commands (currently frequency, split and stats). If not set, max_jobs is set
      to the number of logical processors divided by four. See Parallelization for more info.
    • QSV_REGEX_UNICODE - if set, makes the search, searchset and replace commands unicode-aware.
      For increased performance, these commands are not unicode-aware by default: they ignore unicode values when matching and panic when unicode characters are used in the regex.
  • added parallelization heuristic (num_cpus/4), used to set max_jobs when QSV_MAX_JOBS is not set.
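One way to read the delimiter rules above, as a hypothetical Python sketch (not qsv's code): QSV_DEFAULT_DELIMITER wins when set, then an explicit --delimiter, then the file extension. `resolve_delimiter` is an invented name, and treating the variable's value as a literal single character is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_delimiter(path: str, cli_delimiter: Optional[str] = None,
                      env: Mapping[str, str] = os.environ) -> str:
    # QSV_DEFAULT_DELIMITER, when set, takes precedence over --delimiter.
    delim = env.get("QSV_DEFAULT_DELIMITER") or cli_delimiter
    # With neither set, default on the file extension: tab for .tsv, comma otherwise.
    if not delim:
        delim = "\t" if path.lower().endswith(".tsv") else ","
    # These notes describe the delimiter as a single ASCII character.
    if len(delim) != 1 or not delim.isascii():
        raise ValueError(f"delimiter must be a single ASCII character, got {delim!r}")
    return delim


if __name__ == "__main__":
    print(repr(resolve_delimiter("data.tsv", env={})))  # '\t'
```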

See CHANGELOG for details.

0.21.0

08 Nov 00:49

MAJOR NEW FEATURES

  • added apply geocode caching, more than doubling performance in the geocode benchmark.
  • added --random and --seed options to sort command from @pjsier, enabling reproducible, randomized "scrambling" of CSVs.
  • Bash shell tab completion for qsv
  • additional apply operations subcommands:
    • Match Trim operations: trim not only whitespace but any of multiple trim characters in one pass.
    • replace: Replace all matches of a pattern (using --comparand)
      with a string (using --replacement) (wrapper around Rust's str::replace).
    • regex_replace: Replace the leftmost-first regex match with --replacement (regex replace wrapper).
    • titlecase: capitalizes English text using the Daring Fireball title case style
      (https://daringfireball.net/2008/05/title_case)
    • censor_check: checks whether profanity is detected (boolean)
    • censor: profanity filter
  • added parameter validation to apply operations subcommands
  • added more robust parameter validation to apply command by leveraging docopt
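Rough Python analogues of the Match Trim, replace and regex_replace operations, for illustration only. The function names are hypothetical; Python's `str.strip`, `str.replace` and `re.sub(..., count=1)` stand in for the Rust code that qsv wraps.

```python
import re


def match_trim(value: str, trim_chars: str) -> str:
    # Strip any of the given characters from both ends in a single pass.
    return value.strip(trim_chars)


def replace_all(value: str, comparand: str, replacement: str) -> str:
    # Replace every non-overlapping occurrence of the literal comparand.
    return value.replace(comparand, replacement)


def regex_replace_first(value: str, pattern: str, replacement: str) -> str:
    # Replace only the leftmost match of the pattern.
    return re.sub(pattern, replacement, value, count=1)


if __name__ == "__main__":
    print(match_trim("--==ACME Corp==--", "-="))          # ACME Corp
    print(replace_all("2021/11/08", "/", "-"))             # 2021-11-08
    print(regex_replace_first("a1b22c333", r"\d+", "#"))   # a#b22c333
```

Note that `re.sub` interprets backslash escapes in the replacement string, so this is not a drop-in equivalent of qsv's --replacement handling.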

More benchmark script improvements:

  • the binary can now be changed, so users can benchmark xsv and other xsv forks by simply replacing the $bin shell variable
  • now uses a much larger data file: a 1M-row, 512 MB, 41-column sample of NYC's 311 data

See CHANGELOG for details.

0.20.0

31 Oct 15:36

MAJOR NEW FEATURES

  • major refactoring of the apply command:
    • to take advantage of docopt parsing/validation.
    • broke apply down from one big command into several subcommands:
      • operations
      • emptyreplace
      • datefmt
      • geocode
  • added string similarity operations to apply command:
    • simdl: Damerau-Levenshtein similarity
    • simdln: Normalized Damerau-Levenshtein similarity (between 0.0 & 1.0)
    • simjw: Jaro-Winkler similarity (between 0.0 & 1.0)
    • simsd: Sørensen-Dice similarity (between 0.0 & 1.0)
    • simhm: Hamming distance. Number of positions where characters differ.
    • simod: Optimal String Alignment (OSA) distance.
    • soundex: sounds like (boolean)
  • added progress bars to commands that may spawn long-running jobs - for this release,
    apply, foreach, and lua. Progress bars can be suppressed with --quiet option.
  • added progress bar helper functions to utils.rs.
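Textbook Python versions of three of the metrics listed above, written only to show what they compute; qsv's own implementations may differ in details. OSA distance is Levenshtein distance plus adjacent transpositions, with the restriction that no substring is edited more than once. The normalization shown (1 minus distance over the longer length) is one common convention and an assumption here.

```python
def osa_distance(a: str, b: str) -> int:
    # Dynamic-programming table: d[i][j] = distance between a[:i] and b[:j].
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def normalized_similarity(distance: int, a: str, b: str) -> float:
    # Map a distance onto 0.0-1.0, where 1.0 means identical strings.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - distance / longest


def hamming(a: str, b: str) -> int:
    # Number of positions where two equal-length strings differ.
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(osa_distance("ab", "ba"))         # 1
    print(hamming("karolin", "kathrin"))    # 3
```

The classic case separating OSA from unrestricted Damerau-Levenshtein is "CA" vs "ABC": OSA gives 3, while unrestricted Damerau-Levenshtein gives 2.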

Benchmark improvements:

  • added apply to benchmarks.
  • added sample NYC 311 data to benchmarks.
  • added records per second (RECS_PER_SEC) to benchmarks
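RECS_PER_SEC is a throughput figure. A hypothetical Python sketch of the arithmetic, assuming it is the record count divided by elapsed wall-clock seconds; for example, a 1,000,000-record run taking 4 seconds works out to 250,000 records per second (illustrative numbers, not a benchmark result).

```python
import time
from typing import Callable


def recs_per_sec(record_count: int, elapsed_secs: float) -> float:
    # Records processed per second of wall-clock time.
    if elapsed_secs <= 0:
        raise ValueError("elapsed time must be positive")
    return record_count / elapsed_secs


def timed(run: Callable[[], int]) -> float:
    # Call `run` (which returns how many records it processed) and report its throughput.
    start = time.perf_counter()
    count = run()
    return recs_per_sec(count, time.perf_counter() - start)


if __name__ == "__main__":
    print(recs_per_sec(1_000_000, 4.0))  # 250000.0
    print(timed(lambda: sum(1 for _ in range(100_000))))
```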

See CHANGELOG for details.

0.19.0

25 Oct 03:16

MAJOR NEW FEATURES

  • new scramble command. Randomly scrambles a CSV's records.
  • read/write buffer capacity can now be set using environment variables
    QSV_RDR_BUFFER_CAPACITY and QSV_WTR_BUFFER_CAPACITY (in bytes).
  • benchmark script revamped. It now produces aligned output onscreen
    while also creating a benchmark TSV file, downloads the sample file from GitHub,
    and benchmarks more commands. It is designed to help users tailor and maximize qsv's performance
    in their environment.
  • added a Performance Tuning section in the README.
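Conceptually, scramble shuffles a CSV's data rows. A hypothetical Python sketch, assuming the header row stays first; the optional seed mirrors the reproducibility idea behind sort's --random and --seed options from 0.21.0 and is not a documented scramble flag.

```python
import csv
import io
import random
from typing import List, Optional


def scramble(csv_text: str, seed: Optional[int] = None) -> List[List[str]]:
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Keep the header in place and shuffle only the data rows.
    header, records = rows[:1], rows[1:]
    # A fixed seed makes the shuffled order reproducible; None draws fresh entropy.
    random.Random(seed).shuffle(records)
    return header + records


if __name__ == "__main__":
    data = "id,city\n1,Austin\n2,Boston\n3,Chicago\n"
    print(scramble(data, seed=7))
```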

See CHANGELOG for details.

0.18.2

21 Oct 15:21

See CHANGELOG for details.

0.18.1

21 Oct 01:01

See CHANGELOG for details.