Releases: jqnatividad/qsv
0.24.0
Added
- Add logging by @mhuang74 in #116
- Environment variables for logging: `QSV_LOG_LEVEL` and `QSV_LOG_DIR`. See Logging for more details.
- `sentiment` analysis `apply` operation by @jqnatividad in #121
- `whatlang` language detection `apply` operation by @jqnatividad in #122
- aarch64-apple-darwin prebuilt binary (Apple Silicon, AKA M1)
- `--envlist` convenience option to list all environment variables with the `QSV_` prefix
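The `--envlist` option boils down to filtering the process environment by name prefix. Below is a minimal Rust sketch of that idea using only the standard library. It is not qsv's implementation, and the function name `qsv_env_vars` is hypothetical.

```rust
use std::env;

// Hypothetical helper (not qsv's code): collect every environment variable
// whose name starts with `QSV_`, sorted by name. `vars_os` is used because
// `env::vars` panics on non-UTF-8 entries; such entries are skipped here.
fn qsv_env_vars() -> Vec<(String, String)> {
    let mut vars: Vec<(String, String)> = env::vars_os()
        .filter_map(|(k, v)| Some((k.into_string().ok()?, v.into_string().ok()?)))
        .filter(|(k, _)| k.starts_with("QSV_"))
        .collect();
    vars.sort();
    vars
}

fn main() {
    for (key, value) in qsv_env_vars() {
        println!("{key}={value}");
    }
}
```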
Changed
- changed `MAX_JOBS` heuristic logical processor divisor from 4 to 3
- `selfupdate` is no longer an optional feature
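Here is one plausible reading of the `MAX_JOBS` heuristic as a Rust sketch: an explicit `QSV_MAX_JOBS` value wins, and otherwise the logical processor count is divided by 3. The function names, the fallback for unparsable or zero values, and the floor of one job are assumptions made for illustration. This is not qsv's actual code.

```rust
use std::env;
use std::thread;

// Hypothetical default (assumption: never fewer than 1 job):
// logical processors divided by 3, the divisor used since 0.24.0.
fn default_max_jobs(logical_cpus: usize) -> usize {
    (logical_cpus / 3).max(1)
}

// Hypothetical resolution order: a positive integer in QSV_MAX_JOBS wins;
// unset, unparsable or zero values fall back to the heuristic.
fn max_jobs(env_value: Option<&str>, logical_cpus: usize) -> usize {
    match env_value.and_then(|v| v.trim().parse::<usize>().ok()) {
        Some(n) if n > 0 => n,
        _ => default_max_jobs(logical_cpus),
    }
}

fn main() {
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let env_value = env::var("QSV_MAX_JOBS").ok();
    println!("max jobs: {}", max_jobs(env_value.as_deref(), cpus));
}
```

On a 12-thread machine with `QSV_MAX_JOBS` unset, this sketch picks 4 jobs. The old divisor of 4 would have picked 3.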
New Contributors
Full Changelog: 0.23.0...0.24.0
0.23.0
MAJOR NEW FEATURES
- added `--update` option. This allows qsv to check and update itself if there are new release binaries published on GitHub.
  NOTE: the `selfupdate` feature is not enabled in the published binaries below due to unresolved OpenSSL compilation issues on certain target platforms. Once these issues are resolved, the next release will have `selfupdate` enabled by default.
- added `--envlist` option to show all environment variables with the `QSV_` prefix.
- `apply`, `generate`, `lua`, `foreach` and `selfupdate` are now optional features. `apply` and `generate` are marked optional since they have large dependency trees; `lua` and `foreach` are very powerful commands that can be abused to issue system commands. Users now have the option to exclude these features from their local builds. Published binaries on GitHub still have `--all-features` enabled.
- added `QSV_COMMENTS` environment variable (contributed by @jbertovic). This allows qsv to ignore lines in the CSV (including headers) that start with the set character.
- `select` now catches the condition where qsv's input is empty (e.g. `cat /dev/null | qsv select 1` will now show the error "Input is empty." instead of "Selector index 1 is out of bounds. Index must be >= 1 and <= 0.")
- added `--pad <arg>` option to the `split` command to zero-pad the generated filenames to `<arg>` places.
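The following is a sketch of the `QSV_COMMENTS` behaviour. It assumes the comment character is a single byte and that filtering happens line by line before CSV parsing, which is why a commented-out header line is skipped too. The helper `strip_comment_lines` is hypothetical. Because it reads raw lines, it also ignores quoted fields that contain embedded newlines.

```rust
use std::io::{self, BufRead, BufReader, Read};

// Hypothetical helper (not qsv's code): keep only the lines that do not
// start with `comment_char`. Simplification: works on raw lines, so quoted
// CSV fields spanning several lines are not handled.
fn strip_comment_lines<R: Read>(input: R, comment_char: u8) -> io::Result<Vec<String>> {
    let mut kept = Vec::new();
    for line in BufReader::new(input).lines() {
        let line = line?;
        if line.as_bytes().first() != Some(&comment_char) {
            kept.push(line);
        }
    }
    Ok(kept)
}

fn main() -> io::Result<()> {
    let data = "# export notes\nname,qty\n#apple,1\npear,2\n";
    for line in strip_comment_lines(data.as_bytes(), b'#')? {
        println!("{line}"); // prints "name,qty" then "pear,2"
    }
    Ok(())
}
```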
See CHANGELOG for details.
0.22.2
- 0.22.1 was inadvertently published with the wrong Cargo.toml version.
0.22.1
- added `lua` and `foreach` feature flags. These commands are very powerful and can be easily abused or get into "foot-shooting" scenarios. They are now only enabled when these features are enabled during install/build.
- `censor` and `censor_check` now support adding custom profanities to screen for with the `--comparand` option.
- smaller stripped binaries for the `x86_64-unknown-linux-gnu`, `i686-unknown-linux-gnu` and `x86_64-apple-darwin` targets
- expanded `apply` help text
- added more tests (currencytonum, censor, censor_check)
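Optional commands like `lua` and `foreach` rely on Cargo feature flags, which gate code at compile time with `#[cfg(feature = "...")]`. The sketch below shows the general mechanism. In a real crate, the feature would also be declared under `[features]` in Cargo.toml and enabled with `cargo build --features lua`. The function `run_lua_command` and its error message are hypothetical, and qsv's actual gating may be organized differently.

```rust
// Hypothetical sketch of compile-time feature gating (not qsv's code).
// Only one of these two definitions is compiled, depending on whether the
// `lua` Cargo feature is enabled for this build.
#[cfg(feature = "lua")]
fn run_lua_command() -> Result<(), String> {
    // The real command body would live here.
    Ok(())
}

#[cfg(not(feature = "lua"))]
fn run_lua_command() -> Result<(), String> {
    // Hypothetical message shown when the feature was left out of the build.
    Err("this build does not include the `lua` feature".to_string())
}

fn main() {
    match run_lua_command() {
        Ok(()) => println!("lua command ran"),
        Err(e) => eprintln!("{e}"),
    }
}
```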
See CHANGELOG for details.
0.22.0
MAJOR NEW FEATURES
- `generate` command. Generate test data by profiling a CSV using Markov decision process machine learning.
- added `--no-headers` option to the `rename` command (see discussion #81)
- New environment variables:
  - `QSV_DEFAULT_DELIMITER` - single ASCII character to use as delimiter. Overrides the `--delimiter` option. When not set, defaults to "," (comma) for CSV files and "\t" (tab) for TSV files. Note that this also sets the delimiter for qsv's output. Adapted from xsv PR by @camerondavison.
  - `QSV_NO_HEADERS` - when set, the first row will NOT be interpreted as headers. Supersedes `QSV_TOGGLE_HEADERS`.
  - `QSV_MAX_JOBS` - number of jobs to use for parallelized commands (currently `frequency`, `split` and `stats`). If not set, max_jobs is set to the number of logical processors divided by four. See Parallelization for more info.
  - `QSV_REGEX_UNICODE` - if set, makes the `search`, `searchset` and `replace` commands unicode-aware. For increased performance, these commands are not unicode-aware by default: they ignore unicode values when matching and will panic when unicode characters are used in the regex.
- Added parallelization heuristic (num_cpus/4), in connection with `QSV_MAX_JOBS`.
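The `QSV_DEFAULT_DELIMITER` rules above can be sketched as a small resolver. If the variable is set, its value must be exactly one ASCII character. Otherwise, the file extension selects tab (`.tsv`) or comma. `resolve_delimiter` is a hypothetical helper and does not model the `--delimiter` flag or the rest of qsv's input handling.

```rust
use std::env;
use std::path::Path;

// Hypothetical helper (not qsv's code) following the rules described for
// QSV_DEFAULT_DELIMITER: a set value must be a single ASCII character;
// when unset, `.tsv` files default to tab and everything else to comma.
fn resolve_delimiter(env_value: Option<&str>, path: &str) -> Result<u8, String> {
    if let Some(v) = env_value {
        return match v.as_bytes() {
            [b] if b.is_ascii() => Ok(*b),
            _ => Err(format!(
                "QSV_DEFAULT_DELIMITER must be a single ASCII character, got {v:?}"
            )),
        };
    }
    let is_tsv = Path::new(path)
        .extension()
        .map_or(false, |ext| ext.eq_ignore_ascii_case("tsv"));
    Ok(if is_tsv { b'\t' } else { b',' })
}

fn main() {
    let env_value = env::var("QSV_DEFAULT_DELIMITER").ok();
    match resolve_delimiter(env_value.as_deref(), "data.tsv") {
        Ok(d) => println!("delimiter: {:?}", d as char),
        Err(e) => eprintln!("{e}"),
    }
}
```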
See CHANGELOG for details.
0.21.0
MAJOR NEW FEATURES
- added `apply geocode` caching, more than doubling performance in the geocode benchmark.
- added `--random` and `--seed` options to the `sort` command from @pjsier, enabling reproducible, randomized "scrambling" of CSVs.
- Bash shell qsv tab completion
- additional `apply operations` subcommands:
  - Match Trim operations - enable trimming of more than just whitespace, including multiple trim characters in one pass:
    - mtrim: Trims `--comparand` matches left & right of the string (`trim_matches` wrapper)
    - mltrim: Left trim `--comparand` matches (`trim_start_matches` wrapper)
    - mrtrim: Right trim `--comparand` matches (`trim_end_matches` wrapper)
  - replace: Replace all matches of a pattern (using `--comparand`) with a string (using `--replacement`) (std `String` replace wrapper).
  - regex_replace: Replace the leftmost-first regex match with `--replacement` (regex replace wrapper).
  - titlecase: capitalizes English text using Daring Fireball titlecase style (https://daringfireball.net/2008/05/title_case)
  - censor_check: check if profanity is detected (boolean)
  - censor: profanity filter
- added parameter validation to `apply operations` subcommands
- added more robust parameter validation to the `apply` command by leveraging docopt
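The Match Trim operations map directly onto Rust's `str::trim_matches`, `str::trim_start_matches` and `str::trim_end_matches`. Passing a slice of chars as the pattern treats every character of the comparand as a trim character. The following is a minimal sketch that assumes `--comparand` is interpreted as a set of characters. The function names mirror the subcommands, but this is not qsv's code.

```rust
// Hypothetical sketch (not qsv's code): every char in `comparand` is a
// trim character, so several different characters are stripped in one pass.
fn trim_set(comparand: &str) -> Vec<char> {
    comparand.chars().collect()
}

// Trim matches on both ends (trim_matches).
fn mtrim<'a>(value: &'a str, comparand: &str) -> &'a str {
    let set = trim_set(comparand);
    value.trim_matches(set.as_slice())
}

// Trim matches on the left only (trim_start_matches).
fn mltrim<'a>(value: &'a str, comparand: &str) -> &'a str {
    let set = trim_set(comparand);
    value.trim_start_matches(set.as_slice())
}

// Trim matches on the right only (trim_end_matches).
fn mrtrim<'a>(value: &'a str, comparand: &str) -> &'a str {
    let set = trim_set(comparand);
    value.trim_end_matches(set.as_slice())
}

fn main() {
    let cell = "--**Hello**--";
    println!("{}", mtrim(cell, "-*")); // Hello
    println!("{}", mltrim(cell, "-*")); // Hello**--
    println!("{}", mrtrim(cell, "-*")); // --**Hello
}
```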
More benchmark script improvements:
- allow binary to be changed, so users can benchmark xsv and other xsv forks by simply replacing the $bin shell variable
- now uses a much larger data file - a 1M-row, 512 MB, 41-column sample of NYC's 311 data
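Reproducible scrambling, as offered by `sort --random --seed`, comes down to a seeded shuffle: the same seed always yields the same row order. The sketch below pairs a Fisher-Yates shuffle with SplitMix64, a small well-known PRNG chosen here only so the example needs no external crates. qsv's actual random source and shuffle may differ.

```rust
// SplitMix64: a tiny deterministic PRNG, used here as a stand-in for
// whatever random source qsv actually uses (an assumption of this sketch).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

// Fisher-Yates shuffle driven by the seeded PRNG: identical seeds give
// identical orderings. The modulo step adds a negligible bias.
fn seeded_shuffle<T>(rows: &mut [T], seed: u64) {
    let mut rng = SplitMix64(seed);
    for i in (1..rows.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        rows.swap(i, j);
    }
}

fn main() {
    let mut a = vec!["r1", "r2", "r3", "r4", "r5"];
    let mut b = a.clone();
    seeded_shuffle(&mut a, 42);
    seeded_shuffle(&mut b, 42);
    assert_eq!(a, b); // same seed, same order
    println!("{a:?}");
}
```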
See CHANGELOG for details.
0.20.0
MAJOR NEW FEATURES
- major refactoring of the `apply` command:
  - to take advantage of docopt parsing/validation.
  - instead of one big command, broke `apply` down into several subcommands:
    - operations
    - emptyreplace
    - datefmt
    - geocode
- added string similarity operations to the `apply` command:
  - simdl: Damerau-Levenshtein similarity
  - simdln: Normalized Damerau-Levenshtein similarity (between 0.0 & 1.0)
  - simjw: Jaro-Winkler similarity (between 0.0 & 1.0)
  - simsd: Sørensen-Dice similarity (between 0.0 & 1.0)
  - simhm: Hamming distance. Number of positions where characters differ.
  - simod: OSA Distance.
  - soundex: sounds like (boolean)
- added progress bars to commands that may spawn long-running jobs (for this release, `apply`, `foreach` and `lua`). Progress bars can be suppressed with the `--quiet` option.
- added progress bar helper functions to utils.rs.
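Two of the similarity metrics above are easy to state precisely. The sketch implements Hamming distance, defined here only for strings with the same number of chars, and Optimal String Alignment (OSA) distance. OSA is Levenshtein distance plus adjacent transpositions, with no substring edited more than once, and the sketch computes it by dynamic programming. These are textbook versions for illustration. qsv's `simhm` and `simod` may be backed by a library and handle edge cases differently.

```rust
// Hamming distance: positions where the chars differ. Returns None for
// strings of different lengths (in chars), where it is undefined.
fn hamming(a: &str, b: &str) -> Option<usize> {
    if a.chars().count() != b.chars().count() {
        return None;
    }
    Some(a.chars().zip(b.chars()).filter(|(x, y)| x != y).count())
}

// OSA distance via dynamic programming over chars.
fn osa_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());
    // d[i][j] = distance between the first i chars of a and first j of b
    let mut d = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        d[i][0] = i;
    }
    for j in 0..=m {
        d[0][j] = j;
    }
    for i in 1..=n {
        for j in 1..=m {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            d[i][j] = (d[i - 1][j] + 1) // deletion
                .min(d[i][j - 1] + 1) // insertion
                .min(d[i - 1][j - 1] + cost); // substitution
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                d[i][j] = d[i][j].min(d[i - 2][j - 2] + 1); // transposition
            }
        }
    }
    d[n][m]
}

fn main() {
    println!("{:?}", hamming("karolin", "kathrin")); // Some(3)
    println!("{}", osa_distance("ca", "ac")); // 1
    println!("{}", osa_distance("ca", "abc")); // 3
}
```

The last call shows where OSA differs from unrestricted Damerau-Levenshtein, which gives 2 for "ca" vs "abc". The two-step route (transpose to "ac", then insert "b") edits the same substring twice, and OSA forbids that.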
Benchmark improvements:
- added `apply` to benchmarks.
- added sample NYC 311 data to benchmarks.
- added records per second (RECS_PER_SEC) to benchmarks
See CHANGELOG for details.
0.19.0
MAJOR NEW FEATURES
- new `scramble` command. Randomly scrambles a CSV's records.
- read/write buffer capacity can now be set using the environment variables `QSV_RDR_BUFFER_CAPACITY` and `QSV_WTR_BUFFER_CAPACITY` (in bytes).
- benchmark script revamped. It now produces aligned output onscreen while also creating a benchmark TSV file, downloads the sample file from GitHub, and benchmarks more commands. Designed to help users tailor and maximize qsv's performance in their environment.
- added a Performance Tuning section in the README.
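Setting `QSV_RDR_BUFFER_CAPACITY` and `QSV_WTR_BUFFER_CAPACITY` amounts to sizing buffered readers and writers. The Rust standard library exposes this through `BufReader::with_capacity` and `BufWriter::with_capacity`. The sketch below assumes that unset, unparsable or zero values fall back to a default. The helper `buffer_capacity` and the 8 KiB default are hypothetical, not qsv's values.

```rust
use std::env;
use std::io::{self, BufRead, BufReader, BufWriter, Write};

// Hypothetical helper (not qsv's code): parse a capacity in bytes from an
// env-var value, falling back to `default` when unset, unparsable or zero.
fn buffer_capacity(env_value: Option<&str>, default: usize) -> usize {
    env_value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

fn main() -> io::Result<()> {
    const DEFAULT_CAP: usize = 8 * 1024; // hypothetical default
    let rdr_cap = buffer_capacity(env::var("QSV_RDR_BUFFER_CAPACITY").ok().as_deref(), DEFAULT_CAP);
    let wtr_cap = buffer_capacity(env::var("QSV_WTR_BUFFER_CAPACITY").ok().as_deref(), DEFAULT_CAP);

    // In-memory input stands in for a CSV file in this sketch.
    let input: &[u8] = b"a,b\n1,2\n";
    let reader = BufReader::with_capacity(rdr_cap, input);
    let mut writer = BufWriter::with_capacity(wtr_cap, io::stdout());
    // Copy the input line by line through the sized buffers.
    for line in reader.lines() {
        writeln!(writer, "{}", line?)?;
    }
    writer.flush()
}
```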
See CHANGELOG for details.