0.132.0
Highlights
With this release, we finally finish the stats caching refactor started in 0.131.0, replacing the binary encoded stats cache with a simpler JSONL cache. The stats cache stores the necessary statistical metadata to make several key commands smarter & faster. Per the benchmarks:
- frequency is 6x faster (frequency_index_stats_mode_auto). Not only is it faster, it no longer needs to compile a hashmap for columns with ALL unique values (e.g. ID columns). In practice, this lets it handle "real-world" datasets of any size, unless every column has ALL unique values, in which case the entire CSV has to fit into memory.
- tojsonl is 2.67x faster (tojsonl_index).
- schema is two orders of magnitude (100x) faster!!! (schema_index)
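The memory saving described for frequency can be illustrated with a minimal Python sketch. It assumes that a per-column cardinality and the row count are already known from cached stats. The function name `frequency_with_stats`, the shape of `cardinality_by_column`, and the `<ALL_UNIQUE>` placeholder are hypothetical choices for illustration, not qsv's actual implementation or output format.

```python
from collections import Counter

def frequency_with_stats(rows, header, cardinality_by_column, row_count):
    """Build frequency tables, skipping columns known to be all-unique.

    cardinality_by_column is assumed to come from a stats cache
    (hypothetical shape: {column name: number of distinct values}).
    """
    # A column whose cardinality equals the row count holds only unique
    # values, so every value occurs exactly once and no counter is needed.
    all_unique = {
        name for name in header
        if cardinality_by_column.get(name) == row_count
    }
    counters = {name: Counter() for name in header if name not in all_unique}
    for row in rows:
        for name, value in zip(header, row):
            if name in counters:
                counters[name][value] += 1
    result = {name: dict(counter) for name, counter in counters.items()}
    for name in all_unique:
        # Placeholder summary instead of one entry per row (illustrative only).
        result[name] = {"<ALL_UNIQUE>": row_count}
    return result

header = ["id", "city"]
rows = [["1", "Oslo"], ["2", "Lima"], ["3", "Oslo"]]
stats = {"id": 3, "city": 2}  # hypothetical cached cardinalities
freq = frequency_with_stats(rows, header, stats, row_count=3)
print(freq)
```

Only the `city` column gets a counter here. Memory use therefore grows with the number of distinct values in the non-unique columns rather than with the number of rows.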
The stats cache also provides the foundation for even more "smart" features and commands in the future. It also has the side-benefit of adding a way to produce stats in JSONL format that can be used for other purposes beyond qsv.
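As a rough illustration of reusing JSONL stats outside qsv, the sketch below parses JSONL text with Python's standard `json` module. The sample records and their field names (`field`, `type`, `cardinality`) are assumptions made for this example; the actual keys in the stats JSONL output may differ.

```python
import json

def load_stats_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Hypothetical sample records; real field names may differ.
sample = (
    '{"field": "id", "type": "Integer", "cardinality": 3}\n'
    '{"field": "city", "type": "String", "cardinality": 2}\n'
)

records = load_stats_jsonl(sample)
# Example downstream use: pick out the columns reported as strings.
string_columns = [r["field"] for r in records if r["type"] == "String"]
print(string_columns)
```

Because each line is an independent JSON object, the file can be consumed line by line by any tool with a JSON parser.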
The search, searchset, and replace commands now also have a --literal option that allows you to search for and replace strings with regex special/reserved characters. This makes it easier to search for and replace strings that contain otherwise reserved regex characters without having to escape them (especially useful with URL columns that often contain characters like ?, :, -, ., etc.).
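The difference between regex and literal matching can be sketched in Python, using `re.escape` from the standard library to stand in for literal mode. This only illustrates the concept. The helper `make_pattern` and the sample URLs are hypothetical, and how qsv implements --literal internally is not shown here.

```python
import re

def make_pattern(needle, literal=False):
    """Compile a search pattern; with literal=True, metacharacters are escaped."""
    return re.compile(re.escape(needle) if literal else needle)

urls = ["https://example.com/a?b=1", "https://example.com/ab=1"]
needle = "a?b=1"

# As a regex, "a?" means "an optional a", so both URLs match.
regex_hits = [u for u in urls if make_pattern(needle).search(u)]

# As a literal, "?" must appear in the text, so only the first URL matches.
literal_hits = [u for u in urls if make_pattern(needle, literal=True).search(u)]

print(regex_hits)
print(literal_hits)
```

Without literal matching, the same result would require writing the pattern by hand as `a\?b=1`.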
Added
- search, searchset & replace: add --literal option #2060 & 7196053
- slice: added usage text examples 04afaa3
- publish: added workflow to build "portable" binaries with CPU features disabled
- contrib(completions): add --literal for search and searchset by @rzmk in #2061
- contrib(completions): add --literal completion to replace by @rzmk in #2062
- add more polars metadata in --version info #2073
- docs: added more info to SECURITY.md 609d4df
- docs: expanded Goals/Non-Goals 54998e3
- docs: added Installation "Option 0" quick start bf5bf82
- added search --literal benchmark
Changed
- stats, schema, frequency & tojsonl: stats caching refactor, replacing binary encoded stats cache with a simpler JSONL cache #2055
- rename stats --stats-json option to stats --stats-jsonl #2063
- changed "broken pipe" error to a warning 7353275
- docs: update multithreading and caching sections of PERFORMANCE.md 5e6bc45
- deps: switch to our qsv-optimized fork of csv crate 3fc1e82
- deps: bump polars from 0.41.3 to 0.42.0 #2051
- build(deps): bump actix-web from 4.8.0 to 4.9.0 by @dependabot in #2041
- build(deps): bump flate2 from 1.0.31 to 1.0.32 by @dependabot in #2071
- build(deps): bump indexmap from 2.3.0 to 2.4.0 by @dependabot in #2049
- build(deps): bump reqwest from 0.12.6 to 0.12.7 by @dependabot in #2070
- build(deps): bump rust_decimal from 1.35.0 to 1.36.0 by @dependabot in #2068
- build(deps): bump serde from 1.0.205 to 1.0.206 by @dependabot in #2043
- build(deps): bump serde from 1.0.206 to 1.0.207 by @dependabot in #2047
- build(deps): bump serde from 1.0.207 to 1.0.208 by @dependabot in #2054
- build(deps): bump serde_json from 1.0.122 to 1.0.124 by @dependabot in #2045
- build(deps): bump serde_json from 1.0.124 to 1.0.125 by @dependabot in #2052
- apply select clippy lint suggestions
- updated several indirect dependencies
- made various usage text improvements
Fixed
- stats: fix --output delimiter inferencing based on file extension #2065
- make process_input helper handle stdin better #2058
- docs: fix completions for --stats-jsonl and qsv pro installation text update by @rzmk in #2072
- docs: added Note about why luau feature is disabled in musl binaries - ffa2bc5 & 27d0f8e
Removed
Full Changelog: 0.131.1...0.132.0