Releases: jqnatividad/qsv
0.94.0
Added
- luau: qsv_register_lookup now supports the "ckan://" scheme. This allows the Luau script developer to fetch lookup table resources from CKAN instances. #864
- luau: added a detailed example for the "dathere://" lookup scheme in the https://github.com/dathere/qsv-lookup-tables repo. 3074538
- luau: added the qsv_writefile helper function. This allows the Luau script developer to write text files to the current working directory. Filenames are sanitized for safety. #867
- luau: random access mode now supports progress bars. The progress bar indicates the current record and the total number of records in the CSV file. 63150a0
- input: added the --comment option, which allows the user to specify the comment character. CSV rows that start with the comment character are skipped. #866
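As a rough illustration of the --comment behavior, here is a minimal, std-only Rust sketch, not qsv's implementation. It assumes a row counts as a comment only when its very first byte is the comment character, and it reads one record per line, so it does not handle quoted fields containing newlines. skip_comment_rows is a hypothetical helper.

```rust
/// Hypothetical helper (not part of qsv): returns the lines of `input` that
/// do not start with the `comment` byte. Assumes one CSV record per line,
/// i.e. no quoted fields with embedded newlines.
fn skip_comment_rows(input: &str, comment: u8) -> Vec<&str> {
    input
        .lines()
        // A row is skipped only if its first byte is the comment character.
        .filter(|line| line.as_bytes().first() != Some(&comment))
        .collect()
}

fn main() {
    let data = "name,age\n# a comment row\nalice,30\n#bob,40\ncarol,25\n";
    for row in skip_comment_rows(data, b'#') {
        println!("{row}"); // prints name,age / alice,30 / carol,25
    }
}
```

For real CSV parsing, the Rust csv crate's ReaderBuilder has a comment option that applies the same start-of-record rule inside a full CSV parser.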
Changed
- luau: added more logging messages to help with script debugging bcff8ad
- schema & tojsonl: refactor stdin handling 6c923b1
- bump jsonschema from 0.16 to 0.17
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-03-17
Full Changelog: 0.93.1...0.94.0
0.93.1
Fixed
- Fixed publishing workflow so that luau in qsvdp is only enabled on platforms that support it
Full Changelog: 0.93.0...0.93.1
0.93.0
Added
- luau: qsv_register_lookup helper function now works with CSVs on URLs #860
- luau: added support for the "dathere://" lookup scheme, allowing users to conveniently load oft-used lookup tables from https://github.com/dathere/qsv-lookup-tables #861
- luau: added detailed API definitions for Luau Helper Functions https://github.com/jqnatividad/qsv/blob/605b38b5636382d45f96d3d9d3c404bb20efaf15/src/cmd/luau.rs#L1156-L1497
- validate: added --timeout option when downloading JSON Schemas 605b38b
Changed
- remove all glob imports #857 and #858
- qsvdp (Datapusher+-optimized qsv binary variant) now has an embedded luau interpreter #859
- validate: JSON Schema URL is now case-insensitive 3123dc6
- Bump serde from 1.0.155 to 1.0.156 by @dependabot in #862
- applied select clippy lint recommendations
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-03-14
0.92.0
Added
- excel: added option to specify range to extract by @EricSoroos in #843
- luau: added --remap option. This allows the user to only map specified columns to the output CSV #841
- luau: added several new helper functions:
Changed
- luau: reorganized code for readability/maintainability #846
- foreach: tweak usage text to say it works with shell commands, not just the bash shell 78851b3
- split: added deeplink to examples/tests 6f293b8
- select: added deeplink to examples/tests 72fa094
- Switch to qsv_docopt, a qsv-optimized fork of docopt.rs, as docopt.rs is unmaintained and docopt parsing is an integral part of qsv: we embed each command's usage text in a way that cannot be done with either clap or structopt #852
- Bump embedded Luau from 0.566 to 0.567 d624e84
- Bump csv from 1.2.0 to 1.2.1 by @dependabot in #839
- Bump serde from 1.0.152 to 1.0.153 by @dependabot in #842
- Bump serde from 1.0.153 to 1.0.154 by @dependabot in #844
- Bump rust_decimal from 1.28.1 to 1.29.0 by @dependabot in #853
- start using new crates.io sparse protocol
- applied select clippy lint recommendations
- cargo update bump several other dependencies
- pin Rust nightly to 2023-03-12
Fixed
- stats: fix stdin regression #851
- excel: fix missing integer headers in excel transform by @EricSoroos in #840
- luau: fix & improve comment remover #845
New Contributors
- @EricSoroos made their first contribution in #840
Full Changelog: 0.91.0...0.92.0
0.91.0
Added
- luau: map multiple new computed columns in one call #829
- luau: added qsv_autoindex() helper function #834
- luau: added qsv_coalesce() helper function 3064ba2
- luau: added _LASTROW special variable to facilitate random access mode
Changed
- diff: rename --primary-key-idx -> --key by @janriemer in #826
- diff: implement option to sort by columns by @janriemer in #827
- luau: parsing improvements #835
- luau: bump embedded luau version from 0.562 to 0.566 f4a08b4
- sniff: major refactoring #836
- enable polars nightly feature when building nightly #816
- bump qsv-sniffer from 0.6.1 to 0.7.0 5027a64
- Bump crossbeam-channel from 0.5.6 to 0.5.7 by @dependabot in #818
- Bump flexi_logger from 0.25.1 to 0.25.2 by @dependabot in #824
- Bump rayon from 1.6.1 to 1.7.0 by @dependabot in #831
- Bump ryu from 1.0.12 to 1.0.13 by @dependabot in #830
- Bump itoa from 1.0.5 to 1.0.6 by @dependabot in #832
- cargo update bump dependencies
- pin Rust nightly to 2023-03-04
Fixed
- stats: use utf8-aware truncate #819
- sniff: fix URL sniffing 8d2c514
- show polars version in qsv --version 586a1ed
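The stats fix above is about truncating strings without splitting multi-byte characters. Here is a minimal, std-only Rust sketch of a UTF-8-aware truncate; truncate_utf8 is a hypothetical name, not the function introduced in #819.

```rust
/// Truncate `s` to at most `max_bytes` bytes without cutting a multi-byte
/// UTF-8 character in half. Slicing a `&str` at a non-boundary index panics,
/// so we step back to the nearest character boundary first.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    // 'é' occupies 2 bytes, so a 2-byte limit keeps only "h".
    println!("{}", truncate_utf8("héllo", 2));
    println!("{}", truncate_utf8("héllo", 3));
}
```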
Full Changelog: 0.90.1...0.91.0
0.90.1
Changed
- joinp: refactor to use LazyFrames instead of DataFrames for performance and the ability to stream and process files larger than RAM. #814 and #815
- luau: expanded example using qsv_log helper 5c198e4
- handled new clippy lints e81a391
- adjust publishing workflows to build binaries with as many features enabled as possible. On some platforms, the "to" and "polars" (for joinp) features cannot be built.
- cargo update bump indirect dependencies, notably arrow and duckdb
- pin Rust nightly to 2023-02-27
Full Changelog: 0.90.0...0.90.1
0.90.0
Added
- joinp: new join command powered by Pola.rs. This is just the first of more commands that will leverage the Pola.rs engine. #798
- luau: added random access mode; major refactor as we prepare to use luau as qsv's DSL; added qsv_log helper that can be called from Luau scripts to facilitate development of full-fledged data-wrangling scripts. #805 and #806
- sniff: added URL support and re-enabled stdin support. URL support samples only the required number of rows to sniff the metadata without downloading the entire file; expanded sniff metadata returned; added --progressbar option for URL sniffing #812
- sniff: added --timeout option for URL inputs; now runs async from all the binary variants #813
Changed
- diff: sort by line when no other sort option is given by @janriemer in #808
- luau: rename --prologue/--epilogue options to --begin/--end; add embedded BEGIN/END block handling #801
- Update to csvs_convert 0.8 by @kindly in #800
- use simdutf8 when possible ae466cb
- Bump self_update from 0.35.0 to 0.36.0 by @dependabot in #797
- Bump sysinfo from 0.28.0 to 0.28.1 by @dependabot in #809
- Bump actix-web from 4.3.0 to 4.3.1 by @dependabot in #811
- improved conditional compilation of different variants 9e63694
- temporarily skip CI tests that use httpbin.org as it was causing intermittent failures bee1602
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-02-26
Removed
- Python 3.6 support removed 86b29d4
Fixed
- sniff: does not work with stdin, which fixes #803 (#807). Note that stdin support was re-enabled shortly afterwards in #812
Full Changelog: 0.89.0...0.90.0
0.89.0
Added
- cat: added new rowskey subcommand. Unlike the existing rows subcommand, it allows far more flexible concatenation of CSV files by row, even if the files have different numbers of columns and different column orders. #795
- added jemalloc support, as the current default mimalloc allocator is not supported on some platforms. Also, for certain workloads, jemalloc may be faster. See Memory Allocator for more info #796
- added --no-memcheck and the related QSV_NO_MEMORY_CHECK env var. This relaxes the conservative Out-of-Memory prevention heuristic of qsv. See Memory Management for more info #792
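To illustrate the idea behind rowskey, here is a minimal in-memory Rust sketch, not qsv's implementation. It assumes columns are aligned by header name, output columns are ordered by first appearance across files, and values a file lacks are left empty; Table, strings and concat_by_key are hypothetical.

```rust
use std::collections::HashMap;

/// A parsed CSV file: a header row plus data rows (hypothetical type).
struct Table {
    headers: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Small helper to build owned string vectors.
fn strings(v: &[&str]) -> Vec<String> {
    v.iter().map(|s| s.to_string()).collect()
}

/// Concatenate tables by row, aligning columns by header name.
/// Output columns follow first appearance across all tables; a value a table
/// does not have is written as an empty string.
fn concat_by_key(tables: &[Table]) -> (Vec<String>, Vec<Vec<String>>) {
    // Union of all headers, remembering each header's output position.
    let mut all_headers: Vec<String> = Vec::new();
    let mut position: HashMap<String, usize> = HashMap::new();
    for t in tables {
        for h in &t.headers {
            if !position.contains_key(h) {
                position.insert(h.clone(), all_headers.len());
                all_headers.push(h.clone());
            }
        }
    }

    let mut out = Vec::new();
    for t in tables {
        for row in &t.rows {
            // Start from an all-empty record, then place each value by name.
            let mut record = vec![String::new(); all_headers.len()];
            for (h, value) in t.headers.iter().zip(row) {
                record[position[h]] = value.clone();
            }
            out.push(record);
        }
    }
    (all_headers, out)
}

fn main() {
    let a = Table { headers: strings(&["id", "name"]), rows: vec![strings(&["1", "ann"])] };
    let b = Table { headers: strings(&["name", "city"]), rows: vec![strings(&["bo", "oslo"])] };
    let (headers, rows) = concat_by_key(&[a, b]);
    println!("{}", headers.join(","));
    for r in rows {
        println!("{}", r.join(","));
    }
}
```

The map from header name to output position lets each file's columns be placed in constant time per value, regardless of the order they appear in that file.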
Changed
- --version now returns the maximum input file size for "non-streaming" mode and detailed memory info. See Version details for more info #780
- exclude: expanded usage text and added 'input parameters' help by @tmtmtmtm in #783
- stats: performance tweaks in 96e8168, 634d42a and 7e148cf
- Use simdutf8 to do SIMD-accelerated utf8 validation, replacing problematic utf8 screening. Together with #782, this completes the utf8 validation revamp. #784
- Bump sysinfo from 0.27.7 to 0.28.0 by @dependabot in #786
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-02-18
Removed
- Removed patched versions of csv crate optimized for performance. With the release of csv 1.2, switched back to csv crate upstream. #794
- removed utf8 first 8k screening. It was increasing code complexity and not very reliable. #782
Fixed
- dedup: refactored to use iterators to avoid out of bounds errors. f5e547b
- exclude: don't screen for utf8. This bugfix spurred the utf8 validation revamp, where I realized I just needed to pull out utf8 screening #781
- py: col, not row #793
New Contributors
Full Changelog: 0.88.2...0.89.0
0.88.2
Changed
- also show --update and --updatenow errors on stderr in addition to the log file #770
- sortcheck: when a file is not sorted, dupecount is invalid. Set dupecount to -1 to make it plainly evident when the file is not sorted. #771
- excel: added --quiet option 99d8849
- extdedup: minimize allocations in hot loop 62096fa
- improved mem_file_check OOM-prevention helper function. Better error messages; clamp free memory headroom percentage between 10 and 90 percent 6701ebf and 5cd8a95
- improved utf8 check error messages to give more detail, and not just say there is an encoding error c9b5b07
- improved README, adding Regular Expression Syntax section; reordered sections
- modified CI workflows to also check qsvlite
- Bump once_cell from 1.17.0 to 1.17.1 by @dependabot in #775
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-02-15
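The mem_file_check improvement above clamps the free-memory headroom between 10 and 90 percent. Here is a minimal Rust sketch of such a check, assuming a file is rejected when it is larger than free memory minus the headroom, and assuming a default headroom of 20 percent read from the QSV_FREEMEMORY_HEADROOM_PCT environment variable (both as stated in the 0.88.0 notes). fits_in_memory and headroom_pct_from_env are hypothetical names; this is not qsv's actual mem_file_check.

```rust
use std::env;

/// Default headroom: 20 percent of free memory.
const DEFAULT_HEADROOM_PCT: u8 = 20;

/// Read the headroom from QSV_FREEMEMORY_HEADROOM_PCT, falling back to the
/// default when the variable is unset or not a valid number.
fn headroom_pct_from_env() -> u8 {
    env::var("QSV_FREEMEMORY_HEADROOM_PCT")
        .ok()
        .and_then(|v| v.trim().parse::<u8>().ok())
        .unwrap_or(DEFAULT_HEADROOM_PCT)
}

/// Hypothetical check: a file may be loaded fully into memory only if it is
/// no larger than the free memory left after reserving the headroom, with the
/// headroom clamped to 10..=90 percent.
fn fits_in_memory(file_size: u64, free_memory: u64, headroom_pct: u8) -> bool {
    let pct = u64::from(headroom_pct.clamp(10, 90));
    let usable = free_memory - free_memory * pct / 100;
    file_size <= usable
}

fn main() {
    let pct = headroom_pct_from_env();
    // In practice free memory would come from the OS (e.g. via the sysinfo
    // crate); a fixed value is used here for illustration.
    let free_memory: u64 = 8 * 1024 * 1024 * 1024;
    let file_size: u64 = 7 * 1024 * 1024 * 1024;
    if fits_in_memory(file_size, free_memory, pct) {
        println!("ok to load");
    } else {
        println!("refusing to load: possible out-of-memory condition");
    }
}
```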
Fixed
- dedup: removed unnecessary utf8 check; improved input usage text #773
- dedup: fix unstable dedup results caused by using par_sort_unstable_by #776
- sort: fix unstable sort results caused by using par_sort_unstable_by 9f01df4
- removed mispublished 0.88.1 release
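The dedup and sort fixes above concern results produced by par_sort_unstable_by. As a sketch of the distinction involved, assuming the fix is to use a stable sort, here is a sequential std-only Rust example; sort_by_column is hypothetical, and the actual fixes may use rayon's parallel sorts.

```rust
/// Sort rows by one key column using a stable sort: rows whose keys compare
/// equal keep their original relative order. An unstable sort such as
/// `sort_unstable_by` (or rayon's `par_sort_unstable_by`) gives no such
/// guarantee; rayon's stable parallel counterpart is `par_sort_by`.
fn sort_by_column(rows: &mut [Vec<&str>], key: usize) {
    rows.sort_by(|a, b| a[key].cmp(b[key]));
}

fn main() {
    let mut rows = vec![
        vec!["b", "first b"],
        vec!["a", "first a"],
        vec!["b", "second b"],
        vec!["a", "second a"],
    ];
    sort_by_column(&mut rows, 0);
    for r in &rows {
        println!("{}", r.join(","));
    }
}
```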
Full Changelog: 0.88.0...0.88.2
0.88.0
Added
- extdedup: new command to deduplicate arbitrarily large CSV/text files using a memory-buffered, on-disk hash table. Not only does it dedup very large files using constant memory, it does so while retaining the file's original sort order, unlike dedup, which loads the entire file into memory to sort it first before deduping by comparing neighboring rows #762
- Added Out-of-Memory (OOM) handling for "non-streaming" commands (i.e. commands that load the entire file into memory) using a heuristic: if an input file's size is larger than the free memory available minus a default headroom of 20 percent, qsv processing stops gracefully with a detailed message about the potential OOM condition. This headroom can be adjusted using the QSV_FREEMEMORY_HEADROOM_PCT environment variable, which has a minimum value of 10 percent #767
- add -Q, --quiet option to all commands that return counts to stderr (dedup, extdedup, search, searchset and replace) in #768
Changed
- sort & sortcheck: separate test suites and link from usage text #756
- frequency: amortize allocations, preallocate with_capacity. Informal benchmarking shows an improvement of ~30%! 🚀 #761
- extsort: refactor. Aligned options with extdedup; now also supports stdin/stdout; added --memory-limit option #763
- safenames: minor optimization a7df378
- excel: minor optimization 75eac78
- stats: add date inferencing false positive warning, with a recommendation on how to prevent false positives a84a4e6
- sortcheck: added note to usage text that dupe_count is only valid if the file is sorted ab69f14
- reorganized Installation section to differentiate installation options 9ef8bfc
- bump MSRV to 1.67.1
- applied select clippy recommendations
- Bump flexi_logger from 0.25.0 to 0.25.1 by @dependabot in #755
- Bump pyo3 from 0.18.0 to 0.18.1 by @dependabot in #757
- Bump serde_json from 1.0.92 to 1.0.93 by @dependabot in #760
- Bump filetime from 0.2.19 to 0.2.20 by @dependabot in #759
- Bump self_update from 0.34.0 to 0.35.0 by @dependabot in #765
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-02-12
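The frequency change above amortizes allocations and preallocates with with_capacity. Here is a minimal std-only Rust sketch of that pattern for counting values in a column; it is not qsv's frequency code, and frequencies and capacity_hint are hypothetical.

```rust
use std::collections::HashMap;

/// Count how often each value occurs in a column. Preallocating the map with
/// `with_capacity` avoids repeated growth and rehashing while counting; the
/// capacity hint is a hypothetical parameter (e.g. an estimate of distinct values).
fn frequencies<'a>(values: &[&'a str], capacity_hint: usize) -> Vec<(&'a str, u64)> {
    let mut counts: HashMap<&'a str, u64> = HashMap::with_capacity(capacity_hint);
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    // Preallocate the output too, then order by count (desc), then value (asc).
    let mut out: Vec<(&'a str, u64)> = Vec::with_capacity(counts.len());
    out.extend(counts);
    out.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    out
}

fn main() {
    let column = ["NY", "CA", "NY", "TX", "NY", "CA"];
    for (value, count) in frequencies(&column, column.len()) {
        println!("{value}: {count}");
    }
}
```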
Fixed
- sortcheck: correct wrong progress message showing invalid dupe_count (as dupe count is only valid if the file is sorted) 8eaa824
- py & luau: correct usage text about stderr 1b56e72
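The sortcheck fix above depends on the fact that duplicates can only be counted by comparing neighboring rows when the file is sorted. Here is a minimal Rust sketch of that logic, assuming the -1 convention described in the 0.88.2 notes; sortcheck here is a hypothetical function over plain string keys, not qsv's command.

```rust
use std::cmp::Ordering;

/// Hypothetical sortcheck-style scan over one key per row. Adjacent
/// duplicates are only meaningful when the input is sorted, so if an
/// out-of-order pair is found the duplicate count is reported as -1.
fn sortcheck(keys: &[&str]) -> (bool, i64) {
    let mut dupe_count: i64 = 0;
    for pair in keys.windows(2) {
        match pair[0].cmp(pair[1]) {
            // Out of order: stop early, the dupe count is invalid.
            Ordering::Greater => return (false, -1),
            Ordering::Equal => dupe_count += 1,
            Ordering::Less => {}
        }
    }
    (true, dupe_count)
}

fn main() {
    println!("{:?}", sortcheck(&["a", "b", "b", "c"]));
    println!("{:?}", sortcheck(&["b", "a", "a"]));
}
```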
Full Changelog: 0.87.1...0.88.0