Releases: jqnatividad/qsv

0.94.0

18 Mar 03:16

Added

  • luau: qsv_register_lookup now supports "ckan://" scheme. This allows the luau script developer to fetch lookup table resources from CKAN instances. #864
  • luau: added detailed example for "dathere://" lookup scheme in https://github.com/dathere/qsv-lookup-tables repo. 3074538
  • luau: added qsv_writefile helper function. This allows the luau script developer to write text files to the current working directory. Filenames are sanitized for safety. #867
  • luau: random access mode now supports progressbars. The progressbar indicates the current record and the total number of records in the CSV file 63150a0
  • input: added --comment option which allows the user to specify the comment character.
    CSV rows that start with the comment character are skipped. #866
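
The comment-skipping behavior described for `input --comment` can be pictured with a minimal sketch. This is not qsv's implementation: the function name `skip_comment_rows` is hypothetical, and the sketch works line by line, so it ignores quoted fields that contain newlines.

```rust
/// Returns the rows of `input` that do not start with the `comment` character.
/// Assumption: one CSV row per line (no quoted multi-line fields).
fn skip_comment_rows(input: &str, comment: char) -> Vec<&str> {
    input
        .lines()
        // Keep only the rows whose first character is not the comment character.
        .filter(|row| !row.starts_with(comment))
        .collect()
}
```

Calling `skip_comment_rows(data, '#')` on text that mixes `#`-prefixed lines with CSV rows returns only the CSV rows, in their original order.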

Changed

  • luau: added additional logging messages to help with script debugging bcff8ad
  • schema & tojsonl: refactor stdin handling 6c923b1
  • bump jsonschema from 0.16 to 0.17
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-03-17

Full Changelog: 0.93.1...0.94.0

0.93.1

15 Mar 14:29

Fixed

  • Fixed publishing workflow so qsvdp luau is only enabled on platforms that support it

Full Changelog: 0.93.0...0.93.1

0.93.0

15 Mar 12:57

Added

Changed

  • remove all glob imports #857 and #858
  • qsvdp (Datapusher+-optimized qsv binary variant) now has an embedded luau interpreter #859
  • validate: JSON Schema url now case-insensitive 3123dc6
  • Bump serde from 1.0.155 to 1.0.156 by @dependabot in #862
  • applied select clippy lint recommendations
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-03-14

0.92.0

13 Mar 15:37

Added

  • excel: added option to specify range to extract by @EricSoroos in #843
  • luau: added --remap option. This allows the user to only map specified columns to the output CSV #841
  • luau: added several new helper functions:
    • qsv_skip: skips writing the current record to the output CSV #854
    • qsv_break: stops processing the current CSV file #846
    • qsv_insertrecord: inserts a new record to the output CSV #845
    • qsv_register_lookup: loads a CSV that can be used as a lookup table in Luau 38e7b7e

Changed

  • luau: reorganized code for readability/maintainability #846
  • foreach: tweak usage text to say it works with shell commands, not just the bash shell 78851b3
  • split: added deeplink to examples/tests 6f293b8
  • select: added deeplink to examples/tests 72fa094
  • Switch to qsv_docopt, a qsv-optimized fork of docopt.rs. docopt.rs is unmaintained, and docopt parsing is an integral part of qsv because we embed each command's usage text in a way that cannot be done by either clap or structopt #852
  • Bump embedded Luau from 0.566 to 0.567 d624e84
  • Bump csv from 1.2.0 to 1.2.1 by @dependabot in #839
  • Bump serde from 1.0.152 to 1.0.153 by @dependabot in #842
  • Bump serde from 1.0.153 to 1.0.154 by @dependabot in #844
  • Bump rust_decimal from 1.28.1 to 1.29.0 by @dependabot in #853
  • start using new crates.io sparse protocol
  • applied select clippy lint recommendations
  • cargo update bump several other dependencies
  • pin Rust nightly to 2023-03-12

Fixed

  • stats: fix stdin regression #851
  • excel: Fix missing integer headers in excel transform. by @EricSoroos in #840
  • luau: fix & improve comment remover #845

Full Changelog: 0.91.0...0.92.0

0.91.0

06 Mar 01:40

Added

  • luau: map multiple new computed columns in one call #829
  • luau: added qsv_autoindex() helper function #834
  • luau: added qsv_coalesce() helper function 3064ba2
  • luau: added _LASTROW special variable to facilitate random access mode

Changed

  • diff: rename --primary-key-idx -> --key by @janriemer in #826
  • diff: implement option to sort by columns by @janriemer in #827
  • luau: parsing improvements #835
  • luau: bump embedded luau version from 0.562 to 0.566 f4a08b4
  • sniff: major refactoring. #836
  • enable polars nightly feature when building nightly #816
  • bump qsv-sniffer from 0.6.1 to 0.7.0 5027a64
  • Bump crossbeam-channel from 0.5.6 to 0.5.7 by @dependabot in #818
  • Bump flexi_logger from 0.25.1 to 0.25.2 by @dependabot in #824
  • Bump rayon from 1.6.1 to 1.7.0 by @dependabot in #831
  • Bump ryu from 1.0.12 to 1.0.13 by @dependabot in #830
  • Bump itoa from 1.0.5 to 1.0.6 by @dependabot in #832
  • cargo update bump dependencies
  • pin Rust nightly to 2023-03-04

Fixed

  • stats: use utf8-aware truncate #819
  • sniff: fix URL sniffing 8d2c514
  • show polars version in qsv --version 586a1ed
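
The "utf8-aware truncate" fix for stats addresses a general Rust pitfall: slicing a `&str` at an arbitrary byte offset panics if the offset falls inside a multi-byte character. A minimal sketch of the technique follows; the function name `truncate_utf8` is hypothetical and this is not necessarily how qsv implements it.

```rust
/// Truncates `s` to at most `max_bytes` bytes without splitting a UTF-8 character.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Walk back from the byte limit until we land on a character boundary.
    // Offset 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For example, "héllo" is 6 bytes long because "é" takes two bytes; truncating it to 2 bytes yields "h" rather than panicking in the middle of "é".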

Full Changelog: 0.90.1...0.91.0

0.90.1

28 Feb 13:14

Changed

  • joinp: Refactor to use LazyFrames instead of DataFrames for performance and ability to do streaming and process files larger than RAM. #814 and #815
  • luau: expanded example using qsv_log helper 5c198e4
  • handled new clippy lints e81a391
  • adjust publishing workflows to build binaries with as many features enabled as possible. On some platforms, the to and polars (for joinp) features cannot be built.
  • cargo update bump indirect dependencies, notably arrow and duckdb
  • pin Rust nightly to 2023-02-27

Full Changelog: 0.90.0...0.90.1

0.90.0

27 Feb 17:36

Added

  • joinp: new join command powered by Pola.rs. This is just the first of more commands that will leverage the Pola.rs engine. #798
  • luau: added random access mode; major refactor as we prepare to use luau as qsv's DSL; added qsv_log helper that can be called from Luau scripts to facilitate development of full-fledged data-wrangling scripts. #805 and #806
  • sniff: added URL & re-enabled stdin support; URL support features sampling only the required number of rows to sniff the metadata without downloading the entire file; expanded sniff metadata returned; added --progressbar option for URL sniffing #812
  • sniff: added --timeout option for URL inputs; now runs async from all the binary variants #813

Changed

  • diff: sort by line when no other sort option is given by @janriemer in #808
  • luau: rename --prologue/--epilogue options to --begin/--end; add embedded BEGIN/END block handling #801
  • Update to csvs_convert 0.8 by @kindly in #800
  • use simdutf8 when possible ae466cb
  • Bump self_update from 0.35.0 to 0.36.0 by @dependabot in #797
  • Bump sysinfo from 0.28.0 to 0.28.1 by @dependabot in #809
  • Bump actix-web from 4.3.0 to 4.3.1 by @dependabot in #811
  • improved conditional compilation of different variants 9e63694
  • temporarily skip CI tests that use httpbin.org as it was causing intermittent failures bee1602
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-02-26

Removed

  • Python 3.6 support removed 86b29d4

Fixed

  • sniff: addressed sniff not working with stdin (fixes #803) #807
    Note that stdin support was re-enabled shortly afterwards in #812

Full Changelog: 0.89.0...0.90.0

0.89.0

20 Feb 14:53

Added

  • cat: added new rowskey subcommand. Unlike the existing rows subcommand, it allows far more flexible concatenation of CSV files by row, even if the files have a different number of columns or a different column order. #795
  • added jemalloc support, as the current default mimalloc allocator is not supported on some platforms. Also, for certain workloads, jemalloc may be faster. See Memory Allocator for more info #796
  • added --no-memcheck and related QSV_NO_MEMORY_CHECK env var. This relaxes the conservative Out-of-Memory prevention heuristic of qsv. See Memory Management for more info #792
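
To illustrate what concatenating by row across differing column sets can look like (the cat rowskey item above), here is a minimal in-memory sketch. It is not qsv's code: the function name `cat_rows_by_key`, the first-seen ordering of the output columns, and the use of an empty string for missing values are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Concatenates tables by row, matching values by column name.
/// Each table is a (headers, rows) pair.
fn cat_rows_by_key(tables: &[(Vec<&str>, Vec<Vec<&str>>)]) -> (Vec<String>, Vec<Vec<String>>) {
    // Output header: union of all column names, in first-seen order (assumption).
    let mut headers: Vec<String> = Vec::new();
    for (cols, _) in tables {
        for c in cols {
            if !headers.iter().any(|h| h == c) {
                headers.push(c.to_string());
            }
        }
    }

    let mut out: Vec<Vec<String>> = Vec::new();
    for (cols, rows) in tables {
        for row in rows {
            // Map this row's values by column name so column order no longer matters.
            let by_name: HashMap<&str, &str> =
                cols.iter().copied().zip(row.iter().copied()).collect();
            // Columns absent from this table are filled with "" (assumption).
            out.push(
                headers
                    .iter()
                    .map(|h| by_name.get(h.as_str()).copied().unwrap_or("").to_string())
                    .collect(),
            );
        }
    }
    (headers, out)
}
```

Given one table with columns `id,name` and another with `name,color,id`, the result has the header `id,name,color`, and the first table's rows carry an empty `color` value.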

Changed

  • --version now returns "non-streaming" mode max input file size and detailed memory info. See Version details for more info #780
  • exclude: expanded usage text and added 'input parameters' help by @tmtmtmtm in #783
  • stats: performance tweaks in 96e8168, 634d42a and 7e148cf
  • Use simdutf8 to do SIMD accelerated utf8 validation, replacing problematic utf8 screening. Together with #782, completes utf8 validation revamp. #784
  • Bump sysinfo from 0.27.7 to 0.28.0 by @dependabot in #786
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-02-18

Removed

  • Removed patched versions of csv crate optimized for performance. With the release of csv 1.2, switched back to csv crate upstream. #794
  • removed utf8 first 8k screening. It was increasing code complexity and not very reliable. #782

Fixed

  • dedup: refactored to use iterators to avoid out of bounds errors. f5e547b
  • exclude: don't screen for utf8. This bugfix spurred the utf8 validation revamp, where I realized I just needed to pull out utf8 screening #781
  • py: col, not row #793

Full Changelog: 0.88.2...0.89.0

0.88.2

16 Feb 05:59

Changed

  • also show --update and --updatenow errors on stderr in addition to log file #770
  • sortcheck: when a file is not sorted, dupecount is invalid. Set dupecount to -1 to make it plainly evident when file is not sorted. #771
  • excel: added --quiet option 99d8849
  • extdedup: minimize allocations in hot loop 62096fa
  • improved mem_file_check OOM-prevention helper function. Better error messages; clamp free memory headroom percentage between 10 and 90 percent 6701ebf and 5cd8a95
  • improved utf8 check error messages to give more detail, and not just say there is an encoding error c9b5b07
  • improved README, adding Regular Expression Syntax section; reordered sections
  • modified CI workflows to also check qsvlite
  • Bump once_cell from 1.17.0 to 1.17.1 by @dependabot in #775
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-02-15
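
The sortcheck change above follows from how duplicates are counted in a single pass: comparing neighboring rows only finds duplicates when equal rows are adjacent, which is guaranteed only for sorted input. A minimal sketch of that logic, with a hypothetical function name and a simplified row type (plain strings, compared lexicographically), might look like this:

```rust
/// Returns (is_sorted, dupe_count). When the input is not sorted,
/// dupe_count is -1, since a neighbor-based count would be meaningless.
fn sortcheck(rows: &[&str]) -> (bool, i64) {
    let mut dupes: i64 = 0;
    for pair in rows.windows(2) {
        if pair[0] > pair[1] {
            // Out of order: stop and flag the count as invalid.
            return (false, -1);
        }
        if pair[0] == pair[1] {
            dupes += 1;
        }
    }
    (true, dupes)
}
```

With the rows `a, a, b` this returns `(true, 1)`; with `b, a, a` it returns `(false, -1)` even though a duplicate exists, which is exactly the case the -1 sentinel is meant to make obvious.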

Fixed

  • dedup unnecessarily doing utf8 check; improve input usage text #773
  • dedup: fix unstable dedup results caused by using par_sort_unstable_by #776
  • sort: fix unstable sort results caused by using par_sort_unstable_by 9f01df4
  • removed mispublished 0.88.1 release
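
The two par_sort_unstable_by fixes above come down to sort stability: an unstable sort may reorder rows whose keys compare equal, so results can differ between runs. The sketch below uses the standard library's stable `sort_by` to show the guarantee; it is only an illustration, the function name is hypothetical, and the release notes do not say which replacement qsv actually adopted.

```rust
/// Sorts rows by their numeric key with a stable sort, so rows with
/// equal keys keep their original relative order.
fn sort_by_key_stable(mut rows: Vec<(u32, &str)>) -> Vec<(u32, &str)> {
    // `sort_by` is documented as stable; `sort_unstable_by` makes no such promise.
    rows.sort_by(|a, b| a.0.cmp(&b.0));
    rows
}
```

Sorting `(2, "first"), (1, "x"), (2, "second"), (1, "y")` always yields `"x"` before `"y"` and `"first"` before `"second"`, which is the property a deterministic dedup or sort needs.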

Full Changelog: 0.88.0...0.88.2

0.88.0

13 Feb 16:47

Added

  • extdedup: new command to deduplicate arbitrarily large CSV/text files using a memory-buffered, on-disk hash table. Not only does it dedup very large files using constant memory, it does so while retaining the file's original sort order, unlike dedup which loads the entire file into memory to sort it first before deduping by comparing neighboring rows #762
  • Added Out-of-Memory (OOM) handling for "non-streaming" commands (i.e. commands that load the entire file into memory). The heuristic: if an input file's size is larger than the free memory available minus a default headroom of 20 percent, qsv processing stops gracefully with a detailed message about the potential OOM condition. This headroom can be adjusted using the QSV_FREEMEMORY_HEADROOM_PCT environment variable, which has a minimum value of 10 percent #767
  • add -Q, --quiet option to all commands that return counts to stderr (dedup, extdedup, search, searchset and replace) #768
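
The OOM-prevention heuristic for non-streaming commands can be worked through with a small sketch. It reads the rule as "stop when the file is larger than free memory minus the headroom", with the 20 percent default and 10 percent minimum taken from the note above. The function name, the `Option` parameter standing in for the environment variable, and the cap at 100 percent are assumptions; this is not qsv's mem_file_check code.

```rust
/// Default headroom percentage, per the release note.
const DEFAULT_HEADROOM_PCT: u64 = 20;
/// Minimum headroom percentage, per the release note.
const MIN_HEADROOM_PCT: u64 = 10;

/// Returns true when a file of `file_size` bytes should not be loaded,
/// given `free_memory` bytes available and an optional headroom override.
fn exceeds_memory_budget(file_size: u64, free_memory: u64, headroom_pct: Option<u64>) -> bool {
    // Assumption: values above 100 are capped so the subtraction cannot underflow.
    let pct = headroom_pct
        .unwrap_or(DEFAULT_HEADROOM_PCT)
        .clamp(MIN_HEADROOM_PCT, 100);
    // Integer arithmetic: the headroom is rounded down slightly.
    let headroom = free_memory / 100 * pct;
    file_size > free_memory - headroom
}
```

With 1000 bytes free and the default headroom, the budget is 800 bytes: a 700-byte file passes and a 900-byte file is rejected. Requesting a 5 percent headroom is raised to the 10 percent minimum, so the budget becomes 900 bytes.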

Changed

  • sort & sortcheck: separate test suites and link from usage text #756
  • frequency: amortize allocations, preallocate with_capacity. Informal benchmarking shows an improvement of ~30%! 🚀 #761
  • extsort: refactor. Aligned options with extdedup; now also support stdin/stdout; added --memory-limit option #763
  • safenames: minor optimization a7df378
  • excel: minor optimization 75eac78
  • stats: add date inferencing false positive warning, with a recommendation how to prevent false positives a84a4e6
  • sortcheck: added note to usage text that dupe_count is only valid if file is sorted ab69f14
  • reorganized Installation section to differentiate installation options 9ef8bfc
  • bump MSRV to 1.67.1
  • applied select clippy recommendations
  • Bump flexi_logger from 0.25.0 to 0.25.1 by @dependabot in #755
  • Bump pyo3 from 0.18.0 to 0.18.1 by @dependabot in #757
  • Bump serde_json from 1.0.92 to 1.0.93 by @dependabot in #760
  • Bump filetime from 0.2.19 to 0.2.20 by @dependabot in #759
  • Bump self_update from 0.34.0 to 0.35.0 by @dependabot in #765
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-02-12

Fixed

  • sortcheck: correct wrong progress message showing invalid dupe_count (as dupe count is only valid if the file is sorted) 8eaa824
  • py & luau: correct usage text about stderr 1b56e72

Full Changelog: 0.87.1...0.88.0