Releases: jqnatividad/qsv

0.103.0

15 May 09:59

Added

  • sniff: On Linux, short-circuit sniffing a remote file when we already know it's not a CSV #976
  • stats: now computes variance for dates e3e6782
  • stats: now automatically invalidates cached stats across qsv releases 6e929dd
  • add magic version to --version option 455c0f2
  • added CKAN-aware (CKAN) legend to List of Available Commands

Changed

  • stats: improve usage text
  • stats: use extend_from_slice for readability 23275e2
  • validate: do not panic if the input is not UTF-8 532cd01
  • sniff: simplify getting stdin last_modified property; on Linux, return detected mime type in JSON error response 0197591
  • luau: update embedded Luau from 0.573 to 0.576
  • Update nightly build instructions
  • Bump qsv-sniffer from 0.9.1 to 0.9.2 by @dependabot in #972
  • Bump tokio from 1.28.0 to 1.28.1 by @dependabot in #973
  • Bump serde from 1.0.162 to 1.0.163 by @dependabot in #974
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-05-13

Full Changelog: 0.102.1...0.103.0

0.102.1

09 May 10:02

0.102.1 is a small patch release to fix issues in publishing the pre-built binary variants with magic for sniff when cross-compiling.

Changed

  • stats: refine --infer-boolean option info & update test count de6390b
  • tojsonl: refine boolcheck_first_lower_char() fn 241115e

Fixed

  • tweaked GitHub Actions publishing workflows to enable building magic-enabled sniff on Linux. Disabled magic when cross-compiling for non-x86_64 Linux targets.

Full Changelog: 0.102.0...0.102.1

0.102.0

08 May 20:28

A lot of work was done on sniff to make it not just a CSV dialect detector, but a general purpose file type detector leveraging 🪄 magic ✨ - able to detect mime types even for files on URLs.

sniff can now also use the same data types as stats with the --stats-types option. This was primarily done to support metadata collection when registering CKAN resources, not only during data entry but also when checking resource links for bitrot and when harvesting metadata from other systems. stats & sniff can therefore be used interchangeably, depending on the response time requirement and the data quality of the data source.

For example, sniff can quickly infer metadata by downloading just a small sample of a very large data file DURING data entry ("Resource-first upload workflow"). stats can then be used later on, when the data is actually being pushed to the Datastore with Datapusher+, data type inferences need to be guaranteed, and the entire file has to be scanned.
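The sample-first idea can be illustrated with a minimal Python sketch. It does not use qsv: Python's standard csv.Sniffer stands in for dialect detection, and the sample data, the infer_type() helper and the type labels are hypothetical.

```python
import csv
import io

# Hypothetical sample: the first few lines of a much larger file.
SAMPLE = "id;name;amount\n1;alice;3.5\n2;bob;4.25\n3;carol;10.0\n"


def infer_type(values):
    """Return the narrowest of Integer, Float, String that fits every value."""
    for label, cast in (("Integer", int), ("Float", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "String"


def sniff_sample(sample):
    """Guess the delimiter and per-column types from the sample only."""
    dialect = csv.Sniffer().sniff(sample, delimiters=";,\t|")
    rows = list(csv.reader(io.StringIO(sample), dialect))
    header, body = rows[0], rows[1:]
    columns = zip(*body)  # transpose rows into columns
    types = {name: infer_type(col) for name, col in zip(header, columns)}
    return dialect.delimiter, types


delimiter, types = sniff_sample(SAMPLE)
print(delimiter, types)
```

Because only the sample is inspected, a later row could still contradict the inferred types, which is why a full scan is needed when the inferences must be guaranteed.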

Added

  • stats: add --infer-boolean option #967
  • sniff: add --stats-types option #968
  • sniff: add magic mime-type detection on Linux #970
  • sniff: add --user-agent option bd0bf78
  • sniff: add last_modified info ef68bff

Changed

  • make --envlist option allocator-aware f3566dc
  • Bump serde from 1.0.160 to 1.0.162 by @dependabot in #962
  • Bump robinraju/release-downloader from 1.7 to 1.8 by @dependabot in #960
  • Bump flexi_logger from 0.25.3 to 0.25.4 by @dependabot in #965
  • Bump sysinfo from 0.28.4 to 0.29.0 by @dependabot in #966
  • Bump jql-runner from 6.0.6 to 6.0.7 by @dependabot in #969
  • Bump polars from 0.28.0 to 0.29.0 by @dependabot in #971
  • apply select clippy recommendations
  • cargo update bump indirect dependencies
  • change MSRV to 1.69.0
  • pin Rust nightly to 2023-05-07

Fixed

  • sniff: make sniff give more consistent results #958. Fixes #956
  • Bump qsv-sniffer from 0.8.3 to 0.9.1. Replaced all assert with proper error-handling. #961 a7c607a 43d7eaf
  • sniff: fixed rowcount calculation when sniffing a URL and the entire file was actually downloaded - ef68bff

Full Changelog: 0.101.0...0.102.0

0.101.0

01 May 11:35

We're back to the future! The qsv release train is back on track, as we jump to 0.101.0 over the yanked 0.100.0 release now that self-update logic has been fixed.

Added

  • stats: added more metadata to stats arg cache json - 5767e56
  • added target-triple to user-agent string, and changed agent name to qsv binary variant 063b080, 70f4ea3, f0fcb05

Changed

  • excel: performance, safety & documentation refinements e9a283d, 3800d25, 252b01e, 6a6df0f, 67ccd85, f2908ce, 6d5105d, dbcea39, faa8ef9
  • replace: clarify that it works on a field-by-field basis c0e2012
  • stats: use extend_from_slice when possible - c71ad4e
  • fetch & fetchpost: replace multiple push_fields with a csv from vec - f4e0479
  • fetch & fetchpost: Migrate to jql 6 #955
  • schema: made bincode reader buffer bigger - 39b4bb5
  • index: use increased default buffer size when creating index 60fe7d6
  • standardized user_agent processing 4c06301, 010c565
  • User agent environment variable; standardized user agent processing #951
  • more robust Environment Variables processing #946
  • move Environment Variables to its own markdown file 77c167f
  • Bump tokio from 1.27.0 to 1.28.0 by @dependabot in #945
  • Bump mimalloc from 0.1.36 to 0.1.37 by @dependabot in #944
  • Bump mlua from 0.9.0-beta.1 to 0.9.0-beta.2 by @dependabot in #952
  • Bump flate2 from 1.0.25 to 1.0.26 by @dependabot in #954
  • Bump reqwest from 0.11.16 to 0.11.17 by @dependabot in #953
  • cargo update bump indirect dependencies
  • pin Rust nightly to 2023-04-30

Full Changelog: 0.99.1...0.101.0

0.99.1

24 Apr 16:35

Even though this is a patch release, it contains a lot of new features and improvements. It was versioned this way so that qsv 0.99.0 and below can upgrade to it: the self-update logic in older versions compared versions as strings rather than as semvers, so the yanked 0.100.0 compares as lower than 0.99.0 and below, which prevented those older versions from updating.
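
The difference between the two comparisons can be reproduced in a few lines of Python. This is only an illustration, not qsv's self-update code; parse_version() is a hypothetical helper that handles plain MAJOR.MINOR.PATCH strings and ignores pre-release or build tags.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


current, latest = "0.99.0", "0.100.0"

# String comparison goes character by character: '1' sorts before '9',
# so 0.100.0 looks older than 0.99.0.
print(latest > current)  # False

# Comparing the numeric components: 100 > 99, so 0.100.0 is newer.
print(parse_version(latest) > parse_version(current))  # True

# 0.99.1 is newer than 0.99.0 under both comparisons, which is why a
# 0.99.x version number is still reachable by the old logic.
print("0.99.1" > current)  # True
```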

The changelog below is a combination of the changelog of the yanked 0.100.0 and the changes since 0.99.0.

Added

  • snappy: add validate subcommand #920
  • sniff: can now sniff snappy-compressed files - on the local file system and on URLs #925
  • schema & stats: stats now has a --stats-binout option which schema takes advantage of #931
  • schema: added example NYC 311 JSON schema validation file generated by qsv schema c956212
  • to: added snappy auto-compression/decompression support 09a7afd
  • to: added dirs as input source a31fb3b and 4d4dd54
  • to: added unit tests for sqlite, postgres, xlsx and datapackage 16f2b7e 808b018 10739c5
  • add dotenv file support #936 and #937

Changed

  • stats & schema: major performance improvement (30x faster) with stats binary format serialization/deserialization 73b4b20
  • snappy: misc improvements in #921
  • stats: Refine stats binary format caching in #932
  • bump embedded Luau from 0.5.71 to 0.5.73 d0ea7c8
  • Better OOM checks. It now has two distinct modes - NORMAL and CONSERVATIVE, with NORMAL being the default. Previously, the CONSERVATIVE heuristic was the default and it was causing too many false positives #935
  • Bump actions/setup-python from 4.5.0 to 4.6.0 by @dependabot in #934
  • Bump embedded Luau from 0.5.67 to 0.5.71 a67bd3e
  • Bump qsv-stats from 0.7 to 0.8 9a6812a
  • Bump serde from 1.0.159 to 1.0.160 by @dependabot in #918
  • Bump cached from 0.42.0 to 0.43.0 by @dependabot in #919
  • Bump serde_json from 1.0.95 to 1.0.96 by @dependabot in #922
  • Bump pyo3 from 0.18.2 to 0.18.3 by @dependabot in #923
  • Bump ext-sort from 0.1.3 to 0.1.4 by @dependabot in #929
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-04-23

Removed

  • snappy: removed the 8-CPU cap, making snappy even snappier. Compression went from 1.75 GB/sec to 2.25 GB/sec on the NYC 311 test data 🚀 19acf2f

Fixed

  • excel: Float serialization correctness by @bluepython508 in #933
  • luau: only create qsv_cache directory when needed #930
  • luau: make qsv_shellcmd() helper function work with Windows f867158 and cc24acb
  • Fixed self-update version parsing so that versions are compared as semvers, not as strings. String comparison prevented self-update from updating 0.99.0 to 0.100.0, because "0.99.0" > "0.100.0" when compared as strings. #940
  • fixed werr macro to also format! messages c3ceaf7

Full Changelog: 0.99.0...0.99.1

0.99.0

10 Apr 11:57

Added

  • added Snappy auto-compression/decompression support. The Snappy format was chosen primarily because it supports streaming compression/decompression and is designed for performance. #911
  • added snappy command. Although files ending with the ".sz" extension are automatically compressed/decompressed by qsv, the snappy command offers 4-5x faster multi-threaded compression. It can also be used to check if a file is Snappy-compressed or not, and can be used to compress/decompress ANY file. #911 and #916
  • diff command added to qsvlite and qsvdp binary variants #910
  • to: added stdin support #913

Changed

  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-04-09

Full Changelog: 0.98.0...0.99.0

0.98.0

07 Apr 13:59

Added

  • stats: added stats caching, storing the computed stats as metadata. Doing so not only prevents unnecessary recomputation of stats, especially for very large files, but also lays the foundation for summary statistics to be used more widely across qsv, supporting new commands that leverage these stats, e.g. fixdata, outliers, describegpt, fake, statsviz and multi-pass stats. #902
  • stats: added --force option to force recomputation of stats 2f91d0c
  • luau: add qsv_loadcsv helper function #908
  • added more info about regular expression syntax and link to https://regex101.com which now supports the Rust flavor of regex
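
To make the caching idea concrete, here is a minimal Python sketch of a cache-freshness check with a force override. The rules are assumptions for illustration, not qsv's actual logic: the cache file name, the JSON layout and the cache_is_fresh() helper are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def cache_is_fresh(data_path, cache_path, args, force=False):
    """Decide whether cached stats can be reused (hypothetical rules)."""
    if force or not cache_path.exists():
        return False
    cached = json.loads(cache_path.read_text())
    same_args = cached.get("args") == args
    # The cache must be at least as new as the data it describes.
    not_stale = cache_path.stat().st_mtime >= data_path.stat().st_mtime
    return same_args and not_stale


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.csv"
    cache = Path(tmp) / "data.stats.json"  # hypothetical cache file name
    data.write_text("a,b\n1,2\n")
    args = {"columns": "a,b"}  # hypothetical record of the options used
    cache.write_text(json.dumps({"args": args, "stats": {}}))

    fresh = cache_is_fresh(data, cache, args)
    forced = cache_is_fresh(data, cache, args, force=True)

    # Simulate a later edit of the data file by moving its mtime forward.
    later = cache.stat().st_mtime + 10
    os.utime(data, (later, later))
    stale = cache_is_fresh(data, cache, args)

print(fresh, forced, stale)
```

Storing the arguments next to the results lets a run with different options miss the cache instead of returning stats computed under other settings.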

Changed

  • logging is now buffered by default #903
  • renamed features to be more easily understandable: "full" -> "feature_capable", "all_full" -> "all_features" #906
  • changed GitHub Actions workflows to use the new feature names
  • Bump redis from 0.22.3 to 0.23.0 by @dependabot in #901
  • Bump filetime from 0.2.20 to 0.2.21 by @dependabot in #904
  • reenabled fetch and fetchpost CI tests
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-04-06

Full Changelog: 0.97.0...0.98.0

0.97.0

04 Apr 09:59

Since 0.96.x was not published, 0.97.0 contains the changes from 0.96.x after fixing the mimalloc build errors on some platforms.

Added

  • excel: add --date-format option in #897 and 6a7db99
  • luau: add qsv_fileexists() helper fn f4cc60f

Changed

  • excel: speed up float conversion by using ryu and itoa together rather than going thru core::fmt::Formatter e722753
  • joinp: --cross option does not require columns; added CI tests #894
  • schema: better, more human-readable regex patterns are generated when inferring pattern attribute; more interactive messages 1620477
  • schema & validate: improve usage text; added JSON Schema Validation info 3da6847
  • Bump tokio from 1.26.0 to 1.27.0 by @dependabot in #887
  • Bump reqwest from 0.11.15 to 0.11.16 by @dependabot in #888
  • Bump serde_json from 1.0.94 to 1.0.95 by @dependabot in #889
  • Bump serde from 1.0.158 to 1.0.159 by @dependabot in #890
  • Bump tempfile from 3.4.0 to 3.5.0 by @dependabot in #891
  • Bump polars from 0.27.2 to 0.28.0 by @dependabot in #893
  • Bump mlua from 0.8 to 0.9.0-beta.1 9b7e984
  • bump MSRV to Rust 1.68.2
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-04-02

Removed

  • luau: removed unnecessary --exec option 0d4ccda

Fixed

  • Fixed build errors on non-Windows platforms #900 by bumping mimalloc from 0.1.34 to 0.1.36

Full Changelog: 0.95.1...0.97.0

0.95.1

27 Mar 06:45

Changed

  • count: add example/test; add link from usage text 9cd3c29
  • diff: add examples link from usage text 4250811
  • Standardized --timeout option handling and exposed it via the QSV_TIMEOUT env var #886
  • improved self-update messages 4027306
  • Bump qsv-dateparser from 0.6 to 0.7
  • Bump qsv-sniffer from 0.7 to 0.8
  • Bump actions/stale from 7 to 8 by @dependabot in #876
  • Bump newline-converter from 0.2.2 to 0.3.0 by @dependabot in #877
  • Bump rust_decimal from 1.29.0 to 1.29.1 by @dependabot in #882
  • Bump regex from 1.7.2 to 1.7.3 by @dependabot in #881
  • Bump sysinfo from 0.28.3 to 0.28.4 by @dependabot in #883
  • Bump pyo3 from 0.18.1 to 0.18.2 by @dependabot in #885
  • Bump indexmap from 1.9.2 to 1.9.3 by @dependabot in #884
  • change MSRV to Rust 1.68.1
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-03-26

Full Changelog: 0.95.0...0.95.1

0.95.0

23 Mar 13:04

Added

  • luau: added qsv_cmd() and qsv_shellcmd() helpers, detailed map error messages to help with script development #869
  • luau: added environment variable set/get helper functions - qsv_setenv() and qsv_getenv() #872
  • luau: added smart qsv_register_lookup() caching so lookup tables need not be repeatedly downloaded and can be persisted/expired as required #874
  • luau: added QSV_CKAN_API, QSV_CKAN_TOKEN and QSV_CACHE_DIR env vars 9b7269e

Changed

  • apply & applydp: expanded usage text to have arguments section; emptyreplace subcommand now supports column selectors #868
  • luau: smarter script file processing. In addition to recognizing the "file:" prefix, if the script argument ends with a ".lua" or ".luau" file extension, it's automatically processed as a file #875
  • luau: qsv_sleep() and qsv_writefile() improvements 27358a2
  • partition: added arguments section to usage text; added NYC 311 example 74aa37b
  • Bump reqwest from 0.11.14 to 0.11.15 by @dependabot in #870
  • Bump regex from 1.7.1 to 1.7.2 by @dependabot in #873
  • apply select clippy lint recommendations
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-03-22
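
The script-argument rule described for luau above can be sketched in Python as follows. This is an illustration only: script_source() is a hypothetical helper, and treating the extension check as case-insensitive is an assumption.

```python
def script_source(arg: str) -> tuple[str, str]:
    """Classify a script argument as ('file', path) or ('inline', code)."""
    if arg.startswith("file:"):
        # Explicit prefix: everything after it is the path.
        return "file", arg[len("file:"):]
    # Assumption: the extension check ignores case.
    if arg.lower().endswith((".lua", ".luau")):
        return "file", arg
    return "inline", arg


print(script_source("file:calc.txt"))
print(script_source("calc.luau"))
print(script_source("return 1 + 1"))
```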

Full Changelog: 0.94.0...0.95.0