Releases: jqnatividad/qsv
0.103.0
Added
- sniff: On Linux, short-circuit sniffing a remote file when we already know it's not a CSV #976
- stats: now computes variance for dates e3e6782
- stats: now automatically invalidates cached stats across qsv releases 6e929dd
- add magic version to --version option 455c0f2
- added CKAN-aware () legend to List of Available Commands
Changed
- stats: improve usage text
- stats: use extend_from_slice for readability 23275e2
- validate: do not panic if the input is not UTF-8 532cd01
- sniff: simplify getting stdin last_modified property; on Linux, return detected mime type in JSON error response 0197591
- luau: update embedded Luau from 0.573 to 0.576
- Update nightly build instructions
- Bump qsv-sniffer from 0.9.1 to 0.9.2 by @dependabot in #972
- Bump tokio from 1.28.0 to 1.28.1 by @dependabot in #973
- Bump serde from 1.0.162 to 1.0.163 by @dependabot in #974
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-13
Full Changelog: 0.102.1...0.103.0
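The stats entry above mentions switching to extend_from_slice for readability. The commit itself is not reproduced here, so what follows is only a minimal Rust sketch of the general idiom. The helper names append_with_push and append_with_extend are hypothetical and are not taken from the qsv source. Both functions produce the same result; the second states the intent in a single call.

```rust
/// Append every field from `src` to `dest` one at a time.
/// (Hypothetical helper, shown only for comparison.)
fn append_with_push(dest: &mut Vec<String>, src: &[String]) {
    for field in src {
        dest.push(field.clone());
    }
}

/// Append every field from `src` to `dest` in one call.
/// `Vec::extend_from_slice` clones each element of the slice.
/// (Hypothetical helper, not qsv's actual code.)
fn append_with_extend(dest: &mut Vec<String>, src: &[String]) {
    dest.extend_from_slice(src);
}
```

Calling either helper with the same dest and src leaves dest with identical contents, so the choice between them is about clarity rather than behavior.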
0.102.1
0.102.1 is a small patch release to fix issues in publishing the pre-built binary variants with magic for sniff when cross-compiling.
Changed
- stats: refine --infer-boolean option info & update test count de6390b
- tojsonl: refine boolcheck_first_lower_char() fn 241115e
Fixed
- tweaked GitHub Actions publishing workflows to enable building magic-enabled sniff on Linux. Disabled magic when cross-compiling for non-x86_64 Linux targets.
Full Changelog: 0.102.0...0.102.1
0.102.0
A lot of work was done on sniff to make it not just a CSV dialect detector, but a general purpose file type detector leveraging 🪄 magic ✨ - able to detect mime types even for files on URLs.
sniff can now also use the same data types as stats with the --stats-types option. This was primarily done to support metadata collection when registering CKAN resources not only during data entry, but also when checking resource links for bitrot, and when harvesting metadata from other systems, so stats & sniff can be used interchangeably based on the response time requirement and the data quality of the data source.
For example, sniff can be used for quickly inferring metadata by just downloading a small sample from a very large data file DURING data entry ("Resource-first upload workflow"), with stats being used later on, when the data is actually being pushed to the Datastore with Datapusher+, when data type inferences need to be guaranteed, and the entire file will need to be scanned.
Added
- stats: add --infer-boolean option #967
- sniff: add --stats-types option #968
- sniff: add magic mime-type detection on Linux #970
- sniff: add --user-agent option bd0bf78
- sniff: add last_modified info ef68bff
Changed
- make --envlist option allocator-aware f3566dc
- Bump serde from 1.0.160 to 1.0.162 by @dependabot in #962
- Bump robinraju/release-downloader from 1.7 to 1.8 by @dependabot in #960
- Bump flexi_logger from 0.25.3 to 0.25.4 by @dependabot in #965
- Bump sysinfo from 0.28.4 to 0.29.0 by @dependabot in #966
- Bump jql-runner from 6.0.6 to 6.0.7 by @dependabot in #969
- Bump polars from 0.28.0 to 0.29.0 by @dependabot in #971
- apply select clippy recommendations
- cargo update bump indirect dependencies
- change MSRV to 1.69.0
- pin Rust nightly to 2023-05-07
Fixed
- sniff: make sniff give more consistent results #958. Fixes #956
- Bump qsv-sniffer from 0.8.3 to 0.9.1. Replaced all assert with proper error-handling. #961 a7c607a 43d7eaf
- sniff: fixed rowcount calculation when sniffing a URL and the entire file was actually downloaded - ef68bff
Full Changelog: 0.101.0...0.102.0
0.101.0
We're back to the future! The qsv release train is back on track, as we jump to 0.101.0 over the yanked 0.100.0 release now that self-update logic has been fixed.
Added
- stats: added more metadata to stats arg cache json - 5767e56
- added target-triple to user-agent string, and changed agent name to qsv binary variant 063b080, 70f4ea3, f0fcb05
Changed
- excel: performance, safety & documentation refinements e9a283d, 3800d25, 252b01e, 6a6df0f, 67ccd85, f2908ce, 6d5105d, dbcea39, faa8ef9
- replace: clarify that it works on a field-by-field basis c0e2012
- stats: use extend_from_slice when possible - c71ad4e
- fetch & fetchpost: replace multiple push_fields with a csv from vec - f4e0479
- fetch & fetchpost: Migrate to jql 6 #955
- schema: made bincode reader buffer bigger - 39b4bb5
- index: use increased default buffer size when creating index 60fe7d6
- standardized user_agent processing 4c06301, 010c565
- User agent environment variable; standardized user agent processing #951
- more robust Environment Variables processing #946
- move Environment Variables to its own markdown file 77c167f
- Bump tokio from 1.27.0 to 1.28.0 by @dependabot in #945
- Bump mimalloc from 0.1.36 to 0.1.37 by @dependabot in #944
- Bump mlua from 0.9.0-beta.1 to 0.9.0-beta.2 by @dependabot in #952
- Bump flate2 from 1.0.25 to 1.0.26 by @dependabot in #954
- Bump reqwest from 0.11.16 to 0.11.17 by @dependabot in #953
- cargo update bump indirect dependencies
- pin Rust nightly to 2023-04-30
Full Changelog: 0.99.1...0.101.0
0.99.1
Even though this is a patch release, it contains many new features and improvements.
This was done so that qsv 0.99.0 and below can upgrade to this release. The self-update logic in older versions compared versions as strings rather than as semvers, and the yanked 0.100.0 sorts below 0.99.0 when compared as strings, which prevented older versions from updating.
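The difference between the two comparisons can be illustrated with a short Rust sketch. This is not qsv's self-update code. The function names parse_version, is_newer_as_string and is_newer_as_semver are hypothetical, and the parser only handles plain MAJOR.MINOR.PATCH versions (no pre-release or build metadata), which is a simplification of full semver.

```rust
/// Parse a plain "MAJOR.MINOR.PATCH" string into a tuple of integers.
/// Returns None unless there are exactly three numeric parts.
/// (Hypothetical helper; pre-release/build tags are not handled.)
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Naive check: compares the two versions lexicographically as text.
/// "0.100.0" is NOT newer than "0.99.0" here, because '1' < '9'.
fn is_newer_as_string(candidate: &str, current: &str) -> bool {
    candidate > current
}

/// Numeric check: compares (major, minor, patch) tuples.
/// Returns None if either version cannot be parsed.
fn is_newer_as_semver(candidate: &str, current: &str) -> Option<bool> {
    Some(parse_version(candidate)? > parse_version(current)?)
}
```

With these helpers, is_newer_as_string("0.100.0", "0.99.0") is false while is_newer_as_semver("0.100.0", "0.99.0") is Some(true). Numbering the release 0.99.1 works under both comparisons, which is why older binaries could still pick it up.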
The changelog below is a combination of the changelog of the yanked 0.100.0 and the changes since 0.99.0.
Added
- snappy: add validate subcommand #920
- sniff: can now sniff snappy-compressed files - on the local file system and on URLs #925
- schema & stats: stats now has a --stats-binout option which schema takes advantage of #931
- schema: added example NYC 311 JSON schema validation file generated by qsv schema c956212
- to: added snappy auto-compression/decompression support 09a7afd
- to: added dirs as input source a31fb3b and 4d4dd54
- to: added unit tests for sqlite, postgres, xlsx and datapackage 16f2b7e 808b018 10739c5
- add dotenv file support #936 and #937
Changed
- stats & schema: major performance improvement (30x faster) with stats binary format serialization/deserialization 73b4b20
- snappy: misc improvements in #921
- stats: Refine stats binary format caching in #932
- bump embedded Luau from 0.5.71 to 0.5.73 d0ea7c8
- Better OOM checks. It now has two distinct modes - NORMAL and CONSERVATIVE, with NORMAL being the default. Previously, the CONSERVATIVE heuristic was the default and it was causing too many false positives #935
- Bump actions/setup-python from 4.5.0 to 4.6.0 by @dependabot in #934
- Bump embedded Luau from 0.5.67 to 0.5.71 a67bd3e
- Bump qsv-stats from 0.7 to 0.8 9a6812a
- Bump serde from 1.0.159 to 1.0.160 by @dependabot in #918
- Bump cached from 0.42.0 to 0.43.0 by @dependabot in #919
- Bump serde_json from 1.0.95 to 1.0.96 by @dependabot in #922
- Bump pyo3 from 0.18.2 to 0.18.3 by @dependabot in #923
- Bump ext-sort from 0.1.3 to 0.1.4 by @dependabot in #929
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-04-23
Removed
- snappy is even snappier now that we removed the 8-cpu cap for even faster compression - going from 1.75 gb/sec to 2.25 gb/sec for the NYC 311 test data 🚀 19acf2f
Fixed
- excel: Float serialization correctness by @bluepython508 in #933
- luau: only create qsv_cache directory when needed #930
- luau: make qsv_shellcmd() helper function work with Windows f867158 and cc24acb
- Self update semver parsing fixed so versions are compared as semvers, not as strings. The string comparison prevented self-update from updating from 0.99.0 to 0.100.0, as 0.99.0 > 0.100.0 when compared as strings. #940
- fixed werr macro to also format! messages c3ceaf7
New Contributors
- @bluepython508 made their first contribution in #933
Full Changelog: 0.99.0...0.99.1
0.99.0
Added
- added Snappy auto-compression/decompression support. The Snappy format was chosen primarily because it supports streaming compression/decompression and is designed for performance. #911
- added snappy command. Although files ending with the ".sz" extension are automatically compressed/decompressed by qsv, the snappy command offers 4-5x faster multi-threaded compression. It can also be used to check if a file is Snappy-compressed or not, and can be used to compress/decompress ANY file. #911 and #916
- diff command added to qsvlite and qsvdp binary variants #910
- to: added stdin support #913
Changed
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-04-09
Full Changelog: 0.98.0...0.99.0
0.98.0
Added
- stats: added stats caching and storing the computed stats as metadata. Doing so not only prevents unnecessary recomputation of stats, especially for very large files, it also sets the foundation for summary statistics to be used more widely across qsv to support new commands that leverage these stats - e.g. fixdata, outliers, describegpt, fake, statsviz and multi-pass stats, etc. #902
- stats: added --force option to force recomputation of stats 2f91d0c
- luau: add qsv_loadcsv helper function #908
- added more info about regular expression syntax and link to https://regex101.com which now supports the Rust flavor of regex
Changed
- logging is now buffered by default #903
- renamed features to be more easily understandable: "full" -> "feature_capable", "all_full" -> "all_features" #906
- changed GitHub Actions workflows to use the new feature names
- Bump redis from 0.22.3 to 0.23.0 by @dependabot in #901
- Bump filetime from 0.2.20 to 0.2.21 by @dependabot in #904
- reenabled fetch and fetchpost CI tests
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-04-06
Full Changelog: 0.97.0...0.98.0
0.97.0
Since 0.96.x was not published, 0.97.0 contains the changes from 0.96.x after fixing the mimalloc build errors on some platforms.
Added
Changed
- excel: speed up float conversion by using ryu and itoa together rather than going thru core::fmt::Formatter e722753
- joinp: --cross option does not require columns; added CI tests #894
- schema: better, more human-readable regex patterns are generated when inferring pattern attribute; more interactive messages 1620477
- schema & validate: improve usage text; added JSON Schema Validation info 3da6847
- Bump tokio from 1.26.0 to 1.27.0 by @dependabot in #887
- Bump reqwest from 0.11.15 to 0.11.16 by @dependabot in #888
- Bump serde_json from 1.0.94 to 1.0.95 by @dependabot in #889
- Bump serde from 1.0.158 to 1.0.159 by @dependabot in #890
- Bump tempfile from 3.4.0 to 3.5.0 by @dependabot in #891
- Bump polars from 0.27.2 to 0.28.0 by @dependabot in #893
- Bump mlua from 0.8 to 0.9.0-beta.1 9b7e984
- bump MSRV to Rust 1.68.2
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-04-02
Removed
- luau: removed unnecessary --exec option 0d4ccda
Fixed
- Fixed build errors on non-Windows platforms #900 by bumping mimalloc from 0.1.34 to 0.1.36
Full Changelog: 0.95.1...0.97.0
0.95.1
Changed
- count: add example/test add link from usage text 9cd3c29
- diff: add examples link from usage text 4250811
- Standardized --timeout option handling and exposed it with QSV_TIMEOUT env var #886
- improved self-update messages 4027306
- Bump qsv-dateparser from 0.6 to 0.7
- Bump qsv-sniffer from 0.7 to 0.8
- Bump actions/stale from 7 to 8 by @dependabot in #876
- Bump newline-converter from 0.2.2 to 0.3.0 by @dependabot in #877
- Bump rust_decimal from 1.29.0 to 1.29.1 by @dependabot in #882
- Bump regex from 1.7.2 to 1.7.3 by @dependabot in #881
- Bump sysinfo from 0.28.3 to 0.28.4 by @dependabot in #883
- Bump pyo3 from 0.18.1 to 0.18.2 by @dependabot in #885
- Bump indexmap from 1.9.2 to 1.9.3 by @dependabot in #884
- change MSRV to Rust 1.68.1
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-03-26
Full Changelog: 0.95.0...0.95.1
0.95.0
Added
- luau: added qsv_cmd() and qsv_shellcmd() helpers, detailed map error messages to help with script development #869
- luau: added environment variable set/get helper functions - qsv_setenv() and qsv_getenv() #872
- luau: added smart qsv_register_lookup() caching so lookup tables need not be repeatedly downloaded and can be persisted/expired as required #874
- luau: added QSV_CKAN_API, QSV_CKAN_TOKEN and QSV_CACHE_DIR env vars 9b7269e
Changed
- apply & applydp: expanded usage text to have arguments section; emptyreplace subcommand now supports column selectors #868
- luau: smarter script file processing. In addition to recognizing "file:" prefix, if the script argument ends with ".lua/luau" file extensions, it's automatically processed as a file #875
- luau: qsv_sleep() and qsv_writefile() improvements 27358a2
- partition: added arguments section to usage text; added NYC 311 example 74aa37b
- Bump reqwest from 0.11.14 to 0.11.15 by @dependabot in #870
- Bump regex from 1.7.1 to 1.7.2 by @dependabot in #873
- apply select clippy lint recommendations
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-03-22
Full Changelog: 0.94.0...0.95.0