15 Jul 10:56

756bfba

0.129.1

This is a small patch release to fix some publishing issues, update tab completion, and fix minor CI errors.
See the 0.129.0 release notes for details on qsv's biggest release to date!

Changed

clipboard: add error handling based on clipboard::Error by @rzmk in #1970
contrib(completions): add all commands (except applydp & generate) by @rzmk in #1971
Temporarily suppressed some CI tests that were flaky on GH macOS Apple Silicon action runners. They previously worked fine on self-hosted macOS Apple Silicon action runners that are temporarily unavailable.

Full Changelog: 0.129.0...0.129.1

Contributors

rzmk

14 Jul 10:06

jqnatividad

0.129.0

640a32c

This release is the biggest one ever!

Packed with new features, improvements, and previews of upcoming qsv pro features, here are a few highlights:

📌 Highlights

Meet @rzmk - qsv pro's software engineer now also co-maintains qsv!

@rzmk has contributed to projects in the qsv ecosystem including qsv's describegpt, prompt, json, and clipboard commands; qsv's tab completion support; qsv.dathere.com including its online configurator and benchmarks page; 100.dathere.com with its qsv lessons and exercises; and qsv pro the spreadsheet data wrangling desktop app (along with its promo site). @rzmk now also co-maintains qsv!

With @rzmk now also co-maintaining qsv, our data-wrangling portfolio's roadmap may get more intriguing as @rzmk's work on qsv pro, 100.dathere.com, and other initiatives can result in contributions to qsv as we've seen in this release. Perhaps some aims may be put towards AI; "automagical" metadata inferencing; DCAT 3; and expanded recipe support with the accelerated evolution of qsv pro as an enterprise-grade Data-Wrangling/Data Curation Workbench.

Polars v0.41.3 - numerous sqlp and joinp improvements

sqlp: expanded SQL support
- Natural Join support
- DuckDB-like COLUMNS SQL function to select columns that match a pattern
- ORDER BY ALL support
- Support for PostgreSQL ^@ ("starts with") and ~~, ~~*, !~~, !~~* ("like", "ilike" and their negations) string-matching operators
- Support for SQL SELECT * ILIKE wildcard syntax
- Support SQL temporal functions STRFTIME and STRPTIME
sqlp: added --streaming option

New command qsv prompt - Use a file dialog for qsv file input and output

Be more interactive with qsv by using a file dialog to select a file for input and output.

Here are a few key highlights:

Start with qsv prompt when piping commands to provide a file as input from an open file dialog and pipe it into another command, for example: qsv prompt | qsv stats.
End with qsv prompt -f when piping commands to save the output to a file you choose with a save file dialog.

There are other options too, so feel free to explore more with qsv prompt --help.

This will allow you to create qsv pipelines that are more "user-friendly" and distribute them to non-technical users. It's not as flexible as qsv pro's full-blown GUI, but it's a start!

New command qsv json - Convert JSON data to CSV and optionally provide a jq-like filter

The new json command allows you to convert non-nested JSON data to CSV. If your data is not in the expected format, try using the --jaq option to provide a jq-like filter. See qsv json --help for more information and examples.

Here are a few key highlights:

Specify the path to a JSON file to attempt conversion to CSV with qsv json <filepath>.
Attempt conversion of JSON to CSV data from stdin, for example: qsv slice <filepath.csv> --json | qsv json.
Write the output to a file with the --output <filepath> (or -o for short) option.
Use the --jaq <filter> option to try converting nested or complex JSON data into the intended format before parsing to CSV.

You may learn more by running qsv json --help.
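
To make "non-nested JSON to CSV" concrete, here is a minimal Python sketch of the general idea using only the standard library. It is an illustration, not how qsv json is implemented; the function name json_records_to_csv and the sample data are hypothetical.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat (non-nested) objects to CSV text."""
    records = json.loads(json_text)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")

    # Collect column names in first-seen order and reject nested values.
    fieldnames = []
    for record in records:
        for key, value in record.items():
            if isinstance(value, (dict, list)):
                raise ValueError(f"nested value under key {key!r}; flatten it first")
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buffer.getvalue()


sample = '[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]'
csv_text = json_records_to_csv(sample)
print(csv_text)
```

The sketch refuses nested values rather than guessing how to flatten them, which mirrors the advice above to reshape such data with a filter before conversion.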

Along with the jsonl command, we now have more options to convert JSON to CSV with qsv!

New command qsv clipboard - Provide input from your clipboard and save output to your clipboard

Provide your clipboard content using qsv clipboard and save output to your clipboard by piping into qsv clipboard --save (or -s for short).

100.dathere.com - Try out lessons and exercises with qsv from your browser!

You may run qsv commands from your browser without having to install it locally at 100.dathere.com.

[Screenshots: within the lesson (in-page) using Thebe; in a Jupyter Lab environment]

Thanks to Jupyter Book, datHere has released a website available at 100.dathere.com where you may explore lessons and exercises with qsv by running them within the web page, in a Jupyter Lab environment, or locally after following the provided installation instructions. There are multiple exercises planned, but feel free to try out the first few available lessons/exercises by visiting 100.dathere.com and star the source code's repository here.

New multi-shell completions draft (bash, zsh, powershell, fish, nushell, fig, elvish)

There's a draft of expanded qsv shell completion support covering 7 different shells! The plan is to add the rest of the commands to this implementation, since one codebase can generate all 7 shell completion script files. Feel free to try out the shell completions in the examples folder under contrib/completions to verify that the examples work (as of today's release date, only qsv count and qsv clipboard may be available), and to contribute the remaining completions if you know a bit of Rust.

The existing Bash shell completions for v0.129.0 and fish shell completions draft are available for now as the multi-shell completions draft is being developed.

[Demos: Bash completions; fish completions]

With shell completions enabled, you may identify qsv commands more easily when pressing the tab key on your keyboard in certain positions using the relevant Bash or fish shell from your terminal. You may follow the instructions from 100.dathere.com here to learn how to install the Bash completions and under the Usage section here for fish shell completions. Note that the fish shell completions are incomplete and both of the implementations may be replaced by the multi-shell completions implementation once complete.

qsvpro.dathere.com - Preview: Download spreadsheets from a compatible CKAN instance into the qsv pro Workflow

This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.

In addition to importing local spreadsheet files and uploading to a CKAN instance, this new feature allows users to select a locally registered CKAN instance where they have the create_dataset permission to download a spreadsheet file from their CKAN instance and load the new local spreadsheet file into the Workflow. qsv pro's Workflow would therefore have both upload and download capability to and from a compatible CKAN instance.

qsvpro.dathere.com - Preview: Attempt SQL query generation from natural language with a compatible LLM API instance

This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.
Also note that this video is sped up as you may see by...

Contributors

dependabot and rzmk

25 May 22:33

jqnatividad

0.128.0

98c9d95

[0.128.0] - 2024-05-25

❤️ csv,conf,v8 Edition 🎉
🏇🏽 ¡Ándale! ¡Ándale! ¡Arriba! ¡Arriba! 💨

Yii-hah! We're Mexico bound as we head to csv,conf,v8 to present and share qsv with fellow data-makers and wranglers from all over!

And we've packed a lot into this release for the occasion:

search got a lot of love as it now powers qsv pro's new search feature to get near-instant search results even on large datasets.
stats - the ❤️ of qsv, now has several cache fine-tuning options with --cache-threshold. It now also computes max_precision for floats and is_ascii for strings. It also has a new --round 9999 sentinel value to suppress rounding of statistics.
schema & tojsonl are now faster thanks to stats --cache-threshold autoindex & cache creation/deletion logic.
We upgraded Polars to 0.40.0 to unlock additional capabilities in the count, joinp & sqlp commands.
count now has an additional blazing fast counting mode using Polars' read_csv() table function.
frequency gets some micro-optimizations for even faster frequency analysis.
luau now bundles Luau 0.625, up from 0.622. We also upgraded the bundled LuaDate library from 2.2.0 to 2.2.1. All of this, while making it ~10% faster!

Overall, qsv manages to keep its performance edge despite the addition of new capabilities and features. We'll give a whirlwind tour of qsv and these updates in our talk at csv,conf,v8.

We'll also preview what we've been calling the People's APPI - our "Answering People/Policymaker Interface" in qsv pro.

This is a new way to interact with qsv that's more conversational and less command-line-y using a natural language interface. It's a way to make qsv more accessible to more people, especially those who are not comfortable with the command line.

We're excited to share all these qsv innovations with the csv,conf,v8 community and the wider world! Nos vemos en Puebla!

¡Ándele! ¡Ándele! ¡Epa! ¡Epa! ¡Epa!

Added

count: additional Polars-powered counting mode using read_csv() SQL table function 05c5809
input: add --quote-style option df3c8f1
joinp: add --coalesce option 8d142e5
search: add --preview-match option #1785
search: add --json output option #1790
search: add "match-only" --flag option mode #1799
search: add --not-one flag for not using exit code 1 when no match by @rzmk in #1810
sqlp: add --decimal-comma option #1832
stats: add --cache-threshold option #1795
stats: add --cache-threshold autoindex creation/deletion logic #1809
stats: add additional mode to --cache-threshold 63fdc55
stats: now computes max_precision for floats #1815
stats: add --round 9999 sentinel value support to suppress rounding #1818
stats: add is_ascii column #1824
added new benchmarks for search command 58d73c3
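
As a rough illustration of the two new stats columns, the Python sketch below computes a max_precision and an is_ascii value for a column of string cells. It assumes max_precision means the largest number of digits after the decimal point and is_ascii means every value is pure ASCII; both readings are assumptions, and the code is not taken from qsv.

```python
def max_precision(values):
    """Largest number of digits after the decimal point among the values."""
    precision = 0
    for text in values:
        _, dot, fraction = text.partition(".")
        if dot:
            precision = max(precision, len(fraction))
    return precision


def is_ascii(values):
    """True only if every value consists solely of ASCII characters."""
    return all(text.isascii() for text in values)


prices = ["1.5", "2.125", "3"]
cities = ["Puebla", "Querétaro"]
print(max_precision(prices))
print(is_ascii(cities))
```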

Changed

count: document three count modes 3d5a333
describegpt: update --max-tokens type for LLMs with larger context sizes by @rzmk #1841
excel: use simpler range::headers() to get headers 069acbf
frequency: ensure --other-sorted works with --other-text 7430ad7
frequency: microoptimize hot loop d9c01e1, 7c9f925 and
luau: improve usage text cb6b4d9
luau: we now bundle luau 0.625 from 0.622 4060975
luau: update vendored LuaDate library from 2.2.0 to 2.2.1 #1840
schema: adjust to reflect stats --cache-threshold option 92fed86
slice: move json output helpers to util 1f44b48
tojsonl: refactor boolcheck helper 74d5f5a
docs: cross-reference split & partition commands #1828
contrib(bashly): update completions.bash for qsv v0.127.0 by @rzmk in #1776
contrib(bashly): update completions.bash for qsv v0.128.0 by @rzmk in #1838
deps: upgrade to polars 0.40.0 #1831
build(deps): bump actix-web from 4.5.1 to 4.6.0 by @dependabot in #1825
build(deps): bump anyhow from 1.0.82 to 1.0.83 by @dependabot in #1798
build(deps): bump anyhow from 1.0.83 to 1.0.85 by @dependabot in #1823
build(deps): bump anyhow from 1.0.85 to 1.0.86 by @dependabot in #1826
build(deps): bump cached from 0.50.0 to 0.51.0 by @dependabot in #1789
build(deps): bump cached from 0.51.0 to 0.51.1 by @dependabot in #1793
build(deps): bump cached from 0.51.1 to 0.51.2 by @dependabot in #1802
build(deps): bump cached from 0.51.2 to 0.51.3 by @dependabot in #1805
build(deps): bump crossbeam-channel from 0.5.12 to 0.5.13 by @dependabot in #1827
build(deps): bump csvs_convert from 0.8.9 to 0.8.10 by @dependabot in #1808
build(deps): bump data-encoding from 2.5.0 to 2.6.0 by @dependabot in #1780
build(deps): bump file-format from 0.24.0 to 0.25.0 by @dependabot in #1807
build(deps): bump flate2 from 1.0.28 to 1.0.29 by @dependabot in #1778
build(deps): bump flate2 from 1.0.29 to 1.0.30 by @dependabot in #1784
build(deps): bump hashbrown from 0.14.3 to 0.14.5 by @dependabot in #1781
build(deps): bump itertools from 0.12.1 to 0.13.0 by @dependabot in #1822
deps: bump forked jsonschema from 0.17.1 to 0.18.0 f02620f
build(deps): bump mimalloc from 0.1.41 to 0.1.42 by @dependabot in #1829
build(deps): bump mlua from 0.9.7 to 0.9.8 by @dependabot in #1821
build(deps): bump qsv-stats from 0.16.0 to 0.17.1 by @dependabot in #1813
build(deps): bump qsv-stats from 0.17.1 to 0.17.2 by @dependabot in #1814
build(deps): bump qsv-stats from 0.17.2 to 0.18.0 by @dependabot in #1816
build(deps): bump ryu from 1.0.17 to 1.0.18 by @dependabot in #1801
build(deps): bump semver from 1.0.22 to 1.0.23 by @dependabot in #1800
build(deps): bump serde from 1.0.198 to 1.0.199 by @dependabot in #1777
build(deps): bump serde from 1.0.199 to 1.0.200 by @dependabot in #1787
build(deps): bump serde from 1.0.200 to 1.0.201 by @dependabot in #1804
build(deps): bump serde from 1.0.201 to 1.0.202 by @dependabot in #1817
build(deps): bump serde_json from 1.0.116 to 1.0.117 by @dependabot in #1806
build(deps): bump serial_test from 3.1.0 to 3.1.1 by @dependabot in #1779
build(deps): bump simple-expand-tilde from 0.1.5 to 0.1.6 by @dependabot in #1811
build(deps): bump sysinfo from 0.30.11 to 0.30.12 by @dependabot in https://github.com/jq...

Contributors

dependabot and rzmk

25 Apr 09:53

jqnatividad

0.127.0

cf4c180

📊 Enhanced Frequency Analysis 📊

This is a quick release adding several frequency enhancements for more detailed frequency analysis. The frequency command now includes a percentage column, calculates other values, and supports limiting unique counts and negative limits.
These options provide additional context for Datapusher+, qsv pro and describegpt so their metadata inferences are more accurate and comprehensive.

Previously, for a 775-row CSV file containing one column named state with entries for all 50 states, frequency only showed¹:

qsv frequency freq_state_example.csv | qsv table
field  value  count
state  NY     100
state  NJ     70
state  CA     60
state  MA     55
state  FL     45
state  TX     43
state  NM     40
state  AZ     39
state  NV     38
state  MI     35

Now, there's a new percentage column and other values calculation, both of which have configurable options:

qsv frequency freq_state_example.csv | qsv table
field  value       count  percentage
state  NY          100    12.90323
state  NJ          70     9.03226
state  CA          60     7.74194
state  MA          55     7.09677
state  FL          45     5.80645
state  TX          43     5.54839
state  NM          40     5.16129
state  AZ          39     5.03226
state  NV          38     4.90323
state  MI          35     4.51613
state  Other (40)  250    32.25806
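
The percentage and "Other" figures in the table above can be reproduced with a short Python sketch. It is only an illustration of the arithmetic, not qsv's implementation; the function name frequency_table and the filler state names (S00 to S39, standing in for the 40 states the example does not list) are hypothetical.

```python
from collections import Counter


def frequency_table(values, limit=10):
    """Return (value, count, percentage) rows, folding the tail into "Other (N)"."""
    counts = Counter(values)
    total = sum(counts.values())
    ranked = counts.most_common()  # sorted by count, highest first
    top, rest = ranked[:limit], ranked[limit:]
    rows = [(value, count, round(100 * count / total, 5)) for value, count in top]
    if rest:
        other = sum(count for _, count in rest)
        rows.append((f"Other ({len(rest)})", other, round(100 * other / total, 5)))
    return rows


top_counts = {"NY": 100, "NJ": 70, "CA": 60, "MA": 55, "FL": 45,
              "TX": 43, "NM": 40, "AZ": 39, "NV": 38, "MI": 35}
# Hypothetical filler: 10 states with 7 rows and 30 states with 6 rows (250 rows in total).
other_counts = {f"S{i:02d}": (7 if i < 10 else 6) for i in range(40)}
values = [state for state, n in {**top_counts, **other_counts}.items() for _ in range(n)]

rows = frequency_table(values)
for value, count, pct in rows:
    print(f"{value:<11} {count:<6} {pct:.5f}")
```

With 775 rows in total, NY's 100 rows come to 100 / 775 = 12.90323%, and the 250 rows spread over the remaining 40 states come to 32.25806%, matching the table.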

This release is also out of cycle to address a big performance regression in the excel command caused by unnecessary formula info retrieval for the --error-format option introduced in 0.126.0. This has been fixed, and the excel command is now back to its speedy self.

Added

frequency: added percentage column; other values calculation, implementing #1774 #1775
benchmarks: added new frequency and excel benchmarks b83ad3a

Changed

contrib(bashly): update completions.bash for qsv v0.126.0 by @rzmk in #1771
build(deps): bump mimalloc from 0.1.39 to 0.1.41 by @dependabot in #1772
build(deps): bump qsv-stats from 0.14.0 to 0.15.0 by @dependabot in #1773
updated several indirect dependencies
applied select clippy recommendations

Fixed

excel: fixed performance regression because qsv was unnecessarily getting formula info (an expensive operation) for --error-format option even when not required 772af34
renamed 0.126.0 sqlp_vs_duckdb benchmark results so they're next to each other for easy direct comparison. 7bcd59e.
Per the benchmarks, sqlp is 2.87 times faster than duckdb v0.10.2 for a simple aggregation (0.066 secs vs 0.19 secs), and 1.42 times faster for an "expensive" aggregation (0.143 secs vs 0.203 secs).

Full Changelog: 0.126.0...0.127.0

¹ With its default --limit setting of 10, frequency only shows the top 10 unique values in the column, sorted by occurrence.

Contributors

dependabot and rzmk

22 Apr 15:35

jqnatividad

0.126.0

ecd0ac7

🤖 Expanded Metadata Inferencing 🤖

describegpt headlines this release, with its new ability to support other local Large Language Models (LLMs) using popular tools that serve them through APIs such as Ollama and Jan. This broadens the tool's utility in diverse AI environments. Beyond OpenAI, qsv can now use other popular LLMs like Llama 3, Mistral, and Gemma. It also unlocks expanded metadata inferencing capabilities in qsv pro.

Several commands got additional options: cat with --no-headers support in the rowskey subcommand; excel with new options like --error-format and short --metadata mode; and foreach with a --dry-run option. frequency also got new options, including --unq-limit for limiting unique counts, support for negative limits, and a --lmt-threshold option for compiling comprehensive frequencies below a threshold. slice now supports negative indices and new JSON output options, providing more flexibility in data slicing.

This is all rounded out with sqlp improvements, including support for single-line comments in SQL scripts and a special SKIP_INPUT value to skip input preprocessing when using table functions directly in Polars SQL (e.g. read_csv() and read_parquet()) - all while increasing performance thanks to the Polars engine being upgraded to 0.39.2.

New Features

cat: Added --no-headers support to the rowskey subcommand.
describegpt: Added compatibility for other local Large Language Models (LLMs) such as Ollama and Jan, broadening the tool's utility in diverse AI environments.
excel: Introduced new options in the excel command: --error-format for better error handling and a short --metadata JSON mode.
foreach: added a --dry-run option, allowing users to preview the results of scripts without executing them.
frequency: New options added such as --unq-limit for limiting unique counts; support for negative limits to only show frequencies >= abs(negative limit); and a --lmt-threshold option to allow the compilation of comprehensive frequencies below the threshold - all providing more detailed control over frequency analysis.
slice: Support for negative indices to slice from the end and new JSON output options.
sqlp: sqlp now supports single-line comments and includes a special SKIP_INPUT value for more efficient data loading. The Polars engine has also been upgraded to 0.39.2, providing enhanced performance and stability.
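
The negative-limit behavior described for frequency above can be sketched in a few lines of Python. This is a hedged illustration of the stated rule (a negative limit keeps only frequencies >= abs(limit)), not qsv's code; the function name apply_limit is hypothetical, and treating 0 as "no limit" is an assumption not stated in these notes.

```python
def apply_limit(ranked, limit):
    """ranked is a list of (value, count) pairs sorted by count, highest first."""
    if limit > 0:
        # Positive limit: keep the top N values.
        return ranked[:limit]
    if limit < 0:
        # Negative limit: keep every value whose count is at least abs(limit).
        return [(value, count) for value, count in ranked if count >= abs(limit)]
    # Assumption: a limit of 0 means "no limit".
    return ranked


ranked = [("NY", 100), ("NJ", 70), ("CA", 60), ("MA", 55), ("FL", 45)]
print(apply_limit(ranked, 2))
print(apply_limit(ranked, -60))
```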

Changes and Optimizations

Performance Enhancements: Microoptimizations in datefmt and validate commands, and increased default length for --infer-len in sqlp for improved performance.
Dependency Updates: Numerous updates including bumping Luau, jql-runner, pyo3, and other dependencies to enhance stability and security.
Benchmarks Added: New performance benchmarks for sqlp vs duckdb added to ensure there are no performance regressions between releases. Right now, sqlp is faster than duckdb in most cases (thanks to Polars - see the latest TPC-H benchmarks), but we want to make sure that we keep it that way.

Security and Robustness

Security Fixes: Updated rustls to fix a specific CVE, and other minor fixes to enhance the security and robustness of network and data processing features.
Bug Fixes: Various bug fixes including improvements in error formatting in excel and robustness in fetch and fetchpost commands.

Added

cat: add --no-headers support to rowskey subcommand #1762
describegpt: add compatibility for other (local) LLMs (Ollama, Jan, etc.) by @rzmk in #1761
excel: add --error-format option #1721
excel: add --metadata short JSON mode #1738
foreach: add --dry-run option #1740
frequency: add --unq-limit option #1763
frequency: add support for negative --limits #1765
frequency: add --lmt-threshold option #1766
slice: add support for negative --index option values #1726
slice: implement --json output option #1729
sqlp: added support for single-line comments in SQL scripts bb52bce
sqlp: added SKIP_INPUT special value to short-circuit input processing if the user wants to load input files directly using table functions (e.g. read_csv(), read_parquet(), etc.) fe850ad
validate: add --valid-output option #1730
contrib: add sample Bashly completions implementation by @rzmk in #1731
benchmarks: added sqlp vs duckdb benchmarks.

Changed

datefmt: microoptimize formatting 0ee27e7
joinp: adapt to breaking change in Polars 0.39 for lazyframe sort c625ca9
sqlp: change --infer-len option default from 250 to 1000 for increased performance da1d215
validate: microoptimize to_json_instance() c2e4a1c
bump Luau from 0.616 to 0.622 9216ec3
build(deps): bump jql-runner from 7.1.6 to 7.1.7 by @dependabot in #1711
build(deps): bump pyo3 from 0.21.0 to 0.21.1 by @dependabot in #1712
build(deps): bump pyo3 from 0.21.1 to 0.21.2 by @dependabot in #1750
build(deps): bump strsim from 0.11.0 to 0.11.1 by @dependabot in #1715
build(deps): bump sysinfo from 0.30.7 to 0.30.8 by @dependabot in #1716
build(deps): bump sysinfo from 0.30.8 to 0.30.9 by @dependabot in #1732
build(deps): bump sysinfo from 0.30.9 to 0.30.10 by @dependabot in #1735
build(deps): bump sysinfo from 0.30.10 to 0.30.11 by @dependabot in #1755
build(deps): bump redis from 0.25.2 to 0.25.3 by @dependabot in #1720
build(deps): bump mlua from 0.9.6 to 0.9.7 by @dependabot in #1724
build(deps): bump reqwest from 0.12.2 to 0.12.3 by @dependabot in #1725
build(deps): bump reqwest from 0.12.3 to 0.12.4 by @dependabot in #1759
build(deps): bump anyhow from 1.0.81 to 1.0.82 by @dependabot in #1733
build(deps): bump robinraju/release-downloader from 1.9 to 1.10 by @dependabot in #1734
build(deps): bump chrono from 0.4.37 to 0.4.38 by @dependabot in #1744
bump polars from 0.38 to 0.39 #1745
build(deps): bump polars from 0.39.0 to 0.39.1 by @dependabot in #1746
build(deps): bump polars from 0.39.1 to 0.39.2 by @dependabot in #1752
build(deps): bump qsv-dateparser from 0.12.0 to 0.12.1 by @dependabot in #1747
build(deps): bump serde_json from 1.0.115 to 1.0.116 by @dependabot in #1749
build(deps): bump serde from 1.0.197 to 1.0.198 by @dependabot in #1751
build(deps): bump rustls from 0.22.3 to 0.22.4 by @dependabot in #1758
build(deps): bump simple-expand-tilde from 0.1.4 to 0.1.5 by @dependabot in #1767
build(deps): bump serial_test from 3.0.0 to 3.1.0 by @dependabot in #1768
build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in #1769
applied select clippy recommendations
updated several indirect dependencies
added several benchmarks for new/changed commands
pin Rust nightly to 2024-04-15 - the same nightly that Polars 0.39 is pinned to
bumped MSRV to 1.77.2

Fixed

Make init_logger more robust #1717
count: empty CSVs count as zero also for polars. Fixes #1741 #1742
excel: fix #1682 by adding --error-format option #1689
fetch & fetchpost: more robust JSON response validation ebc7287
slice: use write! macro to get rid of GH Advanced Security lint c739097
sqlp: fixed docopt defaults that were not being parsed correctly fe850ad
deps: bump h2 from 0.4.3 to 0.4.4 ...

Contributors

dependabot and rzmk

01 Apr 12:38

jqnatividad

0.125.0

d559548

In this release, we focused on the 🏎️ need for even more speed 🏎️.

This was done primarily by tweaking several supporting qsv crates. qsv-docopt now parses command-line arguments slightly faster. qsv-stats, the crate behind commands like stats, schema, tojsonl, and frequency, has been further optimized for speed. qsv-dateparser has been updated to support new timezone handling options in datefmt. qsv-sniffer also got a speed boost.

Per the benchmark suite, stats is 25% faster (1.563 secs vs 2.067 secs) when computing the 13 "streaming" stats and 14% faster when computing --everything (17 columns of additional stats - 3.149 secs vs 3.656 secs) for the 1M row, 41 column, 520 MB sample of NYC's 311 data.

The count command has been refactored to utilize Polars' SQLContext, which leverages LazyFrames evaluation to automagically count even very large files in just a few seconds. Previously, count was already using Polars, but it mistakenly fell back to a slower counting mode. Now, it consistently delivers fast performance, even without an index. On the same benchmark suite, it takes 0.052 secs vs 0.503 seconds - almost 10x faster!

As count is not just a top-level command, but also a widely used helper used by several qsv commands, this gives the entire suite a nice performance boost.

Continuing on the performance front, the excel command now has a new short --metadata mode, allowing users to just get a "shorter" version of the metadata report that only lists the workbook's top level metadata (sheet index, sheet name, sheet type, visibility) instead of the full metadata report (which also has info like num rows, column metadata, etc.). On the benchmark suite, the short metadata report takes all of 0.005 secs vs 11.237 secs for the 1M row xlsx version of the same NYC 311 data - more than 3 orders of magnitude faster! (it may actually be faster since 0.005 secs is at the limits of what hyperfine can measure)
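
For readers who want to check the arithmetic behind the timings quoted above, here is a small Python sketch. It reads "X% faster" as the reduction in elapsed time relative to the old timing, which is an assumption about how those percentages were derived; the helper names are hypothetical.

```python
import math


def percent_less_time(old_secs, new_secs):
    """Reduction in elapsed time, as a percentage of the old timing."""
    return 100 * (old_secs - new_secs) / old_secs


def speedup(old_secs, new_secs):
    """How many times faster the new timing is."""
    return old_secs / new_secs


stats_default = percent_less_time(2.067, 1.563)
stats_everything = percent_less_time(3.656, 3.149)
count_speedup = speedup(0.503, 0.052)
excel_magnitude = math.log10(speedup(11.237, 0.005))

print(f"stats (streaming):    {stats_default:.1f}% less time")
print(f"stats (--everything): {stats_everything:.1f}% less time")
print(f"count:                {count_speedup:.1f}x faster")
print(f"excel short metadata: {excel_magnitude:.2f} orders of magnitude")
```

Under that reading the figures come out to roughly 24%, 14%, 9.7x and a little over 3 orders of magnitude, in line with the rounded numbers in the text.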

The datefmt command also got some major enhancements with new timezone handling and timestamp parsing options, though at the cost of a small 15% performance penalty.

Lastly, we are excited to announce that qsv will be featured at the csv,conf,v8 conference in Puebla, Mexico on May 28-29. I'll be presenting a talk titled "qsv: A Blazing Fast CSV Data-Wrangling Toolkit". Hope to see you there!

Added

excel: added short mode to --metadata option #1699
datefmt: added ts-resolution option to specify resolution to use when parsing unix timestamps #1704
datefmt: added timezone handling options #1706 #1707 #1642
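
Why a resolution setting matters when parsing Unix timestamps can be shown with a short Python sketch: the same integer means very different instants depending on the unit. This is an illustration only; the resolution names ("sec", "milli", "micro", "nano") and the function name are hypothetical and are not taken from datefmt's usage text.

```python
from datetime import datetime, timezone

# Hypothetical resolution names mapped to "units per second".
UNITS_PER_SECOND = {"sec": 1, "milli": 1_000, "micro": 1_000_000, "nano": 1_000_000_000}


def parse_unix_timestamp(raw: int, resolution: str = "sec") -> datetime:
    """Interpret an integer Unix timestamp at the given resolution, in UTC."""
    per_second = UNITS_PER_SECOND[resolution]
    seconds, remainder = divmod(raw, per_second)
    # Convert the sub-second remainder to microseconds (nanoseconds are truncated).
    micros = remainder * 1_000_000 // per_second
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=micros)


as_seconds = parse_unix_timestamp(1711975080, "sec")
as_millis = parse_unix_timestamp(1711975080123, "milli")
print(as_seconds.isoformat())
print(as_millis.isoformat())
```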

Changed

count: refactored to use Polars SQLContext 43a236f
stats: refactored stats_path helper function 174c30e
apply, applydp, datefmt, excel, geocode, py, validate: use std::mem::take to avoid clone 1fd187f 8402d3a 8496157
excel: optimized workbook opening operation 67f662e
build(deps): bump flexi_logger from 0.27.4 to 0.28.0 by @dependabot in #1673
build(deps): bump polars from 0.38.2 to 0.38.3 by @dependabot in #1674
build(deps): bump uuid from 1.7.0 to 1.8.0 by @dependabot in #1675
build(deps): bump hashbrown from 0.14.3 to 0.14.4 by @dependabot in #1680
build(deps): bump reqwest from 0.11.26 to 0.11.27 by @dependabot in #1679
build(deps): bump bytes from 1.5.0 to 1.6.0 by @dependabot in #1685
build(deps): bump regex from 1.10.3 to 1.10.4 by @dependabot in #1686
build(deps): bump indexmap from 2.2.5 to 2.2.6 by @dependabot in #1687
build(deps): bump rayon from 1.9.0 to 1.10.0 by @dependabot in #1688
build(deps): bump qsv_docopt from 1.6.0 to 1.7.0 by @dependabot in #1691
build(deps): bump reqwest from 0.12.1 to 0.12.2 by @dependabot in #1693
build(deps): bump serde_json from 1.0.114 to 1.0.115 by @dependabot in #1694
build(deps): bump itoa from 1.0.10 to 1.0.11 by @dependabot in #1695
build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in #1700
build(deps): bump rust_decimal from 1.34.3 to 1.35.0 by @dependabot in #1701
build(deps): bump chrono from 0.4.35 to 0.4.37 by @dependabot in #1702
build(deps): bump tokio from 1.36.0 to 1.37.0 by @dependabot in #1703
build(deps): bump qsv-sniffer from 0.10.2 to 0.10.3 by @dependabot in #1708
build(deps): bump titlecase from 2.2.1 to 3.0.0 by @dependabot in #1709
build(deps): bump qsv-stats from 0.13.0 to 0.14.0 by @dependabot in #1710
applied select clippy recommendations
updated several indirect dependencies
added several benchmarks for new/changed commands
bumped MSRV to 1.77.1
use #[cfg(debug_assertions)] conditional compilation to avoid compiling debug code in release mode
use patched forks of jsonschema, cached, self_update and localzone crates to avoid old dependencies, which were causing dependency bloat

Fixed

count: fixed polars_count_input helper, as it was always falling back to "slow" counting mode 3484c89

Full Changelog: 0.124.1...0.125.0

Contributors

dependabot

15 Mar 22:05

jqnatividad

0.124.1

1dbfadb

Datapusher+ "Speed of Insight" Release! 🚀🚀🚀

This release is all about speed, speed, speed! We've made qsv even faster by leveraging Polars' multithreaded, mem-mapped CSV reader to get near-instant row counts of large CSV files, and near instant SQL queries and aggregations with Datapusher+ - automagically inferring metadata and giving you quick insights into your data in seconds!

We're demoing our qsv-powered Datapusher+ at the March 2024 installment of CKAN Monthly Live on March 20, 2024, 13:00-14:00 UTC. Join us!

Beyond pushing data reliably at speed into your CKAN Datastore (it pushes real good! 😉), DP+ does some extended analysis, processing and enrichment of the data so it can be readily Used.

Both fetch and fetchpost commands now also have a --disk-cache option and are fully synched - forming the foundation for high-speed data enrichment from Web Services - including datHere's forthcoming, fully-integrated Data Enrichment Service.

🏇🏽 Hi-ho Quicksilver, away! 🏇🏽

Added

count: automatically use Polars multithreaded, mem-mapped CSV reader when polars feature is enabled to get near-instant row counts of large CSV files even without an index #1656
qsvdp: added polars support to Datapusher+-optimized binary variant, so we can do near instant SQL queries and aggregations during DP+ processing #1664
fetchpost: added --disk-cache options and synced usage options with fetch #1671
extended .infile-list to skip empty and commented lines, and to validate file paths 20a45c8 and 2650930
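
The .infile-list handling described above (skip empty and commented lines, validate file paths) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not qsv's code: it assumes "#" starts a comment line and that an invalid path is reported as an error; the function name read_infile_list is hypothetical.

```python
import tempfile
from pathlib import Path


def read_infile_list(list_path):
    """Return the file paths named in a list file, one path per line."""
    paths = []
    lines = Path(list_path).read_text().splitlines()
    for line_no, raw in enumerate(lines, start=1):
        entry = raw.strip()
        # Assumption: blank lines and lines starting with "#" are ignored.
        if not entry or entry.startswith("#"):
            continue
        candidate = Path(entry)
        # Assumption: a path that is not an existing file is an error.
        if not candidate.is_file():
            raise FileNotFoundError(f"line {line_no}: {entry!r} is not a file")
        paths.append(candidate)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "a.csv"
    data.write_text("x\n1\n")
    listing = Path(tmp) / "inputs.infile-list"
    listing.write_text(f"# input files\n\n{data}\n")
    found = read_infile_list(listing)

print([path.name for path in found])
```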

Changed

sqlp: automatically disable read_csv() fast path optimization when a custom delimiter is specified #1648
refactored util::count_rows() helper to also use polars if available 1e09e17 and 8d321fe
publish: updated Windows MSI publish GH Action workflow to use Wix 3.14 from 3.11 75894ef
deps: bump polars from 0.38.1 to 0.38.2 5faf90e
deps: update Luau from 0.614 to 0.616 eb197fe and 52331da
build(deps): bump sysinfo from 0.30.6 to 0.30.7 by @dependabot in #1650
build(deps): bump chrono from 0.4.34 to 0.4.35 by @dependabot in #1651
build(deps): bump strum from 0.26.1 to 0.26.2 by @dependabot in #1658
build(deps): bump qsv-stats from 0.12.0 to 0.13.0 by @dependabot in #1663
build(deps): bump anyhow from 1.0.80 to 1.0.81 by @dependabot in #1662
build(deps): bump reqwest from 0.11.25 to 0.11.26 by @dependabot in #1667
applied select clippy recommendations
updated several indirect dependencies
added several benchmarks for new/changed commands

Fixed

dedup: fixed #1665 dedup not handling numeric values properly by adding a --numeric option #1666
joinp: reenable join validation tests now that Polars 0.38.2 join validation is working again 5faf90e and fcfc75b
count: broken in unreleased 0.124.0. Polars-powered count requires a "clean" CSV file as it infers the schema based on the first 1000 rows of a CSV. This will sometimes result in an invalid "error" (e.g. it infers a column is a number column, when it's not). 0.124.1 fixes this by adding a fallback to the "regular" CSV reader if a Polars error occurs a2c0869
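
The fallback pattern in the count fix above can be illustrated with a small Python sketch: try a fast reader that infers column types from the first few rows, and fall back to a plain reader if it fails. Everything here is a hypothetical stand-in (fast_count, regular_count, SchemaInferenceError, and the tiny inference window of 2 rows); it does not use Polars and is not qsv's implementation.

```python
import csv
import io


class SchemaInferenceError(Exception):
    """Raised by the stand-in fast reader when a value contradicts the inferred type."""


def _is_number(cell):
    try:
        float(cell)
        return True
    except ValueError:
        return False


def fast_count(text, infer_rows=2):
    """Stand-in for a reader that infers the schema from the first rows only."""
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the header row
    sample = rows[:infer_rows]
    if not sample:
        return 0
    numeric = [all(_is_number(row[i]) for row in sample) for i in range(len(sample[0]))]
    for row in rows[infer_rows:]:
        for looks_numeric, cell in zip(numeric, row):
            if looks_numeric and not _is_number(cell):
                raise SchemaInferenceError(f"expected a number, got {cell!r}")
    return len(rows)


def regular_count(text):
    """Plain row count that makes no assumptions about column types."""
    return max(sum(1 for _ in csv.reader(io.StringIO(text))) - 1, 0)


def count_rows(text):
    try:
        return fast_count(text)
    except SchemaInferenceError:
        return regular_count(text)  # fall back instead of surfacing a spurious error


messy = "id,zip\n1,10001\n2,10002\n3,N/A\n"
print(count_rows(messy))
```

The "zip" column looks numeric in the first two rows, so the stand-in fast reader rejects "N/A" in the third, and count_rows quietly falls back to the plain count.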

Removed

gender_guesser 0.2.0 has been released. Remove patch.crates-io entry 97873a5

Full Changelog: 0.123.0...0.124.1

Contributors

dependabot

05 Mar 14:18

jqnatividad

0.123.0

b833e47

OPEN DATA DAY 2024 Release! 🎉🎉🎉

In celebration of Open Data Day, we're releasing qsv 0.123.0 - the biggest release ever with 330+ commits! qsv 0.123.0 continues to focus on performance, stability and reliability as we continue setting the stage for qsv's big brother - qsv pro.

We've been baking qsv pro for a while now, and it's almost ready for release. qsv pro is a cross-platform Desktop Data Wrangling tool marrying an Excel-like UI with the power of qsv, backed by a cloud-based data cleaning, enrichment and enhancement service that's easy to use for casual Excel users and Data Publishers, yet powerful enough for data scientists and data engineers.

Stay tuned!

Highlights:

sqlp now has automatic read_csv() fast path optimization, often making optimized queries run dramatically faster. For example, a non-trivial SQL aggregation on an 18-column, 657 MB CSV with 7.43 million rows that took 6.09 seconds now takes just 0.14 seconds with the optimization - 🚀 43.5x FASTER 🚀 ! ¹

# with fast path optimization turned off
/usr/bin/time qsv sqlp taxi.csv --no-optimizations "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
VendorID,total_amount
1,52377417.52985942
2,89959869.13054822
4,600584.610000027
(3, 2)
        6.09 real         6.82 user         0.16 sys

# with fast path optimization, fully exploiting Polars' multithreaded, mem-mapped CSV reader!
/usr/bin/time qsv sqlp taxi.csv "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
VendorID,total_amount
1,52377417.52985942
2,89959869.13054822
4,600584.610000027
(3, 2)
        0.14 real         1.09 user         0.09 sys

# in contrast, csvq takes 72.46 seconds - 517.57x slower
/usr/bin/time csvq "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
+----------+---------------------+
| VendorID |  SUM(total_amount)  |
+----------+---------------------+
| 1        |  52377417.529256366 |
| 2        |    89959869.1264675 |
| 4        |   600584.6099999828 |
+----------+---------------------+
       72.46 real        65.15 user        75.17 sys

"Traditional" SQL engines

qsv and csvq both operate on "bare" CSVs. For comparison, let's contrast qsv's performance against "traditional" SQL engines that require setup and import (aka ETL). Not counting setup and import time (which alone takes several minutes), we get:

SQLite 3.43.2 takes 2.910 seconds - 20.79x slower

sqlite> .timer on
sqlite> select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID;
1,52377417.53
2,89959869.13
4,600584.61
Run Time: real 2.910 user 2.569494 sys 0.272972

PostgreSQL 15.6 using PgAdmin 4 v6.12 takes 18.527 seconds - 132.34x slower

even with an index, qsv sqlp is still 5.96x faster
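
The "x slower" figures quoted above can be reproduced by dividing each tool's "real" time by qsv's optimized 0.14 seconds. The short sketch below does only that arithmetic; the dictionary labels are illustrative, and the indexed PostgreSQL timing behind the 5.96x figure is not given in these notes, so it is left out.

```python
QSV_OPTIMIZED = 0.14  # seconds, "real" time of qsv sqlp with the fast path

# "real" timings in seconds, copied from the runs quoted in these notes
timings = {
    "qsv sqlp --no-optimizations": 6.09,
    "csvq": 72.46,
    "SQLite 3.43.2": 2.910,
    "PostgreSQL 15.6": 18.527,
}


def slowdown(seconds, baseline=QSV_OPTIMIZED):
    """How many times slower a run is than the optimized qsv run."""
    return round(seconds / baseline, 2)


ratios = {name: slowdown(t) for name, t in timings.items()}
```

The resulting ratios are 43.5, 517.57, 20.79 and 132.34 respectively, matching the numbers in the text.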

sqlp now supports JSONL output format and adds compression support for Avro and Arrow output formats.
fetch now has a --disk-cache option, so you can cache web service responses to disk, complete with cache control and expiry handling!
jsonl is now multithreaded with additional --batch and --job options.
split now has three modes: split by record count, split by number of chunks and split by file size.
datefmt is a new top-level command for date formatting. We extracted it from apply to make it easier to use, and to set the stage for expanded date and timezone handling.
enum now has a --start option.
excel now has a --keep-zero-time option and improved datetime/duration parsing/handling with the upgrade of calamine from 0.23 to 0.24.
tojsonl now has --trim and --no-boolean options and no longer produces false-positive boolean inferences.
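
To make the three split modes mentioned above concrete, here is a minimal sketch of splitting by record count, by number of chunks, and by size. It is an illustration under simplifying assumptions (records are plain strings, headers are ignored, and the function names are hypothetical), not qsv's implementation.

```python
import math


def split_by_count(rows, size):
    """Chunks of at most `size` records each."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def split_by_chunks(rows, chunks):
    """At most `chunks` chunks, obtained by deriving a per-chunk record count."""
    size = max(1, math.ceil(len(rows) / chunks))
    return split_by_count(rows, size)


def split_by_kb(rows, kb_size):
    """Chunks whose encoded size stays <= kb_size KiB.

    A single record larger than the limit still gets a chunk of its own.
    """
    limit = kb_size * 1024
    out, current, used = [], [], 0
    for row in rows:
        nbytes = len((row + "\n").encode("utf-8"))
        if current and used + nbytes > limit:
            out.append(current)
            current, used = [], 0
        current.append(row)
        used += nbytes
    if current:
        out.append(current)
    return out
```

The size-based variant closes a chunk before it would exceed the limit, which is the "chunks <= desired size" behaviour described for --kb-size in the Changed section.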

Added

apply: add gender_guess operation #1569
datefmt: new top-level command for date formatting. #1638
enum: add --start option #1631
excel: added --keep-zero-time option; improved datetime/duration parsing/handling with upgrade of calamine from 0.23 to 0.24 #1595
fetch: add --disk-cache option #1621
jsonl: major performance refactor! Now multithreaded with additional --batch and --job options #1553
sniff: detects additional mimetypes/file formats after bumping file-format from 0.23 to 0.24 #1589
split: add <outdir> error handling and add usage text examples #1585
split: added --chunks option #1587
split: add --kb-size option #1613
sqlp: added JSONL output format and compression support for AVRO and Arrow output formats in #1635
tojsonl: add --trim option #1554
Add QSV_DOTENV_PATH env var #1562
Add license scan report and status by @fossabot in #1550
Added several benchmarks for new/changed commands

Changed

luau: bumped Luau from 0.606 to 0.614
freq: major performance refactor - 1a3a4b4
split: migrate to rayon from threadpool #1555
split: refactored to actually create chunks <= desired --kb-size, obviating need for hacky --sep-factor option #1615
tojsonl: improved true/false boolean inferencing false positive handling #1641
tojsonl: fine-tune boolean inferencing #1643
schema: use parallel sort when sorting enums for fields 523c60a
Use array for rustflags to avoid conflicts with user flags by @clarfonthey in #1548
Make it easier and more consistent to package for distros by @alerque in #1549
Replace simple_home_dir with simple_expand_tilde crate #1578
build(deps): bump rayon from 1.8.0 to 1.8.1 by @dependabot in #1547
build(deps): bump rayon from 1.8.1 to 1.9.0 by @dependabot in #1623
build(deps): bump uuid from 1.6.1 to 1.7.0 by @dependabot in #1551
build(deps): bump jql-runner from 7.1.2 to 7.1.3 by @dependabot in #1552
build(deps): bump jql-runner from 7.1.3 to 7.1.5 by @dependabot in #1602
build(deps): bump jql-runner from 7.1.5 to 7.1.6 by @dependabot in #1637
build(deps): bump flexi_logger from 0.27.3 to 0.27.4 by @dependabot in #1556
build(deps): bump regex from 1.10.2 to 1.10.3 by @dependabot in #1557
build(deps): bump cached from 0.47.0 to 0.48.0 by @dependabot in #1558
build(deps): bump cached from 0.48.0 to 0.48.1 by @dependabot in #1560
build(deps): bump cached from 0.48.1 to 0.49.2 by @dependabot in #1618
build(deps): bump chrono from 0.4.31 to 0.4.32 by @dependabot in #1559
build(deps): bump chrono from 0.4.32 to 0.4.33 by @dependabot in #1566
build(deps): bump mlua from 0.9.4 to 0.9.5 by @dependabot in #1565
build(deps): bump mlua from 0.9.5 to 0.9.6 by @dependabot in #1632
build(deps): bump serde from 1.0.195 to 1.0.196 by @dependabot in #1568
build(deps): bump serde from 1.0.196 to 1.0.197 by @dependabot in #1612
build(deps): bump serde_json from 1.0.111 to 1.0.112 by @dependabot in #1567
build(deps): bump serde_json from 1.0.112 to 1.0.113 by @dependabot in #1576
build(deps): bump serde_json from 1.0.113 to 1.0.114 by @dependabot in #1610
bump Polars from 0.36 to 0.37 #1570
build(deps): bump polars from 0.37.0 to 0.38.0 by @dependabot in #1629
build(deps): bump polars from 0.38.0 to 0.38.1 by @dependabot in #1634
build(deps): bump strum from 0.25.0 to 0.26.1 by @dependabot in #1572
build(deps): bump indexmap from 2.1.0 to 2.2.1 by @dependabot in https://g...

¹ Measurements taken on an Apple Mac Mini 2023 model with an M2 Pro chip with 12 CPU cores & 32GB of RAM, running macOS Sonoma 14.4

Contributors

alerque, rex4539, and 3 other contributors

17 Jan 04:54

jqnatividad

0.122.0

4ff43bc

0.122.0

👉 REQUEST FOR USE CASES: 👈

Please help define the future of qsv.
Add what you're currently using qsv for here - #1529

Not only does it help us catalog what use cases we should optimize for, posters will get higher priority access to the qsv pro preview.

Highlights:

qsvpy is now available in the prebuilt binaries for select platforms! It's a new qsv binary variant with the python feature, enabling the py command. Three subvariants are available - qsvpy310, qsvpy311 and qsvpy312, corresponding to Python 3.10, 3.11 and 3.12 respectively.
Removed the generate command, as its main dependency is unmaintained and has old dependencies. generate was also not used much: the test data it generated was not well suited for training models, and it was too slow. We therefore decided to remove it even before the synthesize (#235) command is ready.
reverse now has index support and can work in "streaming" mode and handle larger than memory CSV files.
sort and sample: users can now choose from three Random Number Generator (RNG) algorithms with the --rng option - standard, faster & cryptosecure.
pseudo now has --start, --increment & --formatstr options.
fmt now has a --no-final-newline option to suppress the final newline for better interoperability with other tools, specifically Excel. It also treats "T" as a special value for the tab character for the --out-delimiter option.
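
One way an index makes a "streaming" reverse possible is by recording where each record starts, then seeking to those offsets from last to first so only one record is in memory at a time. The sketch below shows that idea on a binary file object; it is not qsv's index format or code, and it assumes records contain no embedded newlines.

```python
import io


def build_index(f):
    """Record the byte offset at which every data record starts (header excluded)."""
    f.seek(0)
    f.readline()  # skip the header row
    offsets = []
    while True:
        pos = f.tell()
        if not f.readline():
            break
        offsets.append(pos)
    return offsets


def reverse_stream(f, offsets):
    """Yield the header, then records last-to-first, one record at a time."""
    f.seek(0)
    yield f.readline()
    for pos in reversed(offsets):
        f.seek(pos)
        yield f.readline()


# Usage with an in-memory file; a file opened with open(path, "rb") works the same way.
sample = io.BytesIO(b"id,name\n1,a\n2,b\n3,c\n")
reversed_lines = list(reverse_stream(sample, build_index(sample)))
```

Because only the list of offsets is held in memory, the approach scales to files far larger than RAM, at the cost of one seek per record.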

Added

reverse: now has index support and can work in "streaming" mode #1531
sort: added --rng <kind> for different kinds of RNGs - standard, faster & cryptosecure #1535
sample: added --rng <kind> option (standard, faster & cryptosecure) #1532
pseudo: major refactor. Added --start, --increment & --formatstr options #1541
fmt: add --no-final-newline option #1545
added additional benchmarks
added additional tests for new options. We now have ~1,300 tests!
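
As a rough picture of what the new pseudo options control, the sketch below replaces each distinct value with an incremental identifier. The parameter names mirror --start, --increment and --formatstr, but the defaults and the `{}` placeholder convention are assumptions made for this example, not a statement of qsv's exact behaviour.

```python
def pseudonymise(values, start=0, increment=1, formatstr="{}"):
    """Map each distinct value to an incremental identifier, reusing it for repeats."""
    mapping = {}
    next_id = start
    out = []
    for value in values:
        if value not in mapping:
            # First sighting: mint a new identifier from the running counter.
            mapping[value] = formatstr.replace("{}", str(next_id))
            next_id += increment
        out.append(mapping[value])
    return out


ids = pseudonymise(["ann", "bob", "ann", "cy"], start=1000, increment=5, formatstr="ID-{}")
```

Here `ids` is `["ID-1000", "ID-1005", "ID-1000", "ID-1010"]`: repeated values keep the identifier assigned on first sight.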

Changed

fmt: --out-delimiter now treats "T" as a special value for the tab character #1546
build(deps): bump whatlang from 0.16.3 to 0.16.4 by @dependabot in #1525
build(deps): bump serde_json from 1.0.110 to 1.0.111 by @dependabot in #1524
build(deps): bump pyo3 from 0.20.1 to 0.20.2 by @dependabot in #1526
build(deps): bump sysinfo from 0.30.3 to 0.30.4 by @dependabot in #1523
build(deps): bump sysinfo from 0.30.4 to 0.30.5 by @dependabot in #1530
build(deps): bump serial_test from 2.0.0 to 3.0.0 by @dependabot in #1534
build(deps): bump mlua from 0.9.2 to 0.9.3 by @dependabot in #1540
build(deps): bump mlua from 0.9.3 to 0.9.4 by @dependabot in #1542
build(deps): bump simple-home-dir from 0.2.1 to 0.2.3 by @dependabot in #1544
apply select clippy suggestions
update several indirect dependencies

Removed

removed generate command #1527
removed generate feature from GitHub Action workflows #1528
sample: removed --faster RNG sampling option, replacing it with --rng #1532

Full Changelog: 0.121.0...0.122.0

Contributors

dependabot

03 Jan 13:24

jqnatividad

0.121.0

12957d3

0.121.0

Two days ago, qsv 0.120.0 was released. Hours later, significant updates occurred in our ecosystem: Polars upgraded to version 0.36, Homebrew rolled out support for Rust 1.75.0, and our pull request for 'cached' was merged.

In light of these developments, we're releasing 0.121.0 out of cycle to leverage the new features, fixes and performance enhancements in these key components integral to qsv.

👉 REQUEST FOR USE CASES: 👈
Please help define the future of qsv.
Add what you're currently using qsv for here - #1529

Not only does it help us catalog what use cases we should optimize for, posters will get higher priority access to the qsv pro preview.

Added

sqlp: with Polars 0.36, it now supports:
- subqueries for JOIN and FROM (examples)
- REGEXP and RLIKE pattern matching (examples)
- common variant spelling STDEV in the SQL engine (in addition to STDDEV)
- and more under the hood improvements!
sqlp: now supports writing to Apache Avro format 32f2fbb
sqlp: when writing with the CSV --format, if the --output file has a TSV or TAB extension, it automatically uses the tab delimiter c97048c
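
The extension-based delimiter choice described in the last item can be sketched in a few lines. This is only an illustration of the rule: the helper name is hypothetical, and treating the extension case-insensitively is an assumption.

```python
from pathlib import Path


def output_delimiter(output_path, default=","):
    """Pick a tab delimiter for .tsv/.tab outputs, otherwise keep the default."""
    ext = Path(output_path).suffix.lower()
    return "\t" if ext in {".tsv", ".tab"} else default
```

For example, `output_delimiter("result.tsv")` returns a tab, while `output_delimiter("result.csv")` returns a comma.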

Changed

Bump polars from 0.35 to 0.36 #1521
build(deps): bump serde from 1.0.193 to 1.0.194 by @dependabot in #1520
build(deps): bump serde_json from 1.0.109 to 1.0.110 by @dependabot in #1519
build(deps): bump semver from 1.0.20 to 1.0.21 by @dependabot in #1518
build(deps): bump serde_stacker from 0.1.10 to 0.1.11 by @dependabot in #1517
build(deps): bump cached from 0.46.1 to 0.47.0 by @dependabot in #1522
bumped MSRV to 1.75.0

Fixed

cat: fixed performance regression in rowskey by moving unchanging variables out of hot loop - 96a40e9
sqlp: Polars 0.36 fixed the SQL SUBSTR() function

Full Changelog: 0.120.0...0.121.0

Contributors

dependabot

Releases: jqnatividad/qsv
