Merge pull request #1257 from jqnatividad/apply-multi-column
`apply` & `applydp`: improve usage text in relation to multi-column capabilities
jqnatividad authored Aug 27, 2023
2 parents 45ec7ef + 0f9a257 commit 357eee0
Showing 3 changed files with 53 additions and 25 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -27,7 +27,7 @@ See [FAQ](https://github.com/jqnatividad/qsv/discussions/categories/faq) for mor

| Command | Description |
| --- | --- |
| [apply](/src/cmd/apply.rs#L2)<br>✨🚀🧠🤖 | Apply series of string, date, math, currency & geocoding transformations to a CSV column. It also has some basic [NLP](https://en.wikipedia.org/wiki/Natural_language_processing) functions ([similarity](https://crates.io/crates/strsim), [sentiment analysis](https://crates.io/crates/vader_sentiment), [profanity](https://docs.rs/censor/latest/censor/), [eudex](https://github.com/ticki/eudex#eudex-a-blazingly-fast-phonetic-reductionhashing-algorithm) & [language detection](https://crates.io/crates/whatlang)). |
| [apply](/src/cmd/apply.rs#L2)<br>✨🚀🧠🤖 | Apply series of string, date, math, currency & geocoding transformations to given CSV column/s. It also has some basic [NLP](https://en.wikipedia.org/wiki/Natural_language_processing) functions ([similarity](https://crates.io/crates/strsim), [sentiment analysis](https://crates.io/crates/vader_sentiment), [profanity](https://docs.rs/censor/latest/censor/), [eudex](https://github.com/ticki/eudex#eudex-a-blazingly-fast-phonetic-reductionhashing-algorithm) & [language detection](https://crates.io/crates/whatlang)). |
| <a name="applydp_deeplink"></a>[applydp](/src/cmd/applydp.rs#L2)<br>🚀 ![CKAN](docs/images/ckan.png)| applydp is a slimmed-down version of `apply` with only [Datapusher+](https://github.com/dathere/datapusher-plus) relevant subcommands/operations (`qsvdp` binary variant only). |
| [behead](/src/cmd/behead.rs#L2) | Drop headers from a CSV. |
| [cat](/src/cmd/cat.rs#L2) | Concatenate CSV files by row or by column. |
24 changes: 18 additions & 6 deletions src/cmd/apply.rs
@@ -1,5 +1,5 @@
static USAGE: &str = r#"
Apply a series of transformation functions to a given CSV column. This can be used to
Apply a series of transformation functions to given CSV column/s. This can be used to
perform typical data-wrangling tasks and/or to harmonize some values, etc.
It has six subcommands:
@@ -10,7 +10,7 @@ It has six subcommands:
* geocode - geocodes a WGS84 location against a static copy of the Geonames cities database.
* calcconv - parse and evaluate math expressions, with support for units and conversions.
OPERATIONS
OPERATIONS (multi-column capable)
Multiple operations can be applied, with the comma-delimited operation series
applied in order:
@@ -22,6 +22,8 @@ applied in order:
Operations support multi-column transformations. Just make sure the
number of new names given to the --rename option matches the number of transformed columns. e.g.:
# trim and fold to uppercase the col1,col2 and col3 columns and rename them
# to newcol1,newcol2 and newcol3
$ qsv apply operations trim,upper col1,col2,col3 -r newcol1,newcol2,newcol3 file.csv
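The multi-column behavior in the example above can be pictured with a minimal Rust sketch. This is not qsv's implementation: `apply_ops` and `apply_to_columns` are hypothetical helpers, only the `trim` and `upper` operations are modeled, and treating any other operation name as a no-op is an assumption made to keep the sketch short.

```rust
/// Apply a comma-delimited series of operations to one cell, in order.
/// Only `trim` and `upper` are modeled here (an assumption for brevity).
fn apply_ops(cell: &str, ops: &str) -> String {
    let mut value = cell.to_string();
    for op in ops.split(',') {
        value = match op.trim() {
            "trim" => value.trim().to_string(),
            "upper" => value.to_uppercase(),
            // Unmodeled operations are a no-op in this sketch only.
            _ => value,
        };
    }
    value
}

/// Transform the named columns in place, then rename them.
/// Fails when the number of new names differs from the number of columns.
fn apply_to_columns(
    headers: &mut [String],
    rows: &mut [Vec<String>],
    columns: &[&str],
    renames: &[&str],
    ops: &str,
) -> Result<(), String> {
    if columns.len() != renames.len() {
        return Err(format!(
            "{} columns selected but {} new names given",
            columns.len(),
            renames.len()
        ));
    }
    for (col, new_name) in columns.iter().zip(renames) {
        let idx = headers
            .iter()
            .position(|h| h.as_str() == *col)
            .ok_or_else(|| format!("column not found: {col}"))?;
        for row in rows.iter_mut() {
            let new_value = apply_ops(&row[idx], ops);
            row[idx] = new_value;
        }
        headers[idx] = new_name.to_string();
    }
    Ok(())
}

fn main() {
    let mut headers = vec!["col1".to_string(), "col2".to_string(), "other".to_string()];
    let mut rows = vec![vec![
        "  alice ".to_string(),
        " bob".to_string(),
        " keep ".to_string(),
    ]];

    apply_to_columns(
        &mut headers,
        &mut rows,
        &["col1", "col2"],
        &["newcol1", "newcol2"],
        "trim,upper",
    )
    .expect("column and rename counts match");

    println!("{}", headers.join(","));
    for row in &rows {
        println!("{}", row.join(","));
    }
}
```

Columns that are not selected (`other` here) pass through untouched, which mirrors the in-place nature of the transformation described in the usage text.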
It has 36 supported operations:
@@ -136,7 +138,7 @@ You can also use this subcommand to make a copy of a column:
$ qsv apply operations copy col_to_copy -c col_copy file.csv
EMPTYREPLACE
EMPTYREPLACE (multi-column capable)
Replace empty cells with <--replacement> string.
Non-empty cells are not modified. See the `fill` command for more complex empty field operations.
@@ -149,7 +151,11 @@ Replace empty cells in file.csv Measurement column with 'Unknown Measurement'.
$ qsv apply emptyreplace --replacement 'Unknown Measurement' Measurement file.csv
DATEFMT
Replace all empty cells in file.csv for columns that start with 'Measurement' with 'None'.
$ qsv apply emptyreplace --replacement None '/^Measurement/' file.csv
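As a rough illustration of what EMPTYREPLACE does across several columns, here is a minimal Rust sketch. It is not taken from qsv: `empty_replace` is a hypothetical helper, a plain prefix check stands in for the `/^Measurement/` regex selector, and only zero-length cells are treated as empty (whether qsv also treats whitespace-only cells as empty is not stated in this text).

```rust
/// Replace zero-length cells in the selected columns with `replacement`.
/// Non-empty cells, and cells in unselected columns, are left untouched.
fn empty_replace(
    headers: &[String],
    rows: &mut [Vec<String>],
    selects: impl Fn(&str) -> bool,
    replacement: &str,
) {
    // Resolve the selected column indices once, up front.
    let selected: Vec<usize> = headers
        .iter()
        .enumerate()
        .filter(|(_, name)| selects(name.as_str()))
        .map(|(idx, _)| idx)
        .collect();

    for row in rows.iter_mut() {
        for &idx in &selected {
            if row[idx].is_empty() {
                row[idx] = replacement.to_string();
            }
        }
    }
}

fn main() {
    let headers: Vec<String> = ["id", "Measurement A", "Measurement B"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let mut rows = vec![
        vec!["1".to_string(), String::new(), "4.2".to_string()],
        vec![String::new(), "3.1".to_string(), String::new()],
    ];

    // Prefix match standing in for the '/^Measurement/' regex selector.
    empty_replace(
        &headers,
        &mut rows,
        |name| name.starts_with("Measurement"),
        "None",
    );

    for row in &rows {
        println!("{}", row.join(","));
    }
}
```

Note that the empty `id` cell in the second row is left as-is, because `id` is not among the selected columns.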
DATEFMT (multi-column capable)
Formats a recognized date column to a specified format using <--formatstr>.
See https://github.com/jqnatividad/belt/tree/main/dateparser#accepted-date-formats for
recognized date formats.
@@ -168,6 +174,10 @@ Format multiple date columns in file.csv to ISO 8601/RFC 3339 format:
$ qsv apply datefmt 'Open Date,Modified Date,Closed Date' file.csv
Format all columns that end with "_date" in file.csv to ISO 8601/RFC 3339 format:
$ qsv apply datefmt '/_date$/' file.csv
Format dates in OpenDate column using '%Y-%m-%d' format:
$ qsv apply datefmt OpenDate --formatstr '%Y-%m-%d' file.csv
@@ -271,8 +281,10 @@ qsv apply calcconv --formatstr=<string> [options] --new-column=<name> [<input>]
qsv apply --help
apply arguments:
The <column> argument can be a list of columns for the operations, emptyreplace &
datefmt subcommands. See 'qsv select --help' for the format details.
<column> The column/s to apply the transformation to.
Note that the <column> argument supports multiple columns
for the operations, emptyreplace & datefmt subcommands.
See 'qsv select --help' for the format details.
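The usage examples in this text pass `<column>` in two shapes: a comma-delimited list of names (`col1,col2,col3`) and a pattern wrapped in forward slashes (`'/^Measurement/'`). The Rust sketch below only distinguishes those two shapes. `Selector` and `parse_selector` are hypothetical names, and the real selection syntax documented by `qsv select --help` is richer than what is modeled here.

```rust
/// A simplified view of the two selector shapes used in the usage examples.
#[derive(Debug, PartialEq)]
enum Selector {
    /// A comma-delimited list of column names, e.g. `col1,col2,col3`.
    Names(Vec<String>),
    /// A pattern wrapped in forward slashes, e.g. `/^Measurement/`.
    Pattern(String),
}

/// Classify a <column> argument as either a slash-delimited pattern
/// or a plain comma-delimited list of names (assumed simplification).
fn parse_selector(arg: &str) -> Selector {
    if arg.len() >= 2 && arg.starts_with('/') && arg.ends_with('/') {
        // Strip the leading and trailing slash; both are single-byte ASCII.
        Selector::Pattern(arg[1..arg.len() - 1].to_string())
    } else {
        Selector::Names(arg.split(',').map(|name| name.to_string()).collect())
    }
}

fn main() {
    for arg in ["col1,col2,col3", "/^Measurement/", "Open Date,Modified Date"] {
        println!("{arg:?} -> {:?}", parse_selector(arg));
    }
}
```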
OPERATIONS subcommand:
<operations> The operation/s to apply.
52 changes: 34 additions & 18 deletions src/cmd/applydp.rs
@@ -1,6 +1,6 @@
static USAGE: &str = r#"
applydp is a slimmed-down version of apply specifically created for Datapusher+.
It "applies" a series of transformation functions to a given CSV column. This can be used to
It "applies" a series of transformation functions to given CSV column/s. This can be used to
perform typical data-wrangling tasks and/or to harmonize some values, etc.
It has four subcommands:
@@ -9,7 +9,7 @@ It has four subcommands:
* datefmt - Formats a recognized date column to a specified format using <--formatstr>.
* dynfmt - Dynamically constructs a new column from other columns using the <--formatstr> template.
OPERATIONS
OPERATIONS (multi-column capable)
Multiple operations can be applied, with the comma-delimited operation series
applied in order:
@@ -60,6 +60,10 @@ save it to a new column named uppercase_clean_surname.
$ qsv applydp operations trim,upper surname -c uppercase_clean_surname file.csv
Trim, squeeze, then transform to uppercase in place ALL fields that end with "_name"
$ qsv applydp operations trim,squeeze,upper '/_name$/' file.csv
Trim, then transform to uppercase the firstname and surname fields and
rename the columns ufirstname and usurname.
Expand All @@ -78,7 +82,7 @@ You can also use this subcommand command to make a copy of a column:
$ qsv applydp operations copy col_to_copy -c col_copy file.csv
EMPTYREPLACE
EMPTYREPLACE (multi-column capable)
Replace empty cells with <--replacement> string.
Non-empty cells are not modified. See the `fill` command for more complex empty field operations.
@@ -91,7 +95,12 @@ Replace empty cells in file.csv Measurement column with 'Unknown Measurement'.
$ qsv applydp emptyreplace --replacement 'Unknown Measurement' Measurement file.csv
DATEFMT
Replace all empty cells in file.csv with 'None' for columns whose names
start with 'observation' (case-insensitive).
$ qsv applydp emptyreplace --replacement None '/(?i)^observation/' file.csv
DATEFMT (multi-column capable)
Formats a recognized date column to a specified format using <--formatstr>.
See https://github.com/jqnatividad/belt/tree/main/dateparser#accepted-date-formats for
recognized date formats.
@@ -110,6 +119,11 @@ Format multiple date columns in file.csv to ISO 8601/RFC 3339 format:
$ qsv applydp datefmt 'Open Date,Modified Date,Closed Date' file.csv
Format all columns that end with "_date" (case-insensitive) in file.csv to
ISO 8601/RFC 3339 format:
$ qsv applydp datefmt '/(?i)_date$/' file.csv
Format dates in OpenDate column using '%Y-%m-%d' format:
$ qsv applydp datefmt OpenDate --formatstr '%Y-%m-%d' file.csv
@@ -154,28 +168,30 @@ qsv applydp dynfmt --formatstr=<string> [options] --new-column=<name> [<input>]
qsv applydp --help
applydp arguments:
The <column> argument can be a list of columns for the operations, emptyreplace &
datefmt subcommands. See 'qsv select --help' for the format details.
<column> The column/s to apply the transformation to.
Note that the <column> argument supports multiple columns
for the operations, emptyreplace & datefmt subcommands.
See 'qsv select --help' for the format details.
OPERATIONS subcommand:
<operations> The operation/s to apply.
<column> The column/s to apply the operations to.
<operations> The operation/s to apply.
<column> The column/s to apply the operations to.
EMPTYREPLACE subcommand:
--replacement=<string> The string to use to replace empty values.
<column> The column/s to check for emptiness.
--replacement=<string> The string to use to replace empty values.
<column> The column/s to check for emptiness.
DATEFMT subcommand:
--formatstr=<string> The date format to use for the datefmt operation.
See DATEFMT section in the --formatstr option below
for more details.
<column> The date column/s to apply the datefmt operation to.
--formatstr=<string> The date format to use for the datefmt operation.
See DATEFMT section in the --formatstr option below
for more details.
<column> The date column/s to apply the datefmt operation to.
DYNFMT subcommand:
--formatstr=<string> The template to use for the dynfmt operation.
See DYNFMT example above for more details.
--new-column=<name> Put the generated values in a new column.
--formatstr=<string> The template to use for the dynfmt operation.
See DYNFMT example above for more details.
--new-column=<name> Put the generated values in a new column.
<input> The input file to read from. If not specified, reads from stdin.
applydp options: