tojsonl: improved boolean inferencing
By turning on case-insensitive frequency compilation, we can now properly infer that a field is boolean even when its case-sensitive domain has more than 2 values, as long as its case-insensitive domain has exactly 2
(e.g. True, False, true, false, TRUE, FALSE, truE, fAlse has a case-sensitive domain of 8 values (cardinality 8), but a case-insensitive domain of 2 values (cardinality 2))
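
The cardinality arithmetic in the commit message can be checked with a minimal sketch. The function names `domain_cardinality` and `is_boolean_candidate` are hypothetical and are not part of qsv; the sketch only illustrates why folding case shrinks the example domain from 8 to 2, and it assumes a two-value domain is sufficient to be a boolean candidate.

```rust
use std::collections::HashSet;

/// Count the distinct values in a column, optionally folding case first.
/// Hypothetical helper for illustration; not qsv's actual implementation.
fn domain_cardinality(values: &[&str], ignore_case: bool) -> usize {
    let distinct: HashSet<String> = values
        .iter()
        .map(|v| {
            if ignore_case {
                v.to_lowercase()
            } else {
                (*v).to_string()
            }
        })
        .collect();
    distinct.len()
}

/// Assumption: a column is a boolean candidate when its
/// case-insensitive domain has exactly two values.
fn is_boolean_candidate(values: &[&str]) -> bool {
    domain_cardinality(values, true) == 2
}
```

Calling `domain_cardinality` on the eight spellings from the commit message with `ignore_case` set to `false` counts each spelling separately, while `true` collapses them to the two values `true` and `false`.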
jqnatividad committed Oct 27, 2023
1 parent 8093f52 commit 6345f2d
Showing 2 changed files with 10 additions and 4 deletions.
5 changes: 5 additions & 0 deletions src/cmd/tojsonl.rs
@@ -112,6 +112,11 @@ pub fn run(argv: &[&str]) -> CliResult<()> {
 // is just two values. if its more than 2, that's all we need know
 // for boolean inferencing
 flag_enum_threshold: 3,
+// ignore case for enum constraints
+// so we can properly infer booleans. e.g. if a field has a domain of
+// True, False, true, false, TRUE, FALSE that it is still a boolean
+// with a case-insensitive cardinality of 2
+flag_ignore_case: true,
 flag_strict_dates: false,
 flag_pattern_columns: crate::select::SelectColumns::parse("")?,
 // json doesn't have a date type, so don't infer dates
9 changes: 5 additions & 4 deletions tests/test_tojsonl.rs
@@ -164,12 +164,13 @@ fn tojsonl_not_boolean_case_sensitive() {
 let mut cmd = wrk.command("tojsonl");
 cmd.arg("in.csv");

-// not treated as boolean since col1's domain has three values
+// properly treated as boolean since col1's domain has two values
+// case-insensitive, even though the enum for col1 is
 // True, False and false
 let got: String = wrk.stdout(&mut cmd);
-let expected = r#"{"col1":"True","col2":"Mark"}
-{"col1":"False","col2":"John"}
-{"col1":"false","col2":"Bob"}"#;
+let expected = r#"{"col1":true,"col2":"Mark"}
+{"col1":false,"col2":"John"}
+{"col1":false,"col2":"Bob"}"#;
 assert_eq!(got, expected);
 }
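
The change in expected output can be illustrated with a short sketch of how a cell might be rendered once its column is inferred as boolean. The helpers `to_json_bool` and `render_line` are hypothetical, recognise only the spellings of `true` and `false` used in this test, and skip JSON string escaping; qsv's actual conversion logic may differ.

```rust
/// Hypothetical helper: map a cell to a JSON boolean literal, ignoring case.
/// Returns None for anything other than a spelling of "true" or "false".
fn to_json_bool(cell: &str) -> Option<&'static str> {
    if cell.eq_ignore_ascii_case("true") {
        Some("true")
    } else if cell.eq_ignore_ascii_case("false") {
        Some("false")
    } else {
        None
    }
}

/// Render one JSONL line for the two-column fixture (col1, col2).
/// Assumption: values contain no characters that need JSON escaping.
fn render_line(col1: &str, col2: &str) -> String {
    match to_json_bool(col1) {
        // Boolean cells are emitted as bare JSON literals.
        Some(b) => format!(r#"{{"col1":{b},"col2":"{col2}"}}"#),
        // Anything else stays a quoted JSON string.
        None => format!(r#"{{"col1":"{col1}","col2":"{col2}"}}"#),
    }
}
```

With this sketch, `render_line("True", "Mark")` yields `{"col1":true,"col2":"Mark"}`, which matches the first line of the updated expectation.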
