Merge pull request #1372 from jqnatividad/sniff-just-mime
`sniff` add `--just-mime` option
jqnatividad authored Oct 18, 2023
2 parents c13c341 + 12d4b4f commit 7c92cff
Showing 3 changed files with 43 additions and 0 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -228,6 +228,11 @@ The `to` command converts CSVs to `.xlsx`, [Parquet](https://parquet.apache.org)

The `sqlp` command returns query results in CSV, JSON, Parquet & [Arrow IPC](https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format) formats. Polars SQL also supports reading external files directly in various formats with its `read_ndjson`, `read_csv`, `read_parquet` & `read_ipc` [table functions](https://github.com/pola-rs/polars/blob/c7fa66a1340418789ec66bdedad6654281afa0ab/polars/polars-sql/src/table_functions.rs#L9-L36).

The `sniff` command can also detect the mime type of any file, whether local or remote (http and https schemes supported), with the `--no-infer` or `--just-mime` options.
It can detect more than 120 file formats, including MS Office/Open Document files, JSON, XML,
PDF, PNG, JPEG and specialized geospatial formats like GPX, GML, KML, TML, TMX, TSX, TTML.
See https://docs.rs/file-format/latest/file_format/#reader-features for a complete list.
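As a rough illustration of how signature-based mime detection of this kind can work, the sketch below checks a buffer's leading "magic" bytes against three well-known signatures. The `guess_mime` function is hypothetical and is not taken from qsv or the `file-format` crate, which cover far more formats and use more than a simple prefix check.

```rust
/// Guess a mime type from the leading "magic" bytes of a buffer.
/// Hypothetical helper for illustration only; not part of qsv or file-format.
fn guess_mime(bytes: &[u8]) -> &'static str {
    // Well-known file signatures.
    const PNG: &[u8] = &[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    const JPEG: &[u8] = &[0xFF, 0xD8, 0xFF];
    const PDF: &[u8] = b"%PDF-";

    if bytes.starts_with(PNG) {
        "image/png"
    } else if bytes.starts_with(JPEG) {
        "image/jpeg"
    } else if bytes.starts_with(PDF) {
        "application/pdf"
    } else {
        // No known signature matched: fall back to a generic binary type.
        "application/octet-stream"
    }
}

fn main() {
    // The first bytes of a PDF file are enough for this check.
    println!("{}", guess_mime(b"%PDF-1.7\n"));
}
```

Because only a short prefix is inspected, a check like this cannot tell apart formats that share no distinctive header, such as delimited text files.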

### Snappy Compression/Decompression

qsv supports *automatic compression/decompression* using the [Snappy frame format](https://github.com/google/snappy/blob/main/framing_format.txt). Snappy was chosen instead of more popular compression formats like gzip because it was designed for [high-performance streaming compression & decompression](https://github.com/google/snappy/tree/main/docs#readme) (up to 2.58 gb/sec compression, 0.89 gb/sec decompression).
9 changes: 9 additions & 0 deletions src/cmd/sniff.rs
@@ -84,6 +84,10 @@ sniff options:
                           (Unsigned, Signed => Integer, Text => String, everything else the same)
    --no-infer             Do not infer the schema. Only return the file's mime type, size and
                           last modified date. Use this to use sniff as a general mime type detector.
                           Note that CSV and TSV files will only be detected as mime type text/plain
                           in this mode.
    --just-mime            Only return the file's mime type. Use this to use sniff as a general
                           mime type detector. Synonym for --no-infer.
    --quick                When sniffing a non-CSV remote file, only download the first chunk of the file
                           before attempting to detect the mime type. This is faster but less accurate as
                           some mime types cannot be detected with just the first downloaded chunk.
@@ -139,6 +143,7 @@ struct Args {
    flag_user_agent: Option<String>,
    flag_stats_types: bool,
    flag_no_infer: bool,
    flag_just_mime: bool,
    flag_quick: bool,
    flag_harvest_mode: bool,
}
@@ -714,6 +719,10 @@ async fn sniff_main(mut args: Args) -> CliResult<()> {
            Some("CKAN-harvest/$QSV_VERSION ($QSV_TARGET; $QSV_BIN_NAME)".to_string());
    }

    if args.flag_just_mime {
        args.flag_no_infer = true;
    }
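
The change above makes `--just-mime` an alias by setting the `--no-infer` flag before the rest of `sniff_main` runs, so later code only ever checks one flag. A minimal standalone sketch of that pattern follows; the trimmed-down `Args` struct and the `normalize` helper are assumptions made for illustration and do not appear in the qsv source.

```rust
/// Trimmed-down stand-in for the real `Args` struct (assumption: only the
/// two flags relevant to the alias are modelled here).
#[derive(Debug, Default)]
struct Args {
    flag_no_infer: bool,
    flag_just_mime: bool,
}

/// Hypothetical helper: fold the alias flag into the canonical one so that
/// downstream code only needs to check `flag_no_infer`.
fn normalize(mut args: Args) -> Args {
    if args.flag_just_mime {
        args.flag_no_infer = true;
    }
    args
}

fn main() {
    let args = normalize(Args {
        flag_just_mime: true,
        ..Default::default()
    });
    // After normalization the canonical flag is set as well.
    assert!(args.flag_no_infer);
    println!("{args:?}");
}
```

Normalizing once near the top keeps the alias out of every later branch, at the cost of the two options behaving identically.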

    let mut sample_size = args.flag_sample;
    if sample_size < 0.0 {
        if args.flag_json || args.flag_pretty_json {
29 changes: 29 additions & 0 deletions tests/test_sniff.rs
@@ -100,6 +100,35 @@ fn sniff_notcsv() {
    assert!(got_error.starts_with(expected));
}

#[test]
fn sniff_justmime() {
    let wrk = Workdir::new("sniff_justmime");

    let test_file = wrk.load_test_file("excel-xls.xls");

    let mut cmd = wrk.command("sniff");
    cmd.arg("--just-mime").arg(test_file);

    let got: String = wrk.stdout(&mut cmd);

    let expected = "Detected mime type: application/vnd.ms-excel";
    assert!(got.starts_with(expected));
}

#[test]
fn sniff_justmime_remote() {
    let wrk = Workdir::new("sniff_justmime_remote");

    let mut cmd = wrk.command("sniff");
    cmd.arg("--just-mime")
        .arg("https://github.com/jqnatividad/qsv/raw/master/resources/test/excel-xls.xls");

    let got: String = wrk.stdout(&mut cmd);

    let expected = "Detected mime type: application/vnd.ms-excel";
    assert!(got.starts_with(expected));
}

#[test]
fn sniff_url_snappy() {
    let wrk = Workdir::new("sniff_url_snappy");
