Avro reader/writer cannot roundtrip sample foods1.csv #19619

Open
2 tasks done
Bidek56 opened this issue Nov 4, 2024 · 1 comment
Labels
A-io-avro (Area: reading/writing Avro files), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), rust (Related to Rust Polars)

Comments

@Bidek56
Contributor

Bidek56 commented Nov 4, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

use std::error::Error;
use std::io::{BufReader, BufWriter};

use polars::io::avro::{AvroReader, AvroWriter};
use polars::prelude::*;

// Read the sample CSV into a DataFrame using the default CSV options.
fn read_csv(csv_file: &str) -> Result<DataFrame, Box<dyn Error>> {
    let file = std::fs::File::open(csv_file)?;
    let df = CsvReadOptions::default()
        .into_reader_with_file_handle(file)
        .finish()?;
    Ok(df)
}

// Write the DataFrame to an Avro file, then read the same file back.
fn read_write_avro_read_avro(mut df: DataFrame, avro_file: &str) -> Result<DataFrame, Box<dyn Error>> {
    // Write Avro; the buffered writer is flushed when it is dropped at the end of the statement
    let f = std::fs::File::create(avro_file)?;
    AvroWriter::new(BufWriter::new(f)).finish(&mut df)?;

    // Read Avro
    let f = std::fs::File::open(avro_file)?;
    let reader = BufReader::new(f);
    let adf = AvroReader::new(reader)
        // .with_n_rows(Some(5))
        .finish()?;

    // Print the Avro DataFrame
    println!("{:?}", adf);

    Ok(adf)
}

fn main() {
    const FOODS_CSV: &str = "../../examples/datasets/foods1.csv";

    match read_csv(FOODS_CSV) {
        Ok(df) => {
            match read_write_avro_read_avro(df, "test.avro") {
                Ok(adf) => {
                    println!("ADF: \n{:?}", adf);
                }
                Err(e) => {
                    eprintln!("Error reading Avro file: {}", e);
                }
            }
        }
        Err(e) => {
            eprintln!("Error reading CSV file: {}", e);
        }
    };
}

Log output

Error reading Avro file: avro-error: OutOfSpec

Issue description

After writing an Avro file using AvroWriter::new, only the first 4 rows can be read back using AvroReader::new.
Reading any rows beyond the fourth fails with: avro-error: OutOfSpec
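
To pin down the exact row count at which the round trip starts failing, one option is to bisect over the number of rows. The sketch below is only an illustration: `largest_ok_prefix` and the `fake_round_trip` closure are hypothetical names, the total of 20 rows is a placeholder, and the search assumes that if `n` rows round-trip then every smaller prefix does too.

```rust
/// Returns the largest `n` in `0..=total` for which `round_trips(n)` is true.
/// Assumes monotonic behaviour: once a prefix fails, all longer prefixes fail too.
fn largest_ok_prefix<F>(total: usize, mut round_trips: F) -> usize
where
    F: FnMut(usize) -> bool,
{
    // `lo` is treated as known-good (0 rows is assumed to work without being checked);
    // `hi` is either a known failure or one past the end.
    let mut lo = 0;
    let mut hi = total + 1;
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if round_trips(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    lo
}

fn main() {
    // Hypothetical stand-in for the real check: pretend only 4 rows survive.
    let fake_round_trip = |n: usize| n <= 4;
    let n = largest_ok_prefix(20, fake_round_trip);
    println!("largest prefix that round-trips: {n} rows");
}
```

In the real reproducer, the closure would presumably write the first `n` rows of the DataFrame with AvroWriter, read them back with AvroReader, and return whether `finish()` succeeded.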

A similar issue occurs in this nodejs-polars PR.

Expected behavior

The Avro file can be read back into a DataFrame after it has been written using AvroWriter::new.

Installed versions

[dependencies]
polars = "0.44.2"
polars-io = { version = "0.44.2", features = [ "avro" ] }

Bidek56 added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and rust (Related to Rust Polars) labels Nov 4, 2024
Bidek56 changed the title from "Avro reader/writer cannot roundtrip sample food1.csv file" to "Avro reader/writer cannot roundtrip sample food1.csv" Nov 4, 2024
Bidek56 changed the title from "Avro reader/writer cannot roundtrip sample food1.csv" to "Avro reader/writer cannot roundtrip sample foods1.csv" Nov 4, 2024
@Bidek56
Contributor Author

Bidek56 commented Nov 4, 2024

Is this related to this Python issue?

alexander-beedie added the A-io-avro (Area: reading/writing Avro files) label Nov 4, 2024