PermissionDenied Error When Committing Large Number of Documents #2647

@hkhk368

Describe the bug
In the code below, a single IndexWriter is created once, wrapped in an Arc<RwLock<...>>, and shared across multiple threads.
Each batch of data arrives over an mpsc channel, is indexed, and is then committed.
When only a small number of documents is indexed, the code works fine.
However, when there are tens of thousands of documents to index, commits fail with the following errors:

Commit failed: Failed to open file for write: 'IoError { io_error: Os { code: 5, kind: PermissionDenied, message: "Access denied." }, filepath: "4ce4108abcf04b7cb72d6dc8b9318627.fieldnorm" }'
Commit failed: Failed to open file for write: 'IoError { io_error: Os { code: 5, kind: PermissionDenied, message: "Access denied." }, filepath: "4501cf16a4e14f0e98f07bc1d275b0b5.idx" }'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
Commit failed: Failed to open file for write: 'IoError { io_error: Os { code: 5, kind: PermissionDenied, message: "Access denied." }, filepath: "309472c95d424d0ca02f36ff423e69e0.idx" }'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
Commit failed: An IO error occurred: 'Access denied. (os error 5)'
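
One way to check whether these errors are transient (for example, a file briefly held open by another handle) is to retry the failing operation a few times before giving up. The sketch below is std-only and works on `std::io::Result`; the helper name `retry_on_permission_denied` is hypothetical, and mapping tantivy's own error type onto `io::Error` is left out. It is a diagnostic aid under those assumptions, not a fix for the underlying cause.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Calls `op` until it succeeds, retrying only on `PermissionDenied`.
/// `op` is always called at least once. Any other error, or a
/// `PermissionDenied` on the last allowed attempt, is returned as-is.
fn retry_on_permission_denied<T, F>(
    max_attempts: u32,
    delay: Duration,
    mut op: F,
) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry only the error kind seen in the log (os error 5 on Windows).
            Err(e) if e.kind() == io::ErrorKind::PermissionDenied && attempt < max_attempts => {
                eprintln!("attempt {attempt} failed with {e}; retrying");
                thread::sleep(delay);
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

If a commit wrapped this way succeeds on a later attempt, that would suggest a short-lived lock on the segment files. If it fails on every attempt, the cause is more likely persistent.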

Which version of tantivy are you using?
tantivy = "0.24.1"

To Reproduce

use std::sync::Arc;

use parking_lot::RwLock;
use rayon::prelude::*;
use tantivy::merge_policy::LogMergePolicy;
use tantivy::{doc, Index, Term};
use tokio::sync::mpsc::UnboundedReceiver;

pub async fn write_index_to_disk(
    mut rx: UnboundedReceiver<Vec<(u8, u8, u32, u64, String)>>,
    index: Index,
    drive_file_field: tantivy::schema::Field,
    modified_field: tantivy::schema::Field,
    body_field: tantivy::schema::Field,
) {
    let writer = index.writer(500_000_000).expect("create writer");
    let merge_policy = LogMergePolicy::default();
    writer.set_merge_policy(Box::new(merge_policy));
    let writer = Arc::new(RwLock::new(writer)); // parking_lot::RwLock
    while let Some(batch) = rx.recv().await {
        let batch_len = batch.len();
        // println!("{:?}", batch);
        let writer_clone = Arc::clone(&writer);
        batch
            .into_par_iter()
            .for_each(|(flag, drive_index, file_id, modified_time, content)| {
                let drive_file = (drive_index as u64) << 32 | (file_id as u64);
                let iw_read = writer_clone.read();
                match flag {
                    1 => {
                        iw_read
                            .add_document(doc!(
                                drive_file_field => drive_file,
                                modified_field   => modified_time,
                                body_field       => content,
                            ))
                            .expect("add document failed");
                    }
                    2 => {
                        iw_read.delete_term(Term::from_field_u64(drive_file_field, drive_file));
                    }
                    3 => {
                        iw_read.delete_term(Term::from_field_u64(drive_file_field, drive_file));
                        iw_read
                            .add_document(doc!(
                                drive_file_field => drive_file,
                                modified_field   => modified_time,
                                body_field       => content,
                            ))
                            .expect("add document failed");
                    }
                    _ => {}
                }
            });

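        // Every read guard from the parallel loop has been dropped here,
        // so taking the write lock for the commit does not deadlock.
        let mut iw_write = writer.write();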
        match iw_write.commit() {
            Ok(_) => {
                // println!("commit success");
            }
            Err(e) => {
                println!("Commit failed: {}", e);
            }
        }

        update_content_counter_and_maybe_print(batch_len as u64);

    }
}
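
For readers of the reproduction: the `drive_file` key combines the drive index (upper 32 bits) and the file ID (lower 32 bits) into one `u64`, which is then used as the term for deletes. A minimal std-only sketch of that packing and its inverse follows; the function names `pack_drive_file` and `unpack_drive_file` are hypothetical and do not appear in the original code.

```rust
/// Packs a drive index into the upper 32 bits and a file ID into the
/// lower 32 bits, matching `(drive_index as u64) << 32 | (file_id as u64)`.
fn pack_drive_file(drive_index: u8, file_id: u32) -> u64 {
    (u64::from(drive_index) << 32) | u64::from(file_id)
}

/// Inverse of `pack_drive_file`: recovers `(drive_index, file_id)`.
fn unpack_drive_file(key: u64) -> (u8, u32) {
    // The casts truncate on purpose: the drive index occupies bits 32..40
    // and the file ID occupies the low 32 bits.
    ((key >> 32) as u8, key as u32)
}
```

Because the two parts occupy disjoint bit ranges, distinct `(drive_index, file_id)` pairs always map to distinct keys, so a delete by this term targets only documents with that exact pair.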

Env
Windows 11 23H2
No antivirus software
No BitLocker
Thank you.
