Skip to content

RFC-5871: Read Returns Metadata#5871

Merged
Xuanwo merged 7 commits intomainfrom
rfc-read_returns_metadata
Mar 26, 2025
Merged

RFC-5871: Read Returns Metadata#5871
Xuanwo merged 7 commits intomainfrom
rfc-read_returns_metadata

Conversation

@meteorgan
Copy link
Copy Markdown
Contributor

@meteorgan meteorgan commented Mar 24, 2025

Which issue does this PR close?

Closes #5425.

Rationale for this change

What changes are included in this PR?

a new RFC

Are there any user-facing changes?

@meteorgan meteorgan changed the title RFC: Read Returns Metadata RFC-5871: Read Returns Metadata Mar 24, 2025
zhaohaidao pushed a commit to zhaohaidao/opendal that referenced this pull request Mar 24, 2025
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md Outdated
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md
@meteorgan meteorgan marked this pull request as ready for review March 24, 2025 15:29
@meteorgan meteorgan requested a review from Xuanwo as a code owner March 24, 2025 15:30
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md Outdated
@meteorgan meteorgan force-pushed the rfc-read_returns_metadata branch from bdb5021 to 4a3ffe9 Compare March 25, 2025 11:14
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md Outdated
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md Outdated
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md Outdated
@meteorgan meteorgan force-pushed the rfc-read_returns_metadata branch from fae7404 to d2bfabc Compare March 25, 2025 15:52
@meteorgan
Copy link
Copy Markdown
Contributor Author

I think the crux of the issue lies here, metadata() returns a u64, not an Option<u64>. I'm not sure if we're required to always return a valid content_length. If we are, why does the comment mention it'll be 0 when unset ? That could clash with cases where the content is genuinely empty.

/// Content length of this entry. It will be `0` if the content length is not set by the storage services.
pub fn content_length(&self) -> u64 {
self.content_length.unwrap_or_default()
}

@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented Mar 26, 2025

I'm not sure if we're required to always return a valid content_length. If we are, why does the comment mention it'll be 0 when unset ? That could clash with cases where the content is genuinely empty.

It's definitely a dirty workaround.

We have this because some storage services, like fs, don't return content_length in their listings. However, since Metadata::content_length is one of the most commonly used metadata fields, we don't want users to have to check for its existence just because a few services don't provide it.

Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md Outdated
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md Outdated
pub trait Read {
// Existing functions...

fn metadata(&self) -> Metadata;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such API also changes the fact that all APIs for oio::Read takes &mut self. This can lead to oio::Read is not Sync anymore.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can lead to oio::Read is not Sync anymore.

Hmmm, I don't understand this point of view.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting trick involving Rust's Send and Sync.

If a trait only provides &mut self APIs and there's no way to access the struct itself, we can safely declare it as Sync when it's Send by self because it cannot be accessed from multiple threads simultaneously. This can eliminate some unnecessary Mutex<T> instances when implementing oio::Read.

Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md Outdated
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md
Comment thread core/src/docs/rfcs/5871_read_returns_metadata.md Outdated
@meteorgan meteorgan force-pushed the rfc-read_returns_metadata branch from d2bfabc to d6e018b Compare March 26, 2025 14:07
Copy link
Copy Markdown
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @meteorgan for this, looks nice!

@Xuanwo Xuanwo merged commit fe9469d into main Mar 26, 2025
84 checks passed
@Xuanwo Xuanwo deleted the rfc-read_returns_metadata branch March 26, 2025 15:57
@meteorgan
Copy link
Copy Markdown
Contributor Author

Thanks for the review—it’s practically a whole new design. I’ve learned a ton from it! @Xuanwo

@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented Mar 26, 2025

Thanks for the review—it’s practically a whole new design. I’ve learned a ton from it! @Xuanwo

Nice work!

@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented Mar 27, 2025

Oh, I recall an issue with the current design while sleeping last night.

We are not calling Access::read during Operator::reader so we might don't have Metadata to use at this stage.

@meteorgan
Copy link
Copy Markdown
Contributor Author

We are not calling Access::read during Operator::reader so we might don't have Metadata to use at this stage.

I'm aware of this issue, and planning to call Access::read before constructing ReadContext.

@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented Mar 27, 2025

I'm aware of this issue, and planning to call Access::read before constructing ReadContext.

But we don't know the range to read 🤔

@meteorgan
Copy link
Copy Markdown
Contributor Author

But we don't know the range to read 🤔

Oh, Got it. After digging a bit deeper, I've come up with a few more questions:

  1. Does it make sense to run stat before Access::read ? Without a version_id, these two operations might end up hitting the same object but different version_ids
  2. Based on how ReadGenerator is designed, it looks like every call to next_reader triggers another Access::read. Couldn't that mean we sometimes end up reading different versions of the same object ?

@meteorgan
Copy link
Copy Markdown
Contributor Author

meteorgan commented Mar 27, 2025

But we don't know the range to read 🤔

It looks like we need a bit of time to think through this issue. It might actually call for an API change, or add a new API returns both data and metadata ?

@meteorgan
Copy link
Copy Markdown
Contributor Author

The returned value like object_store might be a better choice.

/// Result for a get request
#[derive(Debug)]
pub struct GetResult {
    /// The [`GetResultPayload`]
    pub payload: GetResultPayload,
    /// The [`ObjectMeta`] for this object
    pub meta: ObjectMeta,
    /// The range of bytes returned by this request
    ///
    /// Note this is not `usize` as `object_store` supports 32-bit architectures such as WASM
    pub range: Range<u64>,
    /// Additional object attributes
    pub attributes: Attributes,
}

@meteorgan
Copy link
Copy Markdown
Contributor Author

Hi, @Xuanwo do any new ideas on these issues ?

@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented Mar 31, 2025

Hi, @Xuanwo do any new ideas on these issues ?

I'm guessing we need to start a new one. Returning GetResult is not good for us since our Reader performs more complex work than object_store do like chunking and concurrent reading.

@meteorgan
Copy link
Copy Markdown
Contributor Author

meteorgan commented Mar 31, 2025

Oh, Got it. After digging a bit deeper, I've come up with a few more questions:

  1. Does it make sense to run stat before Access::read ? Without a version_id, these two operations might end up hitting the same object but different version_ids
  2. Based on how ReadGenerator is designed, it looks like every call to next_reader triggers another Access::read. Couldn't that mean we sometimes end up reading different versions of the same object ?

How about these two issues ? Did i overlook anything ? or are there potential bugs in current implementation ? @Xuanwo

@meteorgan
Copy link
Copy Markdown
Contributor Author

I'm guessing we need to start a new one.

Are you suggesting we open a new issue to dive into these issues ?

@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented Mar 31, 2025

How about these two issues ? Did overlook anything ? or are there potential bugs in current implementation ? @Xuanwo

Users who want to make sure to read the exact version should specify version while calling reader_with. We can make this clear in the API docs.

Are you suggesting we open a new issue to dive into these issues ?

Let's reopen the old issue.

@meteorgan
Copy link
Copy Markdown
Contributor Author

Users who want to make sure to read the exact version should specify version while calling reader_with. We can make this clear in the API docs.

These issues don't tie into Read Returns Metadata. I'll go ahead and open a separate issue to cover them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

new feature: Returning metadata while reading

2 participants