Skip to content

perf: implement size_hint for iterators#75

Open
hsqStephenZhang wants to merge 1 commit intomozillazg:masterfrom
hsqStephenZhang:opt/size_hint
Open

perf: implement size_hint for iterators#75
hsqStephenZhang wants to merge 1 commit intomozillazg:masterfrom
hsqStephenZhang:opt/size_hint

Conversation

@hsqStephenZhang
Copy link
Copy Markdown

@hsqStephenZhang hsqStephenZhang commented Apr 6, 2026

Problem and solution

The iterator's could provide size_hint but did not. This PR bridges that gap to improve performance.

The size_hint is utilized by the standard library to pre-allocate memory for containers like Vec. By providing an accurate hint, we can significantly reduce the number of memory allocations when collecting elements.

benchmark

I tested this PR with the following code, which uses a allocator that counts the number of allocations.
the with_hint is 2 while the baseline is 4, which means that this optimization indeed works.

#[cfg(test)]
mod alloc_tests {
    use std::alloc::{GlobalAlloc, Layout, System};
    use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

    static ALLOC_COUNT: AtomicUsize = AtomicUsize::new(0);

    struct CountingAlloc;
    unsafe impl GlobalAlloc for CountingAlloc {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
            ALLOC_COUNT.fetch_add(1, Relaxed);
            System.alloc(layout)
        }
        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
            System.dealloc(ptr, layout)
        }
    }

    #[global_allocator]
    static A: CountingAlloc = CountingAlloc;

    #[test]
    fn size_hint_reduces_allocs() {
        use crate::ToPinyin;
        let s = "这是一段比较长的中文字符串用来测试"; // 17 chars

        // baseline: manually simulate no size_hint (lower=0)
        ALLOC_COUNT.store(0, Relaxed);
        let _: Vec<_> = std::iter::from_fn({
            let mut it = s.to_pinyin();
            // hides size_hint, same affect as the code before this PR
            move || it.next() 
        })
        .collect();
        let baseline = ALLOC_COUNT.load(Relaxed);

        ALLOC_COUNT.store(0, Relaxed);
        let _: Vec<_> = s.to_pinyin().collect();
        let with_hint = ALLOC_COUNT.load(Relaxed);

        println!("{with_hint}, {baseline}");
        assert!(with_hint < baseline,);
    }
}

Summary by CodeRabbit

  • Refactor
    • Optimized iterator implementations with improved size hint calculations for better performance characteristics and memory efficiency.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

Added size_hint implementations to iterator types in the pinyin crate across two modules. These implementations provide iteration count metadata to enable consumer code optimizations, with one variant forwarding to an underlying iterator and another computing remaining elements directly.

Changes

Cohort / File(s) Summary
Iterator Size Hints
src/pinyin.rs
Added #[inline] size_hint to PinyinStrIter<'a> forwarding to underlying Chars<'a> iterator's size hint.
Iterator Size Hints
src/pinyin_multi.rs
Added size_hint implementations to PinyinMultiIter (computing remaining as self.inner.count() - self.index) and PinyinMultiStrIter<'a> (forwarding from Chars iterator).

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

A rabbit hops through pinyin strings so fine,
Now hints reveal their size in ordered line,
No logic bent, just metadata revealed,
Optimization opportunities sealed! 🐰✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'perf: implement size_hint for iterators' directly and concisely describes the main change in the PR. The changeset adds size_hint implementations to iterator types (PinyinStrIter, PinyinMultiIter, PinyinMultiStrIter) with the explicit goal of improving performance through better memory pre-allocation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/pinyin_multi.rs (1)

63-67: Consider adding #[inline] for consistency.

The calculation is correct. However, the other size_hint implementations in this PR (PinyinStrIter, PinyinMultiStrIter) include #[inline]. Adding it here would maintain consistency across the codebase.

Suggested diff
+    #[inline]
     fn size_hint(&self) -> (usize, Option<usize>) {
         let remaining = self.inner.count() - self.index;
         (remaining, Some(remaining))
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pinyin_multi.rs` around lines 63 - 67, Add the #[inline] attribute to the
iterator's size_hint method to match the other implementations (e.g.,
PinyinStrIter and PinyinMultiStrIter); locate the fn size_hint(&self) -> (usize,
Option<usize>) in this file (the method that computes remaining =
self.inner.count() - self.index) and place #[inline] immediately above it to
keep inlining behavior and consistency across the iterators.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/pinyin_multi.rs`:
- Around line 63-67: Add the #[inline] attribute to the iterator's size_hint
method to match the other implementations (e.g., PinyinStrIter and
PinyinMultiStrIter); locate the fn size_hint(&self) -> (usize, Option<usize>) in
this file (the method that computes remaining = self.inner.count() - self.index)
and place #[inline] immediately above it to keep inlining behavior and
consistency across the iterators.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 08cb5c85-cdba-46b8-ba23-8902661e255d

📥 Commits

Reviewing files that changed from the base of the PR and between 76fc7d3 and b11ac18.

📒 Files selected for processing (2)
  • src/pinyin.rs
  • src/pinyin_multi.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant