Skip to content

[Rust] FlatBufferBuilder::create_shared_string is O(N^2) #8687

@PollRobots

Description

@PollRobots

The string_pool is maintained as a sorted list of offsets, which when create_shared_string is called are searched using slice::binary_search_by

If a match is not found, then the string is stored using create_byte_string and the index is inserted into string_pool using Vec::insert

So the happy path is an O(LogN) binary search

The unhappy path is an O(N) vector insertion

Every happy path requires a preceding unhappy path, so if you have N unique strings and M duplicates, then the cost of all calls to create_shared_string becomes O(N^2) + O(MLogN) or simply O(N^2)

I have a stress test that is writing a flatbuffer with ~500,000 unique strings, 80% of our serialization time is in create_shared_string

I can reduce this to ~30% of our serialization time (total serialization time goes from to 20.4s to 4.1s) by

  1. changing the type of string_pool to HashMap<String, WIPOffset<&'fbb str>>, and
  2. changing the implementation of create_shared_string to
    #[inline]
    pub fn create_shared_string<'a: 'b, 'b>(&'a mut self, s: &'b str) -> WIPOffset<&'fbb str> {
        self.assert_not_nested(
            "create_shared_string can not be called when a table or vector is under construction",
        );

        match self.strings_pool.get(s) {
            Some(address) => address,
            None => {
                let address = WIPOffset::new(self.create_byte_string(s.as_bytes()).value());
                self.strings_pool.insert(s.to_owned(), address);
                address
            }
        }
    }

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions