Some notes for the big update #16

Open
martinus opened this issue Jun 17, 2022 · 4 comments

Comments

@martinus
Owner

martinus commented Jun 17, 2022

  • don't split up benchmark results by hash
  • Maybe split up into open address hashing and chained (or node based) hashing
    • Split up into 2 categories: open address hashing (boost maps, std::unordered_map), and all of them.
  • Having a filter for the results would be nice
  • Disable zoom? at least make a wider view
  • Add one summary page with the geomean of all find & insert benchmarks (except the ctor benchmarks)
  • Add a conclusion page:
    • Use a reasonable hash that spreads entropy from the upper bits to the lower bits. The identity hash of std::hash or boost::hash was and is a bad idea. The hash doesn't need to withstand randomness tests; mumx seems to be good enough.
    • Use a pool allocator (boost or PoolAllocator). It's faster and uses much less RAM.
  • Create a sortable table: X axis benchmark, y axis map & hash. With one entry that's the GEOMEAN.
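
To illustrate the note about hashes that spread entropy from the upper bits to the lower bits, here is a minimal sketch comparing an identity hash with a multiply-and-fold mixer in the spirit of mumx. The names `identity_hash`, `mum_mix`, and `distinct_buckets` are hypothetical, the multiplier is an arbitrary odd constant picked for illustration (not necessarily the one the benchmark uses), and `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Identity hash: returns the key unchanged, as common standard library
// implementations of std::hash<uint64_t> do.
inline uint64_t identity_hash(uint64_t x) { return x; }

// Multiply-and-fold mixer: 64x64 -> 128 bit multiply, then xor the two halves.
// The constant is an assumption chosen for illustration only.
inline uint64_t mum_mix(uint64_t x) {
    unsigned __int128 p = static_cast<unsigned __int128>(x) * UINT64_C(0x9E3779B97F4A7C15);
    return static_cast<uint64_t>(p) ^ static_cast<uint64_t>(p >> 64);
}

// Counts how many distinct buckets are hit when only the lowest `bits` bits
// of the hash select the bucket, as in a power-of-two sized table.
template <class Hash>
std::size_t distinct_buckets(Hash hash, unsigned bits, std::size_t n) {
    const uint64_t mask = (uint64_t{1} << bits) - 1;
    std::unordered_set<uint64_t> seen;
    for (uint64_t i = 0; i < n; ++i) {
        const uint64_t key = i << 32;  // keys differ only in their upper bits
        seen.insert(hash(key) & mask);
    }
    return seen.size();
}
```

With keys that differ only in their upper 32 bits, `distinct_buckets(identity_hash, 10, 1000)` puts every key into the same bucket, while the mixer spreads them over more than one bucket because the high half of the product is folded back into the low bits.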
@ktprime
Contributor

ktprime commented Jun 18, 2022

Most test cases use only a single hash map.
Please add a new benchmark like this one (many small hash maps):

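template<class hash_type>
void multi_small_ife(const std::string& hash_name, const std::vector<keyType>& vList)
{
#if KEY_INT
    size_t sum = 0;
    // Roughly one map per 1003 keys, with a minimum of 10 maps.
    const auto hash_size = vList.size() / 1003 + 10;
    // Start timestamp; the step that reports ts1 and sum is not part of this excerpt.
    const auto ts1 = getus();

    auto mh = new hash_type[hash_size];

    // Insert: each key goes into the small map selected by key % hash_size.
    for (const auto& v : vList) {
        auto hash_id = ((uint32_t)v) % hash_size;
        sum += mh[hash_id].emplace(v, 0).second;
    }

    // Find: look every key up again in its map.
    for (const auto& v : vList) {
        auto hash_id = ((uint32_t)v) % hash_size;
        sum += mh[hash_id].count(v);
    }

    // Erase: even keys are erased; for odd keys the neighbouring key v + v % 2 is tried instead.
    for (const auto& v : vList) {
        auto hash_id = ((uint32_t)v) % hash_size;
        sum += mh[hash_id].erase(v + v % 2);
    }

    delete[] mh;
#endif
}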

@martinus
Owner Author

@ktprime I don't see what that benchmark adds that is not already covered by the other benchmarks?

@ktprime
Contributor

ktprime commented Jun 20, 2022

The code is copied from my benchmark:
https://github.com/ktprime/emhash/blob/master/bench/ebench.cpp

@ktprime
Contributor

ktprime commented Jul 5, 2022

I find that in the benchmark code, the key and value (integer case) always have the same type (int -> int, size_t -> size_t, uint64_t -> uint64_t).
Can you add or modify some cases to use <key, value> pairs like <uint64_t, int32_t>?
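
A minimal sketch of what such a mixed-type case could look like, assuming a benchmark body that is templated on the map type so that key and value types can differ. The function name `insert_find_erase` is hypothetical and not taken from either benchmark suite; `std::unordered_map` stands in for whichever map is under test.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Runs insert, find and erase over all keys for an arbitrary map type.
// Key and value types come from the map itself, so <uint64_t, int32_t>
// and <uint64_t, uint64_t> go through the same code path.
template <class Map>
std::size_t insert_find_erase(const std::vector<typename Map::key_type>& keys) {
    Map map;
    std::size_t sum = 0;
    for (const auto& k : keys) {
        sum += map.emplace(k, typename Map::mapped_type{}).second;  // 1 if newly inserted
    }
    for (const auto& k : keys) {
        sum += map.count(k);  // 1 if present
    }
    for (const auto& k : keys) {
        sum += map.erase(k);  // number of elements removed
    }
    return sum;
}
```

It could be instantiated as `insert_find_erase<std::unordered_map<uint64_t, int32_t>>(keys)` next to the same call with `<uint64_t, uint64_t>`. Whether the smaller value type changes memory use depends on how each map lays out its entries, since a `<uint64_t, int32_t>` pair may be padded to the same size as `<uint64_t, uint64_t>` on common 64-bit platforms.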
