Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashed Containers (or a Better Explanation why not) #9

Open
polijan opened this issue Jan 3, 2021 · 6 comments
Open

Hashed Containers (or a Better Explanation why not) #9

polijan opened this issue Jan 3, 2021 · 6 comments

Comments

@polijan
Copy link

polijan commented Jan 3, 2021

The README states that CTL will not have unordered_map and unordered_set. The rationale for this is quite succinct: "ordered containers are preferred, even at the cost of performance".

If CTL won't offer hashed containers, could it at least extend the explanation why red/black containers should be always preferred. It's not obvious why it would always be true. Especially, as:

  1. hash tables are a popular datastructure in C
  2. it's not just about popularity, hash containers retrieval is algorithmically faster (O(1) average) which really can matter.
@glouw
Copy link
Owner

glouw commented Jan 3, 2021

Yes, I got quite a bit of interest on hackernews for unordered sets, so I started my implementation of std::unordered_set on the staging branch (see ust.h): https://github.com/glouw/ctl/blob/staging/ctl/ust.h

@glouw
Copy link
Owner

glouw commented Jan 3, 2021

It should be ready in a couple weeks

@rurban
Copy link

rurban commented Jan 4, 2021

But very buggy still. And chained, not open addressing. You shalt not make the same mistake as the STL

Rehash, each are broken, insert needs to check the load_factor

@glouw
Copy link
Owner

glouw commented Jan 4, 2021

@rurban, considering it's the staging branch, I wouldn't consider it production worthy. As for linked list chained hash tables, the main benefit is that pointers to unordered set elements remain valid for the lifetime of the unordered set, which, in practically, is more important than speed. I wouldn't consider this a mistake in the STL.

@glouw
Copy link
Owner

glouw commented Jan 13, 2021

I've back ported std::unordered_set, named ust.h. It's 2.5x faster than set.h. Here are it's performance characteristics compare to set.h:

image

image

@glouw
Copy link
Owner

glouw commented Jan 13, 2021

Example hashing of strings (with a relatively uninteresting hash function):

#include <stdio.h>
#include <str.h>

#define T str
#include <ust.h>

size_t
str_hash(str* s)
{
    size_t hash = 5381;
    int c;
    char* temp = s->value;
    while(c = *temp++)
        hash = ((hash << 5) + hash) + c;
    return hash;
}

int
str_equal(str* a, str* b)
{
    return str_key_compare(a, b) == 0;
}

int
main(void)
{
    ust_str strs = ust_str_init(str_hash, str_equal);
    ust_str_insert(&strs, str_init("this"));
    ust_str_insert(&strs, str_init("is"));
    ust_str_insert(&strs, str_init("a"));
    ust_str_insert(&strs, str_init("test"));
    ust_str_insert(&strs, str_init("for"));
    ust_str_insert(&strs, str_init("unordered"));
    ust_str_insert(&strs, str_init("set"));
    ust_str_insert(&strs, str_init("for"));
    ust_str_insert(&strs, str_init("the"));
    ust_str_insert(&strs, str_init("C"));
    ust_str_insert(&strs, str_init("Standard"));
    ust_str_insert(&strs, str_init("Template"));
    ust_str_insert(&strs, str_init("Library"));
    ust_str_insert(&strs, str_init("(CTL)"));
    ust_str_insert(&strs, str_init("..."));
    foreach(ust_str, &strs, it) 
        puts(str_c_str(it.ref));
    for(size_t i = 0; i < strs.bucket_count; i++)
        printf("%2d : %2lu\n", i, ust_str_bucket_size(&strs, i));
    str what = str_init("(CTL)");
    str* found = &ust_str_find(&strs, what)->key;
    puts(str_c_str(found));
    ust_str_erase(&strs, what); // For fun.
    str_free(&what);
    ust_str_free(&strs);
}
gcc test.c -I ctl
this
Template
the
(CTL)
is
unordered
Standard
Library
C
a
...
set
for
test
 0 :  0
 1 :  0
 2 :  1
 3 :  0
 4 :  2
 5 :  1
 6 :  0
 7 :  0
 8 :  1
 9 :  0
10 :  1
11 :  0
12 :  0
13 :  1
14 :  1
15 :  1
16 :  1
17 :  0
18 :  1
19 :  0
20 :  1
21 :  0
22 :  0
23 :  0
24 :  2
25 :  0
26 :  0
27 :  0
28 :  0
(CTL)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants