Add v2 using generics
duncanharris committed May 11, 2022
1 parent 79084ac commit c817b64
Showing 11 changed files with 636 additions and 15 deletions.
74 changes: 68 additions & 6 deletions .github/workflows/ci.yml
@@ -6,8 +6,8 @@ on:
pull_request:

jobs:
build:
name: CI
build_v2:
name: Build for v2
runs-on: ubuntu-latest

steps:
@@ -22,14 +22,13 @@ jobs:
echo github.event.changes.title.from=$CI_PR_PREV_TITLE
- name: Set up Go
uses: actions/setup-go@v2
uses: actions/setup-go@v3
with:
go-version: '~1.17.9'
go-version: '~1.18'
id: go

- name: Install utilities
run: |
go install golang.org/x/lint/golint@latest
go install golang.org/x/tools/cmd/goimports@latest
go install honnef.co/go/tools/cmd/staticcheck@latest
# display Go environment for reference
@@ -47,21 +46,84 @@ jobs:
- name: Get dependencies
run: |
cd v2
go mod tidy
/usr/bin/git diff --exit-code
- name: Build
run: |
cd v2
go build -v ./...
- name: Check
run: |
cd v2
go vet ./...
golint ./...
staticcheck ./...
goimports -w .
/usr/bin/git diff --exit-code
- name: Test
run: |
cd v2
go test -v ./...
build_v1:
name: Build for v1
runs-on: ubuntu-latest

steps:
- name: Log
env:
CI_EVENT_ACTION: ${{ github.event.action }}
CI_PR_TITLE: ${{ github.event.pull_request.title }}
CI_PR_PREV_TITLE: ${{ github.event.changes.title.from }}
run: |
echo github.event.action=$CI_EVENT_ACTION
echo github.event.pull_request.title=$CI_PR_TITLE
echo github.event.changes.title.from=$CI_PR_PREV_TITLE
- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: '~1.17'
id: go

- name: Install utilities
run: |
go install golang.org/x/lint/golint@latest
go install golang.org/x/tools/cmd/goimports@latest
go install honnef.co/go/tools/cmd/staticcheck@latest
# display Go environment for reference
go env
- name: Check out code
uses: actions/checkout@v2

- uses: actions/cache@v2
with:
path: ~/go/pkg/mod
key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
restore-keys: |
${{ runner.os }}-go-
- name: Get dependencies
run: |
go mod tidy
/usr/bin/git diff --exit-code
- name: Build
run: |
go build -v ./...
- name: Check
run: |
go vet ./*.go
golint ./*.go
staticcheck ./*.go
goimports -w ./*.go
/usr/bin/git diff --exit-code
- name: Test
run: |
go test -v ./...
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright 2021 The Sensible Code Company Ltd
Copyright 2022 The Sensible Code Company Ltd

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction,
13 changes: 7 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,10 @@
# faststringmap

## v2 : Latest for Go 1.18 onwards
**v2** is the latest version. It uses generics and requires Go 1.18 or later. See [v2/README.md](v2/README.md) for details.

## v1 : for Go 1.17 and earlier

`faststringmap` is a fast read-only string keyed map for Go (golang).
For our use case it is approximately 5 times faster than using Go's
built-in map type with a string key. It also has the following advantages:
@@ -50,9 +55,5 @@ BenchmarkGoStringToUint32-8 49279 24483 ns/op

## Improvements

You can improve the performance further by using a slice for the ``next`` fields.
This avoids a bounds check when looking up the entry for a byte. However, it
comes at the cost of easy serialization and introduces a lot of pointers which
will have impact on GC. It is not possible to directly construct the slice version
in the same way so that the whole store is one block of memory. Either create as in
this code and then derive the slice version or create distinct slice objects at each level.
[v2](v2/README.md) includes a variant that improves performance by using a slice for
the `next` fields. It is also built with generics, so you can easily use any value type.
2 changes: 1 addition & 1 deletion uint32_store.go
@@ -1,4 +1,4 @@
// Copyright 2021 The Sensible Code Company Ltd
// Copyright 2022 The Sensible Code Company Ltd
// Author: Duncan Harris

package faststringmap
3 changes: 3 additions & 0 deletions uint32_store_example_test.go
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
// Copyright 2022 The Sensible Code Company Ltd
// Author: Duncan Harris

package faststringmap_test

import (
2 changes: 1 addition & 1 deletion uint32_store_test.go
@@ -1,4 +1,4 @@
// Copyright 2021 The Sensible Code Company Ltd
// Copyright 2022 The Sensible Code Company Ltd
// Author: Duncan Harris

package faststringmap_test
65 changes: 65 additions & 0 deletions v2/README.md
@@ -0,0 +1,65 @@
# faststringmap

`faststringmap` is a fast read-only string keyed map for Go (golang).
For our use case it is approximately 5 times faster than using Go's
built-in map type with a string key. It also has the following advantages:

* look up strings and byte slices without use of the `unsafe` package
* minimal impact on GC due to lack of pointers in the data structure
* data structure can be trivially serialized to disk or network

faststringmap v2 is built using Go generics for Go 1.18 onwards.

`faststringmap` is a variant of a data structure called a
[Trie](https://en.wikipedia.org/wiki/Trie).
At each level we use a slice to hold the next possible byte values.
This slice is of length one plus the difference between the lowest and highest
possible next bytes of strings in the map. Not all the entries in the slice are
valid next bytes. `faststringmap` is thus more space efficient for keys using a
small set of nearby runes, for example those using a lot of digits.
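
To make the slice-per-level idea concrete, here is a minimal sketch of a trie whose nodes hold a slice spanning the lowest to the highest next byte. It is not the library's implementation: the `node` type, its fields, and the `insert` and `lookup` functions are hypothetical names chosen for illustration, and the sketch uses pointers for brevity.

```go
package main

import "fmt"

// node is one trie level (hypothetical type, for illustration only).
// next[i] is the child for byte lo+i, so len(next) is one plus the
// difference between the lowest and highest next byte seen so far.
type node struct {
	value    uint32
	hasValue bool
	lo       byte
	next     []*node // nil entries are bytes that no key passes through
}

// insert adds key with value v, widening each level's slice as needed.
func insert(n *node, key string, v uint32) {
	for i := 0; i < len(key); i++ {
		b := key[i]
		switch {
		case len(n.next) == 0: // first child at this level
			n.lo, n.next = b, make([]*node, 1)
		case b < n.lo: // extend the span downwards
			grown := make([]*node, len(n.next)+int(n.lo-b))
			copy(grown[n.lo-b:], n.next)
			n.lo, n.next = b, grown
		case int(b-n.lo) >= len(n.next): // extend the span upwards
			grown := make([]*node, int(b-n.lo)+1)
			copy(grown, n.next)
			n.next = grown
		}
		child := n.next[b-n.lo]
		if child == nil {
			child = &node{}
			n.next[b-n.lo] = child
		}
		n = child
	}
	n.value, n.hasValue = v, true
}

// lookup accepts a string or a byte slice, so no conversion is needed.
func lookup[K string | []byte](n *node, key K) (uint32, bool) {
	for i := 0; i < len(key); i++ {
		off := int(key[i]) - int(n.lo)
		if off < 0 || off >= len(n.next) || n.next[off] == nil {
			return 0, false
		}
		n = n.next[off]
	}
	return n.value, n.hasValue
}

func main() {
	root := &node{}
	insert(root, "5", 50)
	insert(root, "1", 10)
	insert(root, "12", 120)
	// The root slice covers bytes '1' through '5'; '2', '3' and '4' are unused gaps.
	fmt.Println("root span:", len(root.next))
	v, ok := lookup(root, []byte("12"))
	fmt.Println(v, ok)
}
```

The unused gap entries are why keys drawn from a small range of nearby bytes waste the least space.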

There are two variants provided:

* `Map` is a version using a single slice and indexes which can be directly
serialized (e.g. to a file). It contains no embedded pointers so has minimal
impact on GC.

* `MapFaster` has improved performance by using a slice for the `next` fields.
This avoids a bounds check when looking up the entry for a byte. However, it
comes at the cost of easy serialization and introduces a lot of pointers, which
will have an impact on GC. The slice version cannot be constructed directly in the
same way with the whole store in one block of memory, so this code provides
a function to create it from `Map`. An alternative construction might create distinct
slice objects at each level.
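
The single-slice layout described for `Map` can be sketched as follows. This is an illustrative assumption about how such a pointer-free store might look, not the package's actual types: the `entry` struct, its field names, and `lookup` are hypothetical, and the store is populated by hand for the keys `"1"`, `"3"` and `"12"`.

```go
package main

import "fmt"

// entry is one element of the flat store (hypothetical layout).
// Children are referenced by index into the same slice, never by pointer,
// so the whole store is a single block that the GC need not scan for
// pointers and that can be written out as is.
type entry struct {
	nextLo  uint32 // index in the store of the first child entry
	nextLen byte   // number of child entries (0 means no children)
	nextOff byte   // byte value that maps to child index 0
	valid   bool   // true if a key ends at this entry
	value   uint32
}

// store holds "1" -> 10, "3" -> 30 and "12" -> 120.
var store = []entry{
	{nextLo: 1, nextLen: 3, nextOff: '1'},                         // 0: root, children cover '1'..'3'
	{nextLo: 4, nextLen: 1, nextOff: '2', valid: true, value: 10}, // 1: "1", with one child for '2'
	{},                        // 2: gap for '2', not a valid key
	{valid: true, value: 30},  // 3: "3"
	{valid: true, value: 120}, // 4: "12"
}

// lookup walks the store by index, accepting a string or a byte slice.
func lookup[K string | []byte](store []entry, key K) (uint32, bool) {
	e := store[0]
	for i := 0; i < len(key); i++ {
		off := int(key[i]) - int(e.nextOff)
		if off < 0 || off >= int(e.nextLen) {
			return 0, false
		}
		e = store[int(e.nextLo)+off]
	}
	return e.value, e.valid
}

func main() {
	for _, k := range []string{"1", "2", "3", "12", "13"} {
		v, ok := lookup(store, k)
		fmt.Printf("%q -> %d %v\n", k, v, ok)
	}
}
```

A pointer-based variant would replace the `nextLo` and `nextLen` pair with a slice field at each level, which is the trade-off described above.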

## Example

Example usage can be found in the tests and also
[`fast_string_map_example_test.go`](fast_string_map_example_test.go)
which shows a populated data structure to aid understanding.

## Motivation

I created `faststringmap` in order to improve the speed of parsing CSV
where the fields were category codes from survey data. The majority of these
were numeric (`"1"`, `"2"`, `"3"`...) plus a distinct code for "not applicable".
I was struck that in the simplest possible cases (e.g. `"1"` ... `"5"`) the map
should be a single slice lookup.

Our fast CSV parser provides fields as byte slices into the read buffer to
avoid creating string objects. So I also wanted to facilitate key lookup from a
`[]byte` rather than a string. This is not possible using a built-in Go map without
use of the `unsafe` package.

## Benchmarks

Below are example benchmarks from my laptop. Each operation looks up every element
in a map of size 1000, so approximate times are 25ns per lookup for the Go native map
and 5ns per lookup for the ``faststringmap``.
```
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
BenchmarkUint32Store
BenchmarkUint32Store-8 218463 4959 ns/op
BenchmarkGoStringToUint32
BenchmarkGoStringToUint32-8 49279 24483 ns/op
```
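
For reference, a benchmark of this shape for the built-in map alone might look like the sketch below. It is an assumption about the setup rather than the repository's benchmark code: `buildMap`, `benchGoMap`, and the decimal-string keys are hypothetical, and results will vary by machine. Dividing the reported ns/op by the number of keys gives the approximate per-lookup time.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

const numKeys = 1000

// buildMap returns a map of numKeys decimal-string keys plus the key list.
func buildMap() (map[string]uint32, []string) {
	m := make(map[string]uint32, numKeys)
	keys := make([]string, 0, numKeys)
	for i := 0; i < numKeys; i++ {
		k := strconv.Itoa(i)
		m[k] = uint32(i)
		keys = append(keys, k)
	}
	return m, keys
}

// sink keeps the compiler from discarding the lookups.
var sink uint32

// benchGoMap looks up every key once per benchmark operation.
func benchGoMap(b *testing.B) {
	m, keys := buildMap()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for _, k := range keys {
			sink += m[k]
		}
	}
}

func main() {
	r := testing.Benchmark(benchGoMap)
	perLookup := float64(r.NsPerOp()) / numKeys
	fmt.Printf("%d ns/op, about %.1f ns per lookup\n", r.NsPerOp(), perLookup)
}
```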