Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: uses memoization to decrease memory consumption. #13

Merged
merged 1 commit into from
Nov 20, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions internal/memoize/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# Memoize

**IMPORTANT: This package is copied from <https://github.com/corazawaf/coraza/tree/main/internal/memoize>**

Memoize allows to cache certain expensive function calls and
cache the result. The main advantage in Coraza is to memoize
the regexes and aho-corasick dictionaries when the connects
spins up more than one WAF in the same process and hence same
regexes are being compiled over and over.

Currently it is opt-in under the `memoize_builders` build tag
as under a misuse (e.g. using after build time) it could lead
to a memory leak as currently the cache is global.

**Important:** Connectors with *live reload* functionality (e.g. Caddy)
could lead to memory leaks which might or might not be negligible in
most of the cases as usually config changes in a WAF are about a few
rules, this is old objects will be still alive in memory until the program
stops.
10 changes: 10 additions & 0 deletions internal/memoize/noop.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// Copyright 2023 Juan Pablo Tosso and the OWASP Coraza contributors
// SPDX-License-Identifier: Apache-2.0

//go:build !memoize_builders

package memoize

func Do(_ string, fn func() (interface{}, error)) (interface{}, error) {
return fn()
}
36 changes: 36 additions & 0 deletions internal/memoize/nosync.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
// Copyright 2023 Juan Pablo Tosso and the OWASP Coraza contributors
// SPDX-License-Identifier: Apache-2.0

//go:build tinygo && memoize_builders

package memoize

import "sync"

var doer = makeDoer(new(sync.Map))

// Do executes and returns the results of the given function, unless there was a cached
// value of the same key. Only one execution is in-flight for a given key at a time.
// The boolean return value indicates whether v was previously stored.
func Do(key string, fn func() (interface{}, error)) (interface{}, error) {
value, err, _ := doer(key, fn)
return value, err
}

// makeDoer returns a function that executes and returns the results of the given function
func makeDoer(cache *sync.Map) func(string, func() (interface{}, error)) (interface{}, error, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I know I suggested copying the same PR. But realized that since this is wasilibs we shouldn't use any of these sync mechanisms - there are no threads (maybe in the future but I'd rather deal with it when it happens). So it'd be good to simplify the implementation and tests to just use a normal map and no other libraries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought about the sync package but then realise people could
still want to use wasilibs with go, and hence the sync isn't? Still we can
remove it for now until someone requests for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's probably fine. I have bad memories of a sync pool not actually pooling for some time in TinyGo and have some caution towards that package but if sync.Map is mostly a normal map in practice it should be ok

return func(key string, fn func() (interface{}, error)) (interface{}, error, bool) {
// Check cache
value, found := cache.Load(key)
if found {
return value, nil, true
}

data, err := fn()
if err == nil {
cache.Store(key, data)
}

return data, err, false
}
}
167 changes: 167 additions & 0 deletions internal/memoize/nosync_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
// Copyright 2023 Juan Pablo Tosso and the OWASP Coraza contributors
// SPDX-License-Identifier: Apache-2.0

//go:build tinygo && memoize_builders

// https://github.com/kofalt/go-memoize/blob/master/memoize.go

package memoize

import (
"errors"
"sync"
"testing"
)

func TestDo(t *testing.T) {
expensiveCalls := 0

// Function tracks how many times its been called
expensive := func() (interface{}, error) {
expensiveCalls++
return expensiveCalls, nil
}

// First call SHOULD NOT be cached
result, err := Do("key1", expensive)
if err != nil {
t.Fatalf("unexpected error: %s", err.Error())
}

if want, have := 1, result.(int); want != have {
t.Fatalf("unexpected value, want %d, have %d", want, have)
}

// Second call on same key SHOULD be cached
result, err = Do("key1", expensive)
if err != nil {
t.Fatalf("unexpected error: %s", err.Error())
}

if want, have := 1, result.(int); want != have {
t.Fatalf("unexpected value, want %d, have %d", want, have)
}

// First call on a new key SHOULD NOT be cached
result, err = Do("key2", expensive)
if err != nil {
t.Fatalf("unexpected error: %s", err.Error())
}

if want, have := 2, result.(int); want != have {
t.Fatalf("unexpected value, want %d, have %d", want, have)
}
}

func TestSuccessCall(t *testing.T) {
do := makeDoer(new(sync.Map))

expensiveCalls := 0

// Function tracks how many times its been called
expensive := func() (interface{}, error) {
expensiveCalls++
return expensiveCalls, nil
}

// First call SHOULD NOT be cached
result, err, cached := do("key1", expensive)
if err != nil {
t.Fatalf("unexpected error: %s", err.Error())
}

if want, have := 1, result.(int); want != have {
t.Fatalf("unexpected value, want %d, have %d", want, have)
}

if want, have := false, cached; want != have {
t.Fatalf("unexpected caching, want %t, have %t", want, have)
}

// Second call on same key SHOULD be cached
result, err, cached = do("key1", expensive)
if err != nil {
t.Fatalf("unexpected error: %s", err.Error())
}

if want, have := 1, result.(int); want != have {
t.Fatalf("unexpected value, want %d, have %d", want, have)
}

if want, have := true, cached; want != have {
t.Fatalf("unexpected caching, want %t, have %t", want, have)
}

// First call on a new key SHOULD NOT be cached
result, err, cached = do("key2", expensive)
if err != nil {
t.Fatalf("unexpected error: %s", err.Error())
}

if want, have := 2, result.(int); want != have {
t.Fatalf("unexpected value, want %d, have %d", want, have)
}

if want, have := false, cached; want != have {
t.Fatalf("unexpected caching, want %t, have %t", want, have)
}
}

func TestFailedCall(t *testing.T) {
do := makeDoer(new(sync.Map))

calls := 0

// This function will fail IFF it has not been called before.
twoForTheMoney := func() (interface{}, error) {
calls++

if calls == 1 {
return calls, errors.New("Try again")
} else {
return calls, nil
}
}

// First call should fail, and not be cached
result, err, cached := do("key1", twoForTheMoney)
if err == nil {
t.Fatalf("expected error")
}

if want, have := 1, result.(int); want != have {
t.Fatalf("unexpected value, want %d, have %d", want, have)
}

if want, have := false, cached; want != have {
t.Fatalf("unexpected caching, want %t, have %t", want, have)
}

// Second call should succeed, and not be cached
result, err, cached = do("key1", twoForTheMoney)
if err != nil {
t.Fatalf("unexpected error: %s", err.Error())
}

if want, have := 2, result.(int); want != have {
t.Fatalf("unexpected value, want %d, have %d", want, have)
}

if want, have := false, cached; want != have {
t.Fatalf("unexpected caching, want %t, have %t", want, have)
}

// Third call should succeed, and be cached
result, err, cached = do("key1", twoForTheMoney)
if err != nil {
t.Fatalf("unexpected error: %s", err.Error())
}

if want, have := 2, result.(int); want != have {
t.Fatalf("unexpected value, want %d, have %d", want, have)
}

if want, have := true, cached; want != have {
t.Fatalf("unexpected caching, want %t, have %t", want, have)
}
}
47 changes: 47 additions & 0 deletions internal/memoize/sync.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
// Copyright 2023 Juan Pablo Tosso and the OWASP Coraza contributors
// SPDX-License-Identifier: Apache-2.0

//go:build !tinygo && memoize_builders

// https://github.com/kofalt/go-memoize/blob/master/memoize.go

package memoize

import (
"sync"

"golang.org/x/sync/singleflight"
)

var doer = makeDoer(new(sync.Map), new(singleflight.Group))

// Do executes and returns the results of the given function, unless there was a cached
// value of the same key. Only one execution is in-flight for a given key at a time.
// The boolean return value indicates whether v was previously stored.
func Do(key string, fn func() (interface{}, error)) (interface{}, error) {
value, err, _ := doer(key, fn)
return value, err
}

// makeDoer returns a function that executes and returns the results of the given function
func makeDoer(cache *sync.Map, group *singleflight.Group) func(string, func() (interface{}, error)) (interface{}, error, bool) {
return func(key string, fn func() (interface{}, error)) (interface{}, error, bool) {
// Check cache
value, found := cache.Load(key)
if found {
return value, nil, true
}

// Combine memoized function with a cache store
value, err, _ := group.Do(key, func() (interface{}, error) {
data, innerErr := fn()
if innerErr == nil {
cache.Store(key, data)
}

return data, innerErr
})

return value, err, false
}
}
Loading