Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NOSCRIPT error is send to Limiter but Eval is automatically run anyway #3203

Open
muhlemmer opened this issue Dec 6, 2024 · 0 comments
Open

Comments

@muhlemmer
Copy link

Expected Behavior

When Script.Run is called and doesn't return an error, don't pass an error to Limiter.ReportResult.

Current Behavior

When we use Script.Run() it does what it should following the documentation:

Run optimistically uses EVALSHA to run the script. If script does not exist it is retried using EVAL.

That means is EVALSHA gets the NOSCRIPT error from redis, it retries. If the retry works, the caller is not receiving an error. However, Limiter.ReportResult does seem to get the initial error.

Possible Solution

When an error is non-persistent and automatically retried, do not send it to the limiter. Instead, send only the final result.

Steps to Reproduce

We implement a Limiter using sony's Circuit Breaker pattern here:

https://github.com/zitadel/zitadel/blob/77cd430b3a67cd95ad9deec1e5b44a17638def06/internal/cache/connector/redis/circuit_breaker.go#L1-L91

Context (Environment)

Simple Redis, versions 7.2, 7.4 and 8.0 with go-redis/v9 and Go 1.22.

Detailed Description

As a cache we don't want errors to interrupt business flow. Instead, we want only to log errors. We prefer to log errors where happen. For example, right after we call Script.Run(). We also have the goal to move to Go's standard slog package using context.Context aware logging. It is therefore not feasible to log errors in the Limiter.ReportResult function.

NOSCRIPT and perhaps other errors are handled internally by go-redis. When the error is not returned to the caller it is impossible to correlate it to some request if it does get send to Limiter.ReportResult. It is also a pain to debug, the application never received an error but the limiter / Circuit breaker did somehow trip.

The first errors we would get are literally:

time="2024-12-06T18:29:09+02:00" level=warning msg="circuit breaker state change" caller="/home/tim/Repositories/zitadel/zitadel/internal/cache/connector/redis/circuit_breaker.go:64" from=closed name="redis cache" to=open
time=2024-12-06T18:29:09.524+02:00 level=ERROR source=/home/tim/Repositories/zitadel/zitadel/internal/cache/connector/redis/redis.go:110 msg="redis cache get" index=orgIndexByID key=296967579065463350 err="circuit breaker is open"

Possible Implementation workaround

We will include a check for NOSCRIPT prefix. However, it does make me worry there might be other, similar and undocumented errors which trigger an internal retry without the caller being aware.

func (l *limiter) ReportResult(err error) {
	done := <-l.inflight
	done(err == nil ||
		errors.Is(err, redis.Nil) ||
		errors.Is(err, context.Canceled) ||
		redis.HasErrorPrefix(err, "NOSCRIPT"))
}
muhlemmer added a commit to zitadel/zitadel that referenced this issue Dec 9, 2024
# Which Problems Are Solved

When Zitadel starts the first time with a configured Redis cache, the
circuit break would open on the first requests, with no explanatory
error and only log-lines explaining the state of the Circuit breaker.

Using a debugger, `NOSCRIPT No matching script. Please use EVAL.` was
found the be passed to `Limiter.ReportResult`. This error is actually
retried by go-redis after a
[`Script.Run`](https://pkg.go.dev/github.com/redis/go-redis/[email protected]#Script.Run):

> Run optimistically uses EVALSHA to run the script. If script does not
exist it is retried using EVAL.

# How the Problems Are Solved

Add the `NOSCRIPT` error prefix to the whitelist.

# Additional Changes

- none

# Additional Context

- Introduced in: #8890
- Workaround for: redis/go-redis#3203
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant