Have a "common errors" page.
Signed-off-by: Flynn <[email protected]>
kflynn committed Mar 29, 2024
1 parent f033bd2 commit d3e640d
Showing 6 changed files with 119 additions and 0 deletions.
23 changes: 23 additions & 0 deletions linkerd.io/content/2.15/common-errors/_index.md
@@ -0,0 +1,23 @@
+++
title = "Common Errors"
weight = 3
[sitemap]
priority = 1.0
+++

Linkerd is generally robust, but things can always go wrong! You'll find
information here about the most common things that cause people trouble.

## When in Doubt, Start With `linkerd check`

Whenever you see anything that looks unusual about your mesh, **always** start
with `linkerd check`. It will check a long series of things that have caused
trouble for others and make sure that your configuration is sane, and it will
point you to help for any problems it finds. It's hard to overstate how useful
this command is.
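
If you want to run this check from a script (for example, as a gate in a deployment job), a minimal Python sketch might look like the following. The `run_check` helper is hypothetical and not part of Linkerd, and the sketch assumes that the CLI exits with a non-zero status when a check fails.

```python
import subprocess


def run_check(command=("linkerd", "check")):
    """Run a health-check command and report whether it exited cleanly.

    Assumption: the command signals failure with a non-zero exit status.
    Returns a (passed, combined_output) tuple.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

Calling `run_check()` with no arguments runs `linkerd check` and requires the `linkerd` CLI to be on your `PATH`; the returned output is what you would read to find the pointers to help that the command prints.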

## Common Errors

{{% sectiontoc "common-errors" %}}


18 changes: 18 additions & 0 deletions linkerd.io/content/2.15/common-errors/failfast.md
@@ -0,0 +1,18 @@
+++
title = "Failfast"
description = "Failfast means that no endpoints are available."
+++

If Linkerd reports that a given service is in the _failfast_ state, it
means that the proxy has determined that there are no available endpoints
for that service. In this situation there's no point in the proxy trying
to actually make a connection to the service - it already knows that it
can't talk to it - so it reports that the service is in failfast and
immediately returns an error from the proxy.

The error will be either a 503 or a 504; see below for more information,
but if you already know that the service is in failfast because you saw
it in the logs, that's the important part.

To get out of failfast, some endpoints for the service have to
become available.
11 changes: 11 additions & 0 deletions linkerd.io/content/2.15/common-errors/http-502.md
@@ -0,0 +1,11 @@
+++
title = "HTTP 502 Errors"
description = "HTTP 502 means connection errors between proxies."
+++

The Linkerd proxy will return a 502 error for connection errors between
proxies. Unfortunately it's fairly common to see an uptick in 502s when
first meshing a workload that hasn't previously been used with a mesh,
because the mesh surfaces errors that were previously invisible!

There's actually a whole page on [debugging 502s](../../tasks/debugging-502s/).
27 changes: 27 additions & 0 deletions linkerd.io/content/2.15/common-errors/http-503-504.md
@@ -0,0 +1,27 @@
+++
title = "HTTP 503 and 504 Errors"
description = "HTTP 503 and 504 mean overloaded workloads."
+++

503s and 504s show up when a Linkerd proxy is trying to send so many
requests to a workload that the workload gets overwhelmed.

When the workload next to a proxy makes a request, the proxy adds it
to an internal dispatch queue. When things are going smoothly, the
request is pulled from the queue and dispatched almost immediately.
If the queue gets too long, though (which can generally happen only
if the called service is slow to respond), the proxy will go into
_load-shedding_, where any new request gets an immediate 503. The
proxy can only get _out_ of load-shedding when the queue shrinks.

Failfast also plays a role here: if the proxy puts a service into
failfast while there are requests in the dispatch queue, all the
requests in the dispatch queue get an immediate 504 before the
proxy goes into load-shedding.

To get out of failfast, some endpoints for the service have to
become available.

To get out of load-shedding, the dispatch queue has to start
emptying, which implies that the service has to get more capacity
to process requests or that the incoming request rate has to drop.
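
The interaction between the dispatch queue, load-shedding, and failfast can be sketched as a toy model. This is an illustration of the behavior described above, not Linkerd's implementation: the `ToyProxy` class, its method names, and the `max_queue` limit are all hypothetical.

```python
from collections import deque


class ToyProxy:
    """Toy model of a proxy's dispatch queue (hypothetical, not Linkerd code)."""

    def __init__(self, max_queue=3):
        self.max_queue = max_queue  # hypothetical queue limit
        self.queue = deque()
        self.failfast = False

    @property
    def load_shedding(self):
        # Shed load while the queue is full or the service is in failfast.
        return self.failfast or len(self.queue) >= self.max_queue

    def submit(self, request):
        """Queue a request; return 503 if the proxy is shedding load."""
        if self.load_shedding:
            return 503
        self.queue.append(request)
        return None

    def dispatch_one(self):
        """Pull the oldest request off the queue, shrinking it by one."""
        return self.queue.popleft() if self.queue else None

    def enter_failfast(self):
        """No endpoints available: fail every queued request with a 504."""
        failed = [(request, 504) for request in self.queue]
        self.queue.clear()
        self.failfast = True
        return failed

    def endpoints_available(self):
        """Some endpoints came back, so the service leaves failfast."""
        self.failfast = False
```

In this model, a full queue makes `submit` return 503 until `dispatch_one` shrinks the queue, while `enter_failfast` hands a 504 to everything already queued and keeps rejecting new requests until `endpoints_available` is called.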
35 changes: 35 additions & 0 deletions linkerd.io/content/2.15/common-errors/protocol-detection.md
@@ -0,0 +1,35 @@
+++
title = "Protocol Detection Errors"
description = "Protocol detection errors indicate that Linkerd doesn't understand the protocol in use."
+++

Linkerd is capable of proxying all TCP traffic, including TLS connections,
WebSockets, and HTTP tunneling. In most cases where the client speaks first
when a new connection is made, Linkerd can detect the protocol in use,
allowing it to perform per-request routing and metrics.

If your proxy logs contain messages like `protocol detection timed out after
10s`, or you're experiencing 10-second delays when establishing connections,
you're probably running into a situation where Linkerd cannot detect the protocol.
This is most common for protocols where the server speaks first, and the
client is waiting for information from the server. It may also occur with
non-HTTP protocols for which Linkerd doesn't yet understand the wire format of
a request.

You'll need to understand exactly what the situation is to fix this:

- A server-speaks-first protocol will probably need to be configured as a
`skip` or `opaque` port, as described in the [protocol detection
documentation](../../features/protocol-detection/#configuring-protocol-detection).

- If you're seeing transient protocol detection timeouts, this is more likely
to indicate a misbehaving workload.

- If you know the protocol is client-speaks-first but you're getting
consistent protocol detection timeouts, you'll probably need to fall back on
a `skip` or `opaque` port.

Note that marking ports as `skip` or `opaque` has ramifications beyond
protocol detection timeouts; see the [protocol detection
documentation](../../features/protocol-detection/#configuring-protocol-detection)
for more information.
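
To see why server-speaks-first protocols run into the detection timeout, here is a rough Python sketch of the general idea: peek at the first bytes the client sends, and give up if nothing arrives in time. It is not Linkerd's detection logic. The `detect_protocol` function and its short list of HTTP methods are assumptions made for illustration only.

```python
import socket


def detect_protocol(conn, timeout=10.0):
    """Peek at the client's first bytes; give up after `timeout` seconds.

    Hypothetical sketch: it only recognizes a few HTTP/1.x request methods.
    """
    conn.settimeout(timeout)
    try:
        # MSG_PEEK leaves the bytes in the buffer for whoever reads next.
        first_bytes = conn.recv(16, socket.MSG_PEEK)
    except socket.timeout:
        # The client never spoke, e.g. it is waiting for the server.
        return "timeout"
    methods = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ")
    if first_bytes.startswith(methods):
        return "http"
    return "unknown"


# Client speaks first: the request bytes are there to inspect.
client, proxy_side = socket.socketpair()
client.sendall(b"GET / HTTP/1.1\r\n\r\n")
print(detect_protocol(proxy_side, timeout=0.2))

# Server speaks first: the client stays silent, so detection times out.
quiet_client, quiet_proxy_side = socket.socketpair()
print(detect_protocol(quiet_proxy_side, timeout=0.2))
```

The second case is the one that marking a port as `skip` or `opaque` avoids: with no detection step, there is nothing to wait for.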
5 changes: 5 additions & 0 deletions linkerd.io/layouts/partials/sidebar-2.html
@@ -96,6 +96,11 @@
Frequently Asked Questions
</a>
</li>
<li class="menu-list-item {{ if eq $currentPage.RelPermalink "/common-errors/" }}is-active{{ end }}">
<a href="/common-errors/">
Common Errors
</a>
</li>
<li class="menu-list-item {{ if eq $currentPage.RelPermalink "/releases/" }}is-active{{ end }}">
<a href="/releases/">
Releases and Versions