Merge branch 'main' into travis/releases
Showing 5 changed files with 112 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
+++ | ||
title = "Common Errors" | ||
weight = 10 | ||
[sitemap] | ||
priority = 1.0 | ||
+++ | ||
|
||
Linkerd is generally robust, but things can always go wrong! You'll find | ||
information here about the most common things that cause people trouble. | ||
|
||
## When in Doubt, Start With `linkerd check` | ||
|
||
Whenever you see anything that looks unusual about your mesh, **always** start | ||
with `linkerd check`. It will check a long series of things that have caused | ||
trouble for others and make sure that your configuration is sane, and it will | ||
point you to help for any problems it finds. It's hard to overstate how useful | ||
this command is. | ||
|
||
## Common Errors | ||
|
||
{{% sectiontoc "common-errors" %}} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
+++ | ||
title = "Failfast" | ||
description = "Failfast means that no endpoints are available." | ||
+++ | ||
|
||
If Linkerd reports that a given service is in the _failfast_ state, it | ||
means that the proxy has determined that there are no available endpoints | ||
for that service. In this situation there's no point in the proxy trying | ||
to actually make a connection to the service - it already knows that it | ||
can't talk to it - so it reports that the service is in failfast and | ||
immediately returns an error from the proxy. | ||
|
||
The error will be either a 503 or a 504; see below for more information, | ||
but if you already know that the service is in failfast because you saw | ||
it in the logs, that's the important part. | ||
|
||
To get out of failfast, some endpoints for the service have to | ||
become available. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
+++ | ||
title = "HTTP 502 Errors" | ||
description = "HTTP 502 means connection errors between proxies." | ||
+++ | ||
|
||
The Linkerd proxy will return a 502 error for connection errors between | ||
proxies. Unfortunately it's fairly common to see an uptick in 502s when | ||
first meshing a workload that hasn't previously been used with a mesh, | ||
because the mesh surfaces errors that were previously invisible! | ||
|
||
There's actually a whole page on [debugging 502s](../../tasks/debugging-502s/). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
+++ | ||
title = "HTTP 503 and 504 Errors" | ||
description = "HTTP 503 and 504 mean overloaded workloads." | ||
+++ | ||
|
||
503s and 504s show up when a Linkerd proxy is trying to make so many | ||
requests to a workload that the workload gets overwhelmed. | ||
|
||
When the workload next to a proxy makes a request, the proxy adds it | ||
to an internal dispatch queue. When things are going smoothly, the | ||
request is pulled from the queue and dispatched almost immediately. | ||
If the queue gets too long, though (which can generally happen only | ||
if the called service is slow to respond), the proxy will go into | ||
_load-shedding_, where any new request gets an immediate 503. The | ||
proxy can only get _out_ of load-shedding when the queue shrinks. | ||
|
||
Failfast also plays a role here: if the proxy puts a service into | ||
failfast while there are requests in the dispatch queue, all the | ||
requests in the dispatch queue get an immediate 504 before the | ||
proxy goes into load-shedding. | ||
|
||
To get out of failfast, some endpoints for the service have to | ||
become available. | ||
|
||
To get out of load-shedding, the dispatch queue has to start | ||
emptying, which implies that the service has to get more capacity | ||
to process requests or that the incoming request rate has to drop. |
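
The interaction between the dispatch queue, load-shedding, and failfast described above can be illustrated with a toy model. This is a minimal sketch, not Linkerd's implementation: the `DispatchQueue` class, its `capacity` limit, and all method names are hypothetical, and the real proxy's queue limits and timing are not modeled.

```python
from collections import deque


class DispatchQueue:
    """Toy model of a proxy dispatch queue (hypothetical, not Linkerd code)."""

    def __init__(self, capacity):
        self.capacity = capacity    # assumed queue limit, for illustration only
        self.pending = deque()      # requests waiting to be dispatched
        self.load_shedding = False
        self.failfast = False

    def submit(self, request):
        """Queue a request. Returns None if queued, or 503 if it is shed."""
        if self.failfast or self.load_shedding:
            return 503
        if len(self.pending) >= self.capacity:
            # The queue is too long: start shedding load.
            self.load_shedding = True
            return 503
        self.pending.append(request)
        return None

    def dispatch_one(self):
        """Dispatch the oldest request; leave load-shedding once the queue shrinks."""
        if self.failfast or not self.pending:
            return None
        request = self.pending.popleft()
        if len(self.pending) < self.capacity:
            self.load_shedding = False
        return request

    def enter_failfast(self):
        """No endpoints are available: fail every queued request with a 504."""
        failed = [(request, 504) for request in self.pending]
        self.pending.clear()
        self.failfast = True
        return failed

    def exit_failfast(self):
        """Some endpoints became available again."""
        self.failfast = False
        self.load_shedding = False
```

In this model, a request submitted while the queue is full gets a 503, requests already queued when failfast begins get a 504, and new requests are accepted again only after `dispatch_one()` shrinks the queue or `exit_failfast()` is called.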
35 changes: 35 additions & 0 deletions
linkerd.io/content/2.15/common-errors/protocol-detection.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
+++ | ||
title = "Protocol Detection Errors" | ||
description = "Protocol detection errors indicate that Linkerd doesn't understand the protocol in use." | ||
+++ | ||
|
||
Linkerd is capable of proxying all TCP traffic, including TLS connections, | ||
WebSockets, and HTTP tunneling. In most cases where the client speaks first | ||
when a new connection is made, Linkerd can detect the protocol in use, | ||
allowing it to perform per-request routing and metrics. | ||
|
||
If your proxy logs contain messages like `protocol detection timed out after | ||
10s`, or you're experiencing 10-second delays when establishing connections, | ||
you're probably running into a situation where Linkerd cannot detect the protocol. | ||
This is most common for protocols where the server speaks first, and the | ||
client is waiting for information from the server. It may also occur with | ||
non-HTTP protocols for which Linkerd doesn't yet understand the wire format of | ||
a request. | ||
|
||
You'll need to understand exactly what the situation is to fix this: | ||
|
||
- A server-speaks-first protocol will probably need to be configured as a | ||
`skip` or `opaque` port, as described in the [protocol detection | ||
documentation](../../features/protocol-detection/#configuring-protocol-detection). | ||
|
||
- If you're seeing transient protocol detection timeouts, this is more likely | ||
to indicate a misbehaving workload. | ||
|
||
- If you know the protocol is client-speaks-first but you're getting | ||
consistent protocol detection timeouts, you'll probably need to fall back on | ||
a `skip` or `opaque` port. | ||
|
||
Note that marking ports as `skip` or `opaque` has ramifications beyond | ||
protocol detection timeouts; see the [protocol detection | ||
documentation](../../features/protocol-detection/#configuring-protocol-detection) | ||
for more information. |
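
To see why server-speaks-first protocols cause detection timeouts, consider a simplified model of detection that inspects the first bytes a client sends. This is a sketch under stated assumptions, not Linkerd's actual detection logic: the `detect_protocol` function, its return labels, and the use of `None` to stand for "the client sent nothing before the timeout" are all hypothetical.

```python
# Prefixes a client-speaks-first HTTP connection typically begins with.
HTTP1_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"OPTIONS ", b"PATCH ")
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n"


def detect_protocol(first_bytes, opaque=False):
    """Classify a connection from the client's first bytes (toy model).

    first_bytes is None when the client sent nothing before the detection
    timeout, as happens when the client is waiting for the server to speak.
    """
    if opaque:
        # Port marked opaque: detection is skipped entirely, so no timeout.
        return "opaque"
    if first_bytes is None:
        return "timeout"
    if first_bytes.startswith(HTTP2_PREFACE):
        return "http/2"
    if first_bytes.startswith(HTTP1_METHODS):
        return "http/1"
    # Bytes arrived but do not look like HTTP: treat as plain TCP.
    return "tcp"
```

In this model, a server-speaks-first connection yields `"timeout"` unless the port is marked opaque, which mirrors the first bullet in the list above.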