diff --git a/linkerd.io/content/2.15/common-errors/_index.md b/linkerd.io/content/2.15/common-errors/_index.md
new file mode 100644
index 0000000000..d635b5ef95
--- /dev/null
+++ b/linkerd.io/content/2.15/common-errors/_index.md
@@ -0,0 +1,21 @@
++++
+title = "Common Errors"
+weight = 10
+[sitemap]
+  priority = 1.0
++++
+
+Linkerd is generally robust, but things can always go wrong! You'll find
+information here about the most common things that cause people trouble.
+
+## When in Doubt, Start With `linkerd check`
+
+Whenever you see anything that looks unusual about your mesh, **always** start
+with `linkerd check`. It will check a long series of things that have caused
+trouble for others and make sure that your configuration is sane, and it will
+point you to help for any problems it finds. It's hard to overstate how useful
+this command is.
+
+## Common Errors
+
+{{% sectiontoc "common-errors" %}}
diff --git a/linkerd.io/content/2.15/common-errors/failfast.md b/linkerd.io/content/2.15/common-errors/failfast.md
new file mode 100644
index 0000000000..5cd78c354e
--- /dev/null
+++ b/linkerd.io/content/2.15/common-errors/failfast.md
@@ -0,0 +1,18 @@
++++
+title = "Failfast"
+description = "Failfast means that no endpoints are available."
++++
+
+If Linkerd reports that a given service is in the _failfast_ state, it
+means that the proxy has determined that there are no available endpoints
+for that service. In this situation there's no point in the proxy trying
+to actually make a connection to the service - it already knows that it
+can't talk to it - so it reports that the service is in failfast and
+immediately returns an error from the proxy.
+
+The error will be either a 503 or a 504; see below for more information,
+but if you already know that the service is in failfast because you saw
+it in the logs, that's the important part.
+
+To get out of failfast, some endpoints for the service have to
+become available.
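Note on the failfast page: the split between 503 and 504 can be pictured with a small model that combines the failfast description with the dispatch queue described in `http-503-504.md`. The Python sketch below is an illustration only. The class `ProxyQueueModel`, its method names, and the fixed `capacity` are hypothetical and are not Linkerd code or configuration. It also assumes one reading of the docs: requests already queued when failfast begins get a 504, and requests arriving afterwards are shed with a 503.

```python
from collections import deque


class ProxyQueueModel:
    """Toy model of one proxy's dispatch queue for a single service.

    Hypothetical illustration only; this is not how the Linkerd proxy is built.
    """

    def __init__(self, capacity):
        self.capacity = capacity  # assumed queue limit, not a real Linkerd setting
        self.queue = deque()
        self.failfast = False

    def submit(self, request):
        """Queue a request; return None if queued, or 503 if it is shed."""
        # Load-shedding: the queue is full, or the service is in failfast.
        if self.failfast or len(self.queue) >= self.capacity:
            return 503
        self.queue.append(request)
        return None

    def dispatch_one(self):
        """Send the oldest queued request on; this is how the queue shrinks."""
        return self.queue.popleft() if self.queue else None

    def endpoints_lost(self):
        """Enter failfast: every request still in the queue fails with 504."""
        self.failfast = True
        failed = [(request, 504) for request in self.queue]
        self.queue.clear()
        return failed

    def endpoints_available(self):
        """Leave failfast once some endpoints exist again."""
        self.failfast = False
```

In this model a full queue rejects new requests with 503 until `dispatch_one` makes room, while `endpoints_lost` fails the queued requests with 504 and keeps rejecting new ones until `endpoints_available` is called.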
diff --git a/linkerd.io/content/2.15/common-errors/http-502.md b/linkerd.io/content/2.15/common-errors/http-502.md
new file mode 100644
index 0000000000..7205d049a1
--- /dev/null
+++ b/linkerd.io/content/2.15/common-errors/http-502.md
@@ -0,0 +1,11 @@
++++
+title = "HTTP 502 Errors"
+description = "HTTP 502 means connection errors between proxies."
++++
+
+The Linkerd proxy will return a 502 error for connection errors between
+proxies. Unfortunately it's fairly common to see an uptick in 502s when
+first meshing a workload that hasn't previously been used with a mesh,
+because the mesh surfaces errors that were previously invisible!
+
+There's actually a whole page on [debugging 502s](../../tasks/debugging-502s/).
diff --git a/linkerd.io/content/2.15/common-errors/http-503-504.md b/linkerd.io/content/2.15/common-errors/http-503-504.md
new file mode 100644
index 0000000000..a8777413af
--- /dev/null
+++ b/linkerd.io/content/2.15/common-errors/http-503-504.md
@@ -0,0 +1,27 @@
++++
+title = "HTTP 503 and 504 Errors"
+description = "HTTP 503 and 504 mean overloaded workloads."
++++
+
+503s and 504s show up when a Linkerd proxy is trying to make so many
+requests to a workload that it gets overwhelmed.
+
+When the workload next to a proxy makes a request, the proxy adds it
+to an internal dispatch queue. When things are going smoothly, the
+request is pulled from the queue and dispatched almost immediately.
+If the queue gets too long, though (which can generally happen only
+if the called service is slow to respond), the proxy will go into
+_load-shedding_, where any new request gets an immediate 503. The
+proxy can only get _out_ of load-shedding when the queue shrinks.
+
+Failfast also plays a role here: if the proxy puts a service into
+failfast while there are requests in the dispatch queue, all the
+requests in the dispatch queue get an immediate 504 before the
+proxy goes into load-shedding.
+
+To get out of failfast, some endpoints for the service have to
+become available.
+
+To get out of load-shedding, the dispatch queue has to start
+emptying, which implies that the service has to get more capacity
+to process requests or that the incoming request rate has to drop.
diff --git a/linkerd.io/content/2.15/common-errors/protocol-detection.md b/linkerd.io/content/2.15/common-errors/protocol-detection.md
new file mode 100644
index 0000000000..515b065515
--- /dev/null
+++ b/linkerd.io/content/2.15/common-errors/protocol-detection.md
@@ -0,0 +1,35 @@
++++
+title = "Protocol Detection Errors"
+description = "Protocol detection errors indicate that Linkerd doesn't understand the protocol in use."
++++
+
+Linkerd is capable of proxying all TCP traffic, including TLS connections,
+WebSockets, and HTTP tunneling. In most cases where the client speaks first
+when a new connection is made, Linkerd can detect the protocol in use,
+allowing it to perform per-request routing and metrics.
+
+If your proxy logs contain messages like `protocol detection timed out after
+10s`, or you're experiencing 10-second delays when establishing connections,
+you're probably running into a situation where Linkerd cannot detect the protocol.
+This is most common for protocols where the server speaks first, and the
+client is waiting for information from the server. It may also occur with
+non-HTTP protocols for which Linkerd doesn't yet understand the wire format of
+a request.
+
+You'll need to understand exactly what the situation is to fix this:
+
+- A server-speaks-first protocol will probably need to be configured as a
+  `skip` or `opaque` port, as described in the [protocol detection
+  documentation](../../features/protocol-detection/#configuring-protocol-detection).
+
+- If you're seeing transient protocol detection timeouts, this is more likely
+  to indicate a misbehaving workload.
+
+- If you know the protocol is client-speaks-first but you're getting
+  consistent protocol detection timeouts, you'll probably need to fall back on
+  a `skip` or `opaque` port.
+
+Note that marking ports as `skip` or `opaque` has ramifications beyond
+protocol detection timeouts; see the [protocol detection
+documentation](../../features/protocol-detection/#configuring-protocol-detection)
+for more information.
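
Note on the protocol detection page: a toy model may make it clearer why a server-speaks-first protocol ends in a detection timeout. The Python sketch below is not Linkerd's detection logic. The function `sniff_protocol`, the `HTTP_METHODS` tuple, and the return strings are hypothetical names chosen for illustration. The idea is that a detector waiting to see the client's first bytes has nothing to inspect while the client itself is waiting for the server.

```python
import socket

# Hypothetical, deliberately incomplete list of prefixes treated as HTTP.
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ")


def sniff_protocol(conn, timeout_s):
    """Peek at the first bytes sent by the client without consuming them.

    Returns "http", "unknown", or "timed out". Illustration only; the real
    proxy's detection is more involved than this.
    """
    conn.settimeout(timeout_s)
    try:
        # MSG_PEEK leaves the bytes in the buffer for whoever reads next.
        first = conn.recv(16, socket.MSG_PEEK)
    except socket.timeout:
        # The client sent nothing in time, e.g. it is waiting for the server.
        return "timed out"
    if first.startswith(HTTP_METHODS):
        return "http"
    return "unknown"
```

With a connected pair from `socket.socketpair()`, a client that immediately sends `GET / HTTP/1.1` is classified as HTTP, while a client that stays silent until the server greets it makes the call wait for the full timeout. The docs above report that wait as 10 seconds; the sketch takes it as a parameter.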