DownstreamStatus is 0 in v3 when server-sent event response is aborted from client #10807

Open

dj-shin opened this issue Jun 12, 2024 · 7 comments · May be fixed by #10819
Labels
contributor/wanted — Participation from an external contributor is highly requested
kind/bug/possible — a possible bug that needs analysis before it is confirmed or fixed

Comments

@dj-shin

dj-shin commented Jun 12, 2024

Welcome!

  • Yes, I've searched similar issues on GitHub and didn't find any.
  • Yes, I've searched similar issues on the Traefik community forum and didn't find any.

What did you do?

In Traefik v2.11, when the client aborts a server-sent events (SSE) response stream, DownstreamStatus was logged as 200 in the access log.
After upgrading to v3.0, the response itself works the same as before, but DownstreamStatus is now logged as 0 in the access log.

Reproducible docker compose setup
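
For context, the upstream in that setup is essentially a streaming SSE backend along these lines (a rough stand-in, not the exact compose service; the port, path, and payload are placeholders). Aborting the client mid-stream, e.g. interrupting a curl request that goes through Traefik to this backend, triggers the logging difference described below:

// Minimal SSE backend sketch (assumed names, not the actual compose service).
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/v1/chat/completions", func(rw http.ResponseWriter, req *http.Request) {
		rw.Header().Set("Content-Type", "text/event-stream")
		flusher := rw.(http.Flusher)
		for i := 0; ; i++ {
			select {
			case <-req.Context().Done():
				return // the client aborted the stream
			case <-time.After(500 * time.Millisecond):
				fmt.Fprintf(rw, "data: chunk %d\n\n", i)
				flusher.Flush()
			}
		}
	})
	log.Fatal(http.ListenAndServe(":9000", nil))
}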

What did you see instead?

v2.11 access log:

{
  "@timestamp": "2024-06-12T07:35:16.316813044Z",
  "ClientUsername": "-",
  "DownstreamContentSize": 4493,
  "DownstreamStatus": 200,
  "Duration": 694375616,
  "GzipRatio": 0,
  "OriginContentSize": 0,
  "OriginDuration": 0,
  "OriginStatus": 0,
  "Overhead": 694375616,
  "RequestContentSize": 438,
  "RequestCount": 543,
  "RequestMethod": "POST",
  "RequestPath": "/v1/chat/completions",
  "RequestPort": "-",
  "RequestProtocol": "HTTP/1.1",
  "RequestScheme": "http",
  "RetryAttempts": 0,
  "StartLocal": "2024-06-12T07:35:15.622296575Z",
  "StartUTC": "2024-06-12T07:35:15.622296575Z",
  "downstream_Access-Control-Allow-Credentials": "REDACTED",
  "downstream_Content-Type": "REDACTED",
  "downstream_Date": "REDACTED",
  "downstream_Server": "REDACTED",
  "downstream_Vary": "REDACTED",
  "entryPointName": "inference-server",
  "level": "info",
  "msg": "",
  "request_Accept": "REDACTED",
  "request_Authorization": "REDACTED",
  "request_Content-Length": "REDACTED",
  "request_Content-Type": "REDACTED",
  "request_User-Agent": "REDACTED",
  "request_X-Amzn-Trace-Id": "REDACTED",
  "request_X-Forwarded-Host": "REDACTED",
  "request_X-Forwarded-Port": "REDACTED",
  "request_X-Forwarded-Proto": "REDACTED",
  "request_X-Forwarded-Server": "REDACTED",
  "request_X-Real-Ip": "REDACTED",
  "request_X-Request-Id": "da53fa8d71fe4953928e12468f18ffd8",
  "time": "2024-06-12T07:35:16Z"
}

v3.0.2 access log:

{
  "@timestamp": "2024-06-12T07:48:16.096445637Z",
  "DownstreamContentSize": 0,
  "DownstreamStatus": 0,
  "Duration": 343245914,
  "OriginContentSize": 0,
  "OriginDuration": 0,
  "OriginStatus": 0,
  "Overhead": 343245914,
  "RequestContentSize": 0,
  "RequestCount": 2,
  "RequestMethod": "POST",
  "RequestPath": "/v1/chat/completions",
  "RequestPort": "-",
  "RequestProtocol": "HTTP/1.1",
  "RequestScheme": "http",
  "RetryAttempts": 0,
  "StartLocal": "2024-06-12T07:48:15.753032977Z",
  "StartUTC": "2024-06-12T07:48:15.753032977Z",
  "entryPointName": "inference-server",
  "level": "info",
  "msg": "",
  "request_Accept": "REDACTED",
  "request_Authorization": "REDACTED",
  "request_Content-Length": "REDACTED",
  "request_Content-Type": "REDACTED",
  "request_User-Agent": "REDACTED",
  "request_X-Amzn-Trace-Id": "REDACTED",
  "request_X-Forwarded-Host": "REDACTED",
  "request_X-Forwarded-Port": "REDACTED",
  "request_X-Forwarded-Proto": "REDACTED",
  "request_X-Forwarded-Server": "REDACTED",
  "request_X-Real-Ip": "REDACTED",
  "request_X-Request-Id": "e37e4460614b428cbf4c1d474f1bdc5c",
  "time": "2024-06-12T07:48:16Z"
}

What version of Traefik are you using?

Tested in v3.0.1 and v3.0.2; the behavior was the same in both.
v2.11 works fine.

What is your environment & configuration?

apiVersion: v1
data:
  traefik.yaml: |
    providers:
      file:
        directory: /etc/traefik/
        watch: true
    entryPoints:
      traefik:
        address: :8080
      inference-server:
        address: :8090
        transport:
          lifeCycle:
            graceTimeOut: 3600s
    api:
      insecure: true
      dashboard: true
      debug: true
    metrics:
      prometheus:
        entryPoint: traefik
    accessLog:
      filePath: "/logs/access.jsonl"
      format: json
      fields:
        defaultMode: keep
        headers:
          defaultMode: redact
          names:
            X-Request-Id: keep
    ping:
      entryPoint: "inference-server"
    log:
      level: DEBUG
kind: ConfigMap


If applicable, please paste the log output in DEBUG level

No response

@rtribotte
Member

Hello @dj-shin,

Thanks for reporting this!

Could you please also share a minimal dynamic configuration to reproduce the issue?

@dj-shin
Author

dj-shin commented Jun 12, 2024

Hello @rtribotte, thank you for the fast check!

Here's our dynamic config:

http:
  serversTransports:
    reuseConnection:
      maxIdleConnsPerHost: 5
      forwardingTimeouts:
        idleConnTimeout: 600s
  middlewares:
    CORS-Header:
      headers:
        accessControlAllowMethods:
          - GET
          - OPTIONS
          - PUT
          - POST
        accessControlAllowHeaders:
          - Authorization
          - Content-Type
        accessControlAllowOriginListRegex:
          - ...
        accessControlAllowCredentials: true
        accessControlMaxAge: 100
        addVaryHeader: true
  routers:
    router-test:
      entryPoints:
        - inference-server
      middlewares:
        - CORS-Header
        - auths
      service: test-cluster
      rule: PathPrefix(`/test`)
    router-inference:
      entryPoints:
      - inference-server
      middlewares:
      - CORS-Header
      - auths
      service: inference-cluster
      rule: PathPrefix(`/`)
  services:
    inference-cluster:
      loadBalancer:
        healthCheck:
          path: /ping
          interval: 15s
          timeout: 10s
        servers:
          - url: ...
        serversTransport: reuseConnection
    test-cluster:
      loadBalancer:
        servers:
        - url: ...
        serversTransport: reuseConnection

@nmengin
Contributor

nmengin commented Jun 13, 2024

Hey @dj-shin ,

Thanks for the additional information.
If any community member can help us move forward on this, we would love the help.

@nmengin added the kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed) and contributor/wanted (participation from an external contributor is highly requested) labels, and removed the status/0-needs-triage and contributor/waiting-for-feedback labels, on Jun 13, 2024
@hood
Contributor

hood commented Jun 17, 2024

I’d love to help here, but I guess a minimal repro with a server/client setup similar to OP’s would go a long way. @dj-shin if possible, could you provide a minimal dockerised reproduction of your setup?

@dj-shin
Author

dj-shin commented Jun 17, 2024

Thank you for your attention, @hood !
I've created a docker compose setup to reproduce my issue.

I've also verified with this setup that v2.11 logs different output than v3.

@hood
Contributor

hood commented Jun 17, 2024

@nmengin Just reproduced this in my environment.
Looks like, at least when streaming an SSE response, we never get past here, but the status gets set here, so the logDataTable passed to the logger from here only contains the initial default request data. That's why, when the connection is aborted from the client side, we see the initial default value logged for DownstreamStatus.
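
As an aside, here is a standalone sketch of the general Go pattern at play (not Traefik's code; the statusRecorder wrapper, the port, and the http.ErrAbortHandler simulation are illustrative assumptions): when the inner handler panics on client abort, the statements after next.ServeHTTP are skipped, while deferred functions still run as the panic unwinds, so a status that is only read after ServeHTTP keeps its zero value:

package main

import (
	"log"
	"net/http"
)

// statusRecorder stands in for Traefik's capture wrapper (illustrative only).
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

func accessLog(next http.Handler) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: rw}
		logged := 0 // default value, mirrors the "DownstreamStatus": 0 in the report

		defer func() {
			// Deferred functions still run while the abort panic unwinds, but
			// `logged` is only assigned after ServeHTTP, so 0 gets logged.
			// Reading rec.status here instead (as the change below does with
			// capt) would see the real downstream status.
			log.Printf("DownstreamStatus=%d", logged)
		}()

		next.ServeHTTP(rec, req)

		// Never reached when the inner handler panics on client abort.
		logged = rec.status
	})
}

func main() {
	backend := http.HandlerFunc(func(rw http.ResponseWriter, _ *http.Request) {
		rw.WriteHeader(http.StatusOK) // the upstream answered 200 and started streaming
		// Simulates what the proxy layer does when it cannot write the rest of
		// the streamed response to a client that has already gone away.
		panic(http.ErrAbortHandler)
	})
	log.Fatal(http.ListenAndServe(":8090", accessLog(backend)))
}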

Looks like moving the deferred closing function declaration to just before the ServeHTTP invocation (so that capt is already initialised and available inside its scope) and passing it capt, so it can capture all the request information that needs to be logged for the aborted request, solves the issue:

// `capt` initialised above...

defer func(capt capture.Capture) {
	if h.config.BufferingSize > 0 {
		h.logHandlerChan <- handlerParams{
			logDataTable: logDataTable,
		}

		return
	}

	logDataTable.DownstreamResponse.status = capt.StatusCode()
	logDataTable.DownstreamResponse.size = capt.ResponseSize()
	logDataTable.Request.size = capt.RequestSize()

	h.logTheRoundTrip(logDataTable) // <- this now logs updated request data.
}(capt) // <- with `capt` initialised above, we can now pass it to our deferred fn.

next.ServeHTTP(rw, reqWithDataTable)

I’m not familiar with the codebase, so while I've run the tests and made sure nothing breaks (integration tests are not passing locally, but I guess that's a misconfiguration on my side), I'd like a green light from you or the other maintainers to confirm this is a decent enough solution to at least use as a starting point. I've created a PR to use as a starting point; we can take it from there.

@dj-shin Could you please try this branch and confirm that it solves the issue you reported?

@dj-shin
Author

dj-shin commented Jun 18, 2024

Thank you for your work, @hood !
I've tested your branch locally and checked that DownstreamStatus and DownstreamContentSize are logged properly.

@hood linked a pull request Jun 18, 2024 that will close this issue