send() failed (32: Broken pipe) when performing HTTP POST to Immich #77

Open
Matthias-vdE opened this issue Sep 27, 2024 · 50 comments

@Matthias-vdE

When running Immich (https://github.com/immich-app/immich) behind NPM(plus) and enabling Crowdsec/Appsec, it is not possible to upload files to the server via HTTP POST:

2024-09-27T11:33:13.859502771Z 2024/09/27 13:33:13 [error] 32855#32855: *10529 send() failed (32: Broken pipe), client: MY_IPADDRESS, server: immich.mydomain.org, request: "POST /api/assets HTTP/1.1", host: "immich.mydomain.org"

2024-09-27T11:33:13.859545312Z 2024/09/27 13:33:13 [error] 32855#32855: *10529 [lua] crowdsec.lua:578: AppSecCheck(): Fallback because of err: broken pipe, client: MY_IPADDRESS, server: immich.mydomain.org, request: "POST /api/assets HTTP/1.1", host: "immich.mydomain.org"

2024-09-27T11:33:13.859553917Z 2024/09/27 13:33:13 [error] 32855#32855: *10529 [lua] crowdsec.lua:651: Allow(): AppSec check: broken pipe, client: MY_IPADDRESS, server: immich.mydomain.org, request: "POST /api/assets HTTP/1.1", host: "immich.mydomain.org"

2024-09-27T11:33:13.859564759Z 2024/09/27 13:33:13 [alert] 32855#32855: *10529 [lua] crowdsec.lua:718: Allow(): [Crowdsec] denied 'MY_IPADDRESS' with 'ban' (by appsec), client: MY_IPADDRESS, server: immich.mydomain.org, request: "POST /api/assets HTTP/1.1", host: "immich.mydomain.org"

The issue was initially reported at ZoeyVid/NPMplus#1123.

@LaurenceJJones
Contributor

LaurenceJJones commented Sep 27, 2024

Hey 👋🏻

I have set up a similar environment and don't seem to be experiencing the same issue. I am uploading a 3 MB file, as in your Reddit thread, and I don't see any errors.

Is CrowdSec / AppSec running locally to NPMPlus (on same host to reduce latency)?

Could you provide the full nginx configuration that NPMplus generates? I am testing with plain nginx, since it runs the same Lua code and I don't want to spend time configuring NPMplus. This is my config:

server {
    server_name _;

    listen *:80;

    # allow large file uploads
    client_max_body_size 50000M;

    # Set headers
    proxy_set_header Host              $http_host;
    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # enable websockets: http://nginx.org/en/docs/http/websocket.html
    proxy_http_version 1.1;
    proxy_set_header   Upgrade    $http_upgrade;
    proxy_set_header   Connection "upgrade";
    proxy_redirect     off;

    # set timeout
    proxy_read_timeout 600s;
    proxy_send_timeout 600s;
    send_timeout       600s;

    location / {
        proxy_pass http://127.0.0.1:2283;
    }
}

Everything seems to be getting processed:

root@bookworm:/etc/nginx/sites-enabled# cscli metrics show appsec
Appsec Metrics:
╭─────────────────┬───────────┬─────────╮
│ Appsec Engine   │ Processed │ Blocked │
├─────────────────┼───────────┼─────────┤
│ 127.0.0.1:7422/ │ 296       │ -       │
╰─────────────────┴───────────┴─────────╯

@Matthias-vdE
Author

Is CrowdSec / AppSec running locally to NPMPlus (on same host to reduce latency)?

Yup, both Crowdsec and NPMPlus are running in docker containers on the same host. Part of the same docker-compose:

services:
  npmplus:
    container_name: npmplus
    image: zoeyvid/npmplus:latest
    restart: always
    network_mode: host
    volumes:
      - "/opt/npm:/data"
    environment:
      - "TZ=Europe/Brussels" 
      - "NGINX_LOG_NOT_FOUND=true"
      - "LOGROTATE=true" 
      - "LOGROTATIONS=7" 
      - "GOA=true"

  crowdsec:
    container_name: crowdsec
    image: crowdsecurity/crowdsec:latest
    restart: always
    network_mode: bridge
    ports:
      - "127.0.0.1:7422:7422"
      - "127.0.0.1:8080:8080"
    environment:
      - "TZ=Europe/Brussels"
      - "COLLECTIONS=ZoeyVid/npmplus"
      - "LEVEL_FATAL=true"
      - "LEVEL_ERROR=true"
      - "LEVEL_WARN=true"
      - "LEVEL_INFO=false"
      - "LEVEL_DEBUG=false"
      - "LEVEL_TRACE=false"
    volumes:
      - "/opt/crowdsec/conf:/etc/crowdsec"
      - "/opt/crowdsec/data:/var/lib/crowdsec/data"
      - "/opt/npm/nginx:/opt/npm/nginx:ro"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"

  geoipupdate:
    container_name: geoipupdate
    image: maxmindinc/geoipupdate:latest
    restart: always
    network_mode: bridge
    environment:
      - "TZ=Europe/Brussels"
      - "GEOIPUPDATE_EDITION_IDS=GeoLite2-Country GeoLite2-City GeoLite2-ASN"
      - "GEOIPUPDATE_ACCOUNT_ID=my_account_id"
      - "GEOIPUPDATE_LICENSE_KEY=my_license_key"
      - "GEOIPUPDATE_FREQUENCY=24"
    volumes:
      - "/opt/npm/etc/goaccess/geoip:/usr/share/GeoIP"

Immich is running on the same host as well, as a docker container. The config for that one is very standard with little to no customization.

My proxy config in NPMPlus looks like this:

image

image

Nothing in custom locations, and a regular certbot certificate to force enable HTTPS.

My crowdsec.conf file is this:

ENABLED=true
API_URL=http://127.0.0.1:8080
API_KEY=my_API_key
CACHE_EXPIRATION=1
# bounce for all type of remediation that the bouncer can receive from the local API
BOUNCING_ON_TYPE=ban
FALLBACK_REMEDIATION=ban
REQUEST_TIMEOUT=3000
UPDATE_FREQUENCY=10
# live or stream
MODE=live
# exclude the bouncing on those location
EXCLUDE_LOCATION=
#those apply for "ban" action
# /!\ REDIRECT_LOCATION and RET_CODE can't be used together. REDIRECT_LOCATION take priority over RET_CODE
BAN_TEMPLATE_PATH=/data/etc/crowdsec/ban.html
REDIRECT_LOCATION=
RET_CODE=
#those apply for "captcha" action
#valid providers are recaptcha, hcaptcha, turnstile
CAPTCHA_PROVIDER=
# Captcha Secret Key
SECRET_KEY=
# Captcha Site key
SITE_KEY=
CAPTCHA_TEMPLATE_PATH=/data/etc/crowdsec/captcha.html
CAPTCHA_EXPIRATION=3600
#APPSEC_URL=http://127.0.0.1:7422
#APPSEC_FAILURE_ACTION=deny

The two bottom lines are currently commented out to make it work. If I uncomment them, it breaks with the broken pipe error message when uploading anything.

@Zoey2936

I don't use Immich myself, but there seems to be an issue with it. I already have three discussions about it, with multiple people having the same problem:
ZoeyVid/NPMplus#1168
ZoeyVid/NPMplus#1123
ZoeyVid/NPMplus#1241

The issue is always related to AppSec and Immich, and disabling AppSec (or changing APPSEC_FAILURE_ACTION to passthrough) fixed it (ModSecurity also needs to be disabled). Sometimes increasing the timeouts also fixed it (at least for Nextcloud, which had similar issues), so I increased the default timeouts, but the issue still seems to exist on new installations with the new timeouts.

@MaximumFish

MaximumFish commented Nov 22, 2024

To hopefully contribute, I started discussion 1241 that @Zoey2936 linked above. It was suggested I post here answering the below questions so here we go:

Is the deployment local or remote (VPS)? Local

Is the domain being proxied by a CDN like cloudflare? For full transparency I do use a Cloudflare tunnel, but only for external connections. The tunnel isn't involved while on the LAN.

Does it happen on upload like the OP or when you said when immich makes a backup? I am the OP in that thread. I believe the single image upload and backup feature are basically the same thing using the same endpoint. The backup feature just does it automatically and in batches.

This is what happens when I attempt to upload (or backup) a single photo:

2024/11/19 17:20:06 [warn] 80110#80110: *51203 a client request body is buffered to a temporary file /usr/local/nginx/client_body_temp/0000000050 while reading request body, client: 10.42.0.11, server: immich.example.com, request: "POST /api/assets HTTP/1.1", host: "immich.example.com"
2024/11/19 17:20:06 [error] 80110#80110: *51203 send() failed (32: Broken pipe), client: 10.42.0.11, server: immich.example.com, request: "POST /api/assets HTTP/1.1", host: "immich.example.com"
2024/11/19 17:20:06 [error] 80110#80110: *51203 [lua] crowdsec.lua:578: AppSecCheck(): Fallback because of err: broken pipe, client: 10.42.0.11, server: immich.example.com, request: "POST /api/assets HTTP/1.1", host: "immich.example.com"
2024/11/19 17:20:06 [error] 80110#80110: *51203 [lua] crowdsec.lua:651: Allow(): AppSec check: broken pipe, client: 10.42.0.11, server: immich.example.com, request: "POST /api/assets HTTP/1.1", host: "immich.example.com"
2024/11/19 17:20:06 [alert] 80110#80110: *51203 [lua] crowdsec.lua:718: Allow(): [Crowdsec] denied '10.42.0.11' with 'ban' (by appsec), client: 10.42.0.11, server: immich.example.com, request: "POST /api/assets HTTP/1.1", host: "immich.example.com"

(Domain name changed for privacy and yes, I use a non-typical subnet)

Disabling Appsec fixes the issue, albeit with the trade off of reduced security.

Thanks.

@blotus
Member

blotus commented Nov 22, 2024

Hey,

Do you know how big the body of the request is when you get this error?

Currently, the appsec will try to process any body it sees, regardless of the size, which will lead to issues (also tracked here: #71).

I haven't performed any real tests to see where the actual limit is currently, but I guess anything over a few hundred MBs will trigger this error.

Once #80 is merged (it includes a large refactoring of the code, so we have to wait for it), we plan to add additional configuration on how to handle large bodies (allow to set a maximum body size and whether to drop the request or just analyze the headers when it's over the limit).
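
To make the planned behaviour concrete, here is a minimal sketch of the decision described above. It is not the CrowdSec implementation: the names `OversizeAction` and `plan_inspection`, and the returned labels, are hypothetical and only illustrate the two options mentioned (drop the request, or analyze just the headers when the body is over the limit).

```python
from enum import Enum


class OversizeAction(Enum):
    """Hypothetical setting: what to do when a body exceeds the limit."""
    DROP = "drop"
    HEADERS_ONLY = "headers_only"


def plan_inspection(content_length: int, max_body_size: int,
                    on_oversize: OversizeAction) -> str:
    """Return which part of a request would be handed to the inspection engine.

    All names and return labels here are illustrative assumptions.
    """
    if content_length <= max_body_size:
        # Body is within the limit: inspect everything.
        return "headers+body"
    if on_oversize is OversizeAction.DROP:
        # Over the limit and configured to refuse the request outright.
        return "reject"
    # Over the limit: skip the body and inspect the headers only.
    return "headers"
```

With a 10 MB limit, a 3 MB photo would be inspected in full, while a 600 MB video would either be rejected or have only its headers inspected, depending on the setting.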

@MFYDev

MFYDev commented Nov 23, 2024

I ran into this issue today while editing posts on my own Ghost blog. AppSec blocks me from time to time as I edit. I haven't found a solution yet, but fortunately I came across this issue. For post editing I don't think the body is large at all, as I have only just started.

@MaximumFish

Do you know how big is the body of the request when you get this error ?

The photos are a couple of MB each and it does one request per photo, so definitely not in the few hundred MB territory.

@yurividal

yurividal commented Dec 3, 2024

Having the same issue.
I'm using nginxproxymanager with crowdsec_openresty_bouncer and appsec.

image

The issue happens even inside the LAN, so Cloudflare or anything else is not involved.
AppSec is, in theory, set up properly and running fine:

curl -I -X POST localhost:7422/ -i -H 'x-crowdsec-appsec-api-key: {redacted}' -H 'x-crowdsec-appsec-ip: 42.42.42.42' -H 'x-crowdsec-appsec-uri: /test' -H 'x-crowdsec-appsec-host: test.com' -H 'x-crowdsec-appsec-verb: GET'   
HTTP/1.1 200 OK
Date: Tue, 03 Dec 2024 23:04:30 GMT
Content-Length: 36
Content-Type: text/plain; charset=utf-8

What could be causing the broken pipe between AppSec and the proxy?
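
The curl check above sends headers only, so it does not exercise the upload path that fails. One way to narrow things down might be to send a body of roughly photo size straight to the AppSec port, bypassing nginx and the Lua bouncer. The sketch below reuses the header names from the curl command; the helper names, the default values, and the assumption that the bouncer forwards the upload as the request body are illustrative and not taken from the bouncer source.

```python
import urllib.request


def build_appsec_request(url, api_key, body, client_ip="42.42.42.42",
                         uri="/api/assets", host="test.com", verb="POST"):
    """Build a request to the AppSec listener using the headers from the curl test."""
    headers = {
        "x-crowdsec-appsec-api-key": api_key,
        "x-crowdsec-appsec-ip": client_ip,
        "x-crowdsec-appsec-uri": uri,
        "x-crowdsec-appsec-host": host,
        "x-crowdsec-appsec-verb": verb,
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def probe(url, api_key, size_bytes, timeout=10):
    """Send size_bytes of filler data and return the HTTP status code.

    Raises an OSError subclass (URLError, HTTPError, ConnectionResetError, ...)
    if AppSec answers with an error status or the connection breaks.
    """
    request = build_appsec_request(url, api_key, b"\0" * size_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `probe("http://127.0.0.1:7422/", api_key, size)` with a few increasing sizes (for example 0, 1 KB, 3 MB) would show whether AppSec itself rejects or resets larger bodies, or whether the failure only appears when the request goes through nginx.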

edit:

Initially, the containers were on different docker networks, and communicating via ports exposed to the host.
I then moved the 2 containers to the same docker network. Now the error is a little different. connection reset by peer, instead of broken pipe.

error 10.10.10.20 -> photos.****.com : Allow(): AppSec check: connection reset by peer - POST /api/assets HTTP/1.1 - 
error 10.10.10.20 -> photos.****.com : AppSecCheck(): Fallback because of err: connection reset by peer - POST /api/assets HTTP/1.1 - 

@LaurenceJJones
Contributor

LaurenceJJones commented Jan 17, 2025

Just so it doesn't seem like we are being a PITA about replication steps: I have set up NPMplus using the steps outlined in the repo.

I created a base Immich deployment using the Immich setup, then exposed Immich locally via:

ports:
      - '127.0.0.1:2283:2283'

I then set up a local alias called app.debian.local for the Immich app.

Image

Image

I then uploaded an image of my cat, which is 3.1 MB, and these are the logs for NPM and CrowdSec:

npmplus   | 2025/01/17 20:32:48 [warn] 369#369: *1 a client request body is buffered to a temporary file /usr/local/nginx/client_body_temp/0000000001 while reading request body, client: 192.168.121.1, server: app.debian.local, request: "POST /api/assets HTTP/1.1", host: "app.debian.local", referrer: "http://app.debian.local/photos"
crowdsec  | time="2025-01-17T20:33:04Z" level=info msg="172.17.0.1 - [Fri, 17 Jan 2025 20:33:04 GMT] \"GET /v1/decisions?ip=192.168.121.1 HTTP/1.1 200 7.569899ms \"crowdsec-npmplus-bouncer/v1.0.8\" \""

Note this is all done locally and not over HTTPS, as I don't have a spare domain and don't want to spend time spinning up a VPS; the TLS layer shouldn't impact this.

Also, AppSec is live, because a request to .env causes a block:

crowdsec  | time="2025-01-17T20:38:34Z" level=info msg="AppSec block: crowdsecurity/vpatch-env-access from 192.168.121.1 (172.17.0.1)"
npmplus   | 2025/01/17 20:38:34 [alert] 369#369: *240 [lua] crowdsec.lua:718: Allow(): [Crowdsec] denied '192.168.121.1' with 'ban' (by appsec), client: 192.168.121.1, server: app.debian.local, request: "GET /.env HTTP/1.1", host: "app.debian.local"

Edit: just to ensure it has nothing to do with TLS, I generated a self-signed certificate so that HTTP/2 is in play (I think NPM doesn't enable it unless there is a certificate), and I am still struggling to replicate.

@yurividal

yurividal commented Jan 17, 2025

@LaurenceJJones
Thanks for that. Can you try with these nginx configs? This is what the Immich team recommends (just add it to your custom nginx config in NPMplus):


client_max_body_size 50000M;

# Set headers
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

# enable websockets: http://nginx.org/en/docs/http/websocket.html
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_redirect off;

# set timeout
proxy_read_timeout 600s;
proxy_send_timeout 600s;
send_timeout 600s;

I also tried adding this, but no luck:

modsecurity_rules 'SecRequestBodyLimit 1048576';
modsecurity_rules 'SecRequestBodyNoFilesLimit 1048576';

@yurividal

One more thing, @LaurenceJJones
Where exactly is this error message being generated?

[lua] crowdsec.lua:651: Allow(): AppSec check: broken pipe, client: 10.42.0.11, server: immich.example.com, request: "POST /api/assets HTTP/1.1", host: "immich.example.com"

Does this mean that nginx lua component is having issues when talking to crowdsec's appsec agent?
What would be the best way to debug this? can the logs be made more verbose?

@Zoey2936

client_max_body_size 50000M;
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

These are all set by NPMplus, and setting them twice causes more issues than anything else. Also, you should not use $http_host; $host is correct and is the default used by NPMplus.

modsecurity_rules 'SecRequestBodyLimit 1048576';
modsecurity_rules 'SecRequestBodyNoFilesLimit 1048576';

You need to set them in the modsec config file, not as nginx directives.

@yurividal

Thanks @Zoey2936
I just removed all those custom settings, and I still see the error when uploading to Immich:

2025/01/17 21:55:53 [error] 125180#125180: *228239 send() failed (32: Broken pipe), client: 172.29.0.1, server: photos.mydomain.com, request: "POST /api/assets HTTP/1.1", host: "photos.mydomain.com"
2025/01/17 21:55:53 [error] 125180#125180: *228239 [lua] crowdsec.lua:578: AppSecCheck(): Fallback because of err: broken pipe, client: 172.29.0.1, server: photos.mydomain.com, request: "POST /api/assets HTTP/1.1", host: "photos.mydomain.com"
2025/01/17 21:55:53 [error] 125180#125180: *228239 [lua] crowdsec.lua:651: Allow(): AppSec check: broken pipe, client: 172.29.0.1, server: photos.mydomain.com, request: "POST /api/assets HTTP/1.1", host: "photos.mydomain.com"

@LaurenceJJones
Contributor

LaurenceJJones commented Jan 17, 2025

Broken pipe typically means that while the data (most likely the body) was being transmitted to the AppSec port, the connection was closed before the AppSec server responded to the nginx request. Typically the nginx workers try to pool the connections, so I would be surprised if it were running out of ports to connect from, but this depends on how much you are hosting. Another possibility is that the machine is OOMing whilst trying to process the request.

Edit: so check your free RAM and make sure you are not restricting the CrowdSec container to limited resources. Currently the AppSec requests are processed in memory, so if you upload a 50 MB file, it reads 50 MB into RAM (heap allocation). We need to let you configure the max upload size for this, as @blotus pointed out.
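
The failure mode described here can be reproduced in isolation. The following sketch is only an illustration on the loopback interface, not the AppSec or bouncer code: a stand-in server (hypothetical `close_without_reading`) accepts a connection and closes it without reading, while the client keeps writing. On Linux the writer is expected to fail with either EPIPE ("broken pipe") or ECONNRESET ("connection reset by peer"); which one appears depends on timing, so the two messages can describe the same underlying event.

```python
import errno
import socket
import threading
import time
from typing import Optional


def close_without_reading(listener: socket.socket) -> None:
    """Accept one connection and close it immediately, leaving the body unread."""
    conn, _ = listener.accept()
    conn.close()


def send_until_error(port: int, chunk: bytes = b"x" * 65536,
                     attempts: int = 200) -> Optional[str]:
    """Keep writing to the peer and return the errno name of the first failure."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        for _ in range(attempts):
            try:
                client.sendall(chunk)
            except OSError as exc:
                # Typically EPIPE or ECONNRESET once the peer has gone away.
                return errno.errorcode.get(exc.errno, type(exc).__name__)
            time.sleep(0.01)  # give the peer's reset time to arrive
    return None


def demo() -> Optional[str]:
    """Run the stand-in server and the writer once, on an ephemeral port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=close_without_reading,
                              args=(listener,), daemon=True)
    server.start()
    try:
        return send_until_error(port)
    finally:
        server.join(timeout=5)
        listener.close()
```

Running `demo()` a few times may return different error names on the same machine, which is one reason the exact wording of the error is a weak clue about where the connection was closed.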

@yurividal

OK, so the error occurs when the nginx worker is sending the data TO AppSec. Gotcha.
That's helpful.
I'm running CrowdSec in Docker, and the AppSec port is mapped from the container to the host:

      - 7422:7422

Then NPMplus reaches AppSec via hostIP:7422.
It obviously works, since AppSec works in most cases. But I wonder if something in that body is breaking the Docker network...

When I moved the proxy and CrowdSec to the same Docker network and pointed to AppSec via containername:7422, the error changed from broken pipe to connection reset by peer:

error 10.10.10.20 -> photos.****.com : Allow(): AppSec check: connection reset by peer - POST /api/assets HTTP/1.1 - 
error 10.10.10.20 -> photos.****.com : AppSecCheck(): Fallback because of err: connection reset by peer - POST /api/assets HTTP/1.1 - 

@LaurenceJJones
Contributor

LaurenceJJones commented Jan 17, 2025

And with the connection reset, do you see any logs on the CrowdSec side?

It might also be useful in this debug session :D to enable debug logging on the CrowdSec side so there is more to see.

You can do this by setting:

listen_addr: 0.0.0.0:7422
log_level: debug ## Add this here
appsec_config: crowdsecurity/appsec-default
name: appsec
source: appsec
labels:
  type: appsec

@yurividal

yurividal commented Jan 17, 2025

and with the connection reset, do you see any logs crowdsec side?

Nothing

also might be useful in this debug session :D if we can enable debug log on crowdsec side so there more to see.

That would be ideal. Do you know exactly how i enable debug on crowdsec appsec?

@LaurenceJJones
Contributor

I updated my comment to show how to enable debug logging.

@yurividal

yurividal commented Jan 17, 2025

Output of the crowdsec logs when the broken pipe happens. Nothing really stands out to me...

https://gist.github.com/yurividal/93c11f79806972bb070dc095e44be4e4

@yurividal

It does write a LOT of log lines every single time. The entire log I uploaded was from simply taking one picture and opening Immich.
Maybe the sheer amount of data being sent is overwhelming AppSec?

@LaurenceJJones
Contributor

LaurenceJJones commented Jan 17, 2025

Could you try adding these options to the AppSec config and see if it improves:

listen_addr: 0.0.0.0:7422
log_level: debug ## Add this here
routines: 2 ## Also add this to add more concurrency
appsec_config: crowdsecurity/appsec-default
name: appsec
source: appsec
labels:
  type: appsec

However, the very odd thing in your logs is that it gets the request and states that it returns the response 😕

@yurividal

Adding routines: 2 didn't help.

2025/01/17 22:56:18 [warn] 786#786: *1716 a client request body is buffered to a temporary file /usr/local/nginx/client_body_temp/0000000004 while reading request body, client: 172.29.0.1, server: photos.mydomain.com, request: "POST /api/assets HTTP/1.1", host: "photos.mydomain.com"
2025/01/17 22:56:24 [error] 786#786: *1716 send() failed (32: Broken pipe), client: 172.29.0.1, server: photos.mydomain.com, request: "POST /api/assets HTTP/1.1", host: "photos.mydomain.com"
2025/01/17 22:56:24 [error] 786#786: *1716 [lua] crowdsec.lua:578: AppSecCheck(): Fallback because of err: broken pipe, client: 172.29.0.1, server: photos.mydomain.com, request: "POST /api/assets HTTP/1.1", host: "photos.mydomain.com"
2025/01/17 22:56:24 [error] 786#786: *1716 [lua] crowdsec.lua:651: Allow(): AppSec check: broken pipe, client: 172.29.0.1, server: photos.mydomain.com, request: "POST /api/assets HTTP/1.1", host: "photos.mydomain.com"
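
In this log, the failure is reported several seconds after the request body is buffered. A small sketch like the one below could be used to measure that gap per connection across a longer log. It assumes the nginx error-log layout shown above (timestamp, level, `pid#tid`, `*connection`); the function name is hypothetical, and the sample lines are the first two lines of the log above. The measurement alone does not establish the cause.

```python
import re
from datetime import datetime

# Matches nginx error-log lines such as:
# 2025/01/17 22:56:18 [warn] 786#786: *1716 a client request body is buffered ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] \d+#\d+: \*(?P<conn>\d+) (?P<msg>.*)$"
)


def seconds_until_send_failure(lines):
    """Return {connection id: seconds from its first log line to 'send() failed'}."""
    first_seen = {}
    result = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines in other formats
        conn = match["conn"]
        ts = datetime.strptime(match["ts"], "%Y/%m/%d %H:%M:%S")
        first_seen.setdefault(conn, ts)
        if "send() failed" in match["msg"] and conn not in result:
            result[conn] = (ts - first_seen[conn]).total_seconds()
    return result


SAMPLE = [
    '2025/01/17 22:56:18 [warn] 786#786: *1716 a client request body is buffered to a temporary file /usr/local/nginx/client_body_temp/0000000004 while reading request body, client: 172.29.0.1, server: photos.mydomain.com, request: "POST /api/assets HTTP/1.1", host: "photos.mydomain.com"',
    '2025/01/17 22:56:24 [error] 786#786: *1716 send() failed (32: Broken pipe), client: 172.29.0.1, server: photos.mydomain.com, request: "POST /api/assets HTTP/1.1", host: "photos.mydomain.com"',
]

print(seconds_until_send_failure(SAMPLE))  # {'1716': 6.0}
```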

@LaurenceJJones
Contributor

So it seems we never explicitly log whether writing the response encountered an error. My guess is that there is an error there, but it is not currently logged. Once we get that logging merged, we can continue debugging if you point your container version to :dev. If you would rather test a stable, tested version, we have to wait for 1.6.5.

@yurividal

I'm happy to test with the :dev version. Just let me know when it is published.

@LaurenceJJones
Contributor

The dev image should now include a log statement for when the response fails to write back to the remediation component. Since it's an Errorf, you shouldn't need to keep the log level at debug, but it may help us if you keep it for the context.

@yurividal

@LaurenceJJones
Just pulled latest :dev, started crowdsec with appsec log level debug, took a video, and then opened immich.

I captured all the logs from crowdsec container, and the word "unable" doesn't appear in any log!

Image

Yet, the error still happened and the nginx logs show the broken pipe message.

@yurividal

I also tried running it without the debug level, since it's an error message. In that case, no logs at all were printed by the CrowdSec container.

@LaurenceJJones
Contributor

With the dev container, can you run cscli version, just so I can confirm it is the correct version?

@yurividal

version: v1.6.4-rc4-40-g7d12b806
Codename: alphaga
BuildDate: 2025-01-18_12:16:51
GoVersion: 1.23.5
Platform: docker
libre2: C++
User-Agent: crowdsec/v1.6.4-rc4-40-g7d12b806-docker
Constraint_parser: >= 1.0, <= 3.0
Constraint_scenario: >= 1.0, <= 3.0
Constraint_api: v1
Constraint_acquis: >= 1.0, < 2.0
Built-in optional components: cscli_setup, datasource_appsec, datasource_cloudwatch, datasource_docker, datasource_file, datasource_http, datasource_journalctl, datasource_k8s-audit, datasource_kafka, datasource_kinesis, datasource_loki, datasource_s3, datasource_syslog, datasource_wineventlog

I also tried the direct container-to-container settings in npmplus appsec config, which again, changed the error to connection reset by peer and no additional logs in crowdsec container.

@yurividal

Just found another piece of interesting information:
The issue only happens when I am directly connected to my network - either locally, or via wireguard.

When I am outside my network, access is proxied via Cloudflare. In that case it works, and I don't see any error messages about AppSec in the nginx logs.
I'm not sure why this happens, or what could be different when the traffic comes via Cloudflare (it still hits NPMplus; it just goes through Cloudflare first, before hitting my proxy).

Weird, but I just thought I'd mention it here.

Image

@yurividal

I think this adds to the suspicion that AppSec is being overwhelmed by the volume of data sent all at the same time.
Cloudflare slows things down slightly, which would explain why it's not overwhelming CrowdSec. I don't know, it's just a theory...

@LaurenceJJones
Contributor

LaurenceJJones commented Jan 19, 2025

very very odd 😕

IMO, since we can see AppSec processing the rules and managing to send a response (we don't see the "unable to send response" message in the log), it seems to be something within the nginx / Lua code. Could you provide some specs for the machine, like core count? I can try to alter my local VM to see if a more restricted spec causes issues; at the moment I'm running 4 vCPUs and 4 GB RAM.

@yurividal

Full specs of the machine running crowdsec and npm

OS: Linux Mint vanessa 21 x86_64
Host: 20B7S04702 (ThinkPad T440)
Kernel: Linux 5.15.0-122-generic
Uptime: 113 days(!), 1 hour, 21 mins
Packages: 2507 (dpkg)
Shell: zsh 5.8.1
CPU: Intel(R) Core(TM) i5-4300U (4) @ 2.90 GHz
GPU: Intel Haswell-ULT Integrated Graphics Controller @ 1.10 GHz [Integrated]
Memory: 7.63 GiB (58%)
Swap: 8.00 GiB (19%)
Disk (/): 103.61 GiB / 218.52 GiB (47%) - ext4

it seems it something within nginx / lua code.

Do you know if there is any way to enable further debugging on the lua side?

@yurividal

Also, this is probably unrelated, but maybe worth the read:

hyperium/hyper#2384

In this project, users were seeing a broken pipe message when two components were communicating, if the body was too large and the connection was closed without consuming the full body of the message.

@LaurenceJJones
Contributor

LaurenceJJones commented Jan 19, 2025

Yes, you can enable further debugging within nginx if it is compiled with the debug flag (nginx -V 2>&1 | grep -- '--with-debug'). Then you can define a debug log file:

error_log /data/nginx/debug.log debug;

I would only recommend adding this to the Immich vhost, as adding it globally will cause a shed ton of logs 😅

However, checking the nginx shipped with NPMplus, it is not compiled with the debug flag:

nginx version: NPMplus/1.27.4 (freenginx)
built by gcc 14.2.0 (Alpine 14.2.0)
built with OpenSSL 3.1.7+quic 3 Sep 2024
TLS SNI support enabled
configure arguments: --build=freenginx --with-compat --with-threads --with-file-aio --with-libatomic --with-pcre --with-pcre-jit --with-openssl-opt='no-legacy --libdir=lib' --with-openssl=/usr/local/openssl --with-mail --with-mail_ssl_module --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-stream_geoip_module --with-stream_realip_module --with-http_v2_module --with-http_v3_module --with-http_ssl_module --with-http_geoip_module --with-http_realip_module --with-http_gunzip_module --with-http_addition_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_geoip_module --with-http_sub_module --with-http_stub_status_module --add-module=/src/ngx_brotli --add-module=/src/ngx-fancyindex --add-module=/src/headers-more-nginx-module --add-module=/src/njs/nginx --add-module=/src/ngx_devel_kit --add-module=/src/lua-nginx-module --add-module=/src/ModSecurity-nginx --add-module=/src/ngx_http_geoip2_module --add-module=/src/nginx-ntlm-module
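
For anyone who wants to script the same check across several hosts or images, a minimal sketch follows. It does the same thing as the grep above; the function names are hypothetical, and it relies on `nginx -V` writing its build information to stderr (hence the `2>&1` in the shell command).

```python
import subprocess


def has_debug_flag(nginx_v_output: str) -> bool:
    """True if the configure arguments contain the exact --with-debug token."""
    return "--with-debug" in nginx_v_output.split()


def nginx_build_info(binary: str = "nginx") -> str:
    """Return the combined output of `nginx -V` (build info goes to stderr)."""
    result = subprocess.run([binary, "-V"], capture_output=True,
                            text=True, check=False)
    return result.stderr + result.stdout
```

`has_debug_flag(nginx_build_info())` would return False for the build shown above, since `--with-debug` does not appear among its configure arguments.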

@yurividal

@LaurenceJJones how big are the videos you were testing the upload with?
Here is my new theory:
As seen here, immich does not implement chunking in uploads. The file is uploaded as a single blob. (This causes issues in Cloudflare, since the max body size on free Cloudflare is locked to 100 MB, but that's not important for our discussion here.)

When an upload is sent directly to nginx, it sends the entire request and body to appsec. Appsec cannot handle such a big body size and drops the connection without closing it.

The Lua component sees the connection being dropped with no Connection: close header and throws a broken pipe error.

The reason it might work when going through Cloudflare (assuming the file is less than 100 MB) is that Cloudflare might actually be breaking the payload into smaller chunks.

@LaurenceJJones
Contributor

LaurenceJJones commented Jan 19, 2025

I was testing with 3 MB files as well as 50 MB files, since according to the OP any upload size was causing the issue.

Edit: I just tested with 200 MB and 600 MB uploads as well and still got no broken pipes, so I'm still very lost.

@LaurenceJJones
Contributor

Hey, to keep the thread updated: Zoey added the --with-debug flag to the freenginx build, so from the next release we can enable debug logging to see what Lua is reporting under the hood.

@LaurenceJJones
Contributor

Hey, Zoey managed to generate a build with the debug flag. When you get a chance to continue debugging, switch over to the :develop tag on NPMPlus; then you can add the above debug configuration to the immich server config.

error_log /data/nginx/debug.log debug;

Then, if we can get the logs: please note nginx debug is very, very verbose, so feel free to send the log to laurence at crowdsec.net. I type it as words so I don't get spammed by bots.

@LaurenceJJones
Contributor

I saw the error, let's wait until it's more stable before proceeding with debugging further.

@yurividal

Hey everyone. I managed to upgrade to the :develop branch, and enable the debug logs.

But, to my surprise, the ISSUE IS GONE.
I tried replicating the error in many ways, but it just doesn't happen. I was able to upload multiple videos to immich, of different sizes, and didn't get the appsec error even once.
I also enabled appsec debug, just to make sure that requests were actually still being sent to appsec, and they were.

So, not sure exactly what changed in the :develop version, but it seems to have fixed the issue.

@Zoey2936

Zoey2936 commented Jan 23, 2025

Possible changes I could think of are these two updates:

@LaurenceJJones
Contributor

😕 is an understatement at the moment 😕

@yurividal

Yeah, kind of a bummer that we couldn't pinpoint the exact cause. But at least it seems to be fixed. Maybe we put this issue on pause until Zoey updates the stable version of npmplus, and then more users can report whether they still see the issue.

@LaurenceJJones
Contributor

LaurenceJJones commented Jan 23, 2025

I agree, I won't close it as there might be something hidden somewhere 👍🏻

Just note, as @blotus said, there is a major update coming in #80 which adds a new feature, adds tests and streamlines the code. So once this is merged and NPMPlus is stable and upgrades to have the metrics, I will be happy to class this as complete.

Edit: also thank you @yurividal for your time in debugging and allowing us to use your environment as a test place to get to the not so bottom of this 😆

@LaurenceJJones
Contributor

So just a ping to all: Zoey released a new update to latest, so whenever you can, please pull the latest and confirm whether this resolved your issue OR whether it was maybe a fluke resolve for some 😅

@Infiniteez

Hi, I've been lurking in this discussion in the past few weeks. I pulled the latest tag and unfortunately the problem is still present for me:

2025/01/28 10:43:47 [warn] 66024#66024: *21195 a client request body is buffered to a temporary file /usr/local/nginx/client_body_temp/0000000030 while reading request body, client: 10.8.0.5, server: mydomain, request: "POST /api/assets HTTP/1.1", host: "mydomain"
2025/01/28 10:43:51 [error] 66024#66024: *21195 send() failed (32: Broken pipe), client: 10.8.0.5, server: mydomain, request: "POST /api/assets HTTP/1.1", host: "mydomain"
2025/01/28 10:43:51 [error] 66024#66024: *21195 [lua] crowdsec.lua:578: AppSecCheck(): Fallback because of err: broken pipe, client: 10.8.0.5, server: mydomain, request: "POST /api/assets HTTP/1.1", host: "mydomain"
2025/01/28 10:43:51 [error] 66024#66024: *21195 [lua] crowdsec.lua:651: Allow(): AppSec check: broken pipe, client: 10.8.0.5, server: mydomain, request: "POST /api/assets HTTP/1.1", host: "mydomain"
2025/01/28 10:43:51 [alert] 66024#66024: *21195 [lua] crowdsec.lua:718: Allow(): [Crowdsec] denied '10.8.0.5' with 'ban' (by appsec), client: 10.8.0.5, server: mydomain, request: "POST /api/assets HTTP/1.1", host: "mydomain"

The files I'm trying to upload to immich are ~4MB photos taken from the camera app on my Samsung Galaxy S24.
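
To check how often this pattern occurs in a longer error log, the lines can be grouped by nginx connection number (the *21195 field). This is a rough sketch that assumes the error log layout shown above; the function name bans_after_broken_pipe is hypothetical and the sample lines are shortened copies of the log above.

```python
import re

# "<date> <time> [level] pid#tid: *conn message", as in the log above.
LINE_RE = re.compile(r"^(\S+ \S+) \[(\w+)\] \d+#\d+: \*(\d+) (.*)$")


def bans_after_broken_pipe(lines):
    """Return (timestamp, connection) for appsec bans that follow a broken pipe."""
    broken = set()
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        timestamp, _level, conn, message = m.groups()
        if "send() failed (32: Broken pipe)" in message:
            broken.add(conn)
        elif "denied" in message and "(by appsec)" in message and conn in broken:
            hits.append((timestamp, conn))
    return hits


# Shortened from the log lines above.
SAMPLE_LOG = [
    '2025/01/28 10:43:51 [error] 66024#66024: *21195 send() failed (32: Broken pipe), client: 10.8.0.5',
    "2025/01/28 10:43:51 [alert] 66024#66024: *21195 [lua] crowdsec.lua:718: Allow(): "
    "[Crowdsec] denied '10.8.0.5' with 'ban' (by appsec), client: 10.8.0.5",
]

print(bans_after_broken_pipe(SAMPLE_LOG))
```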

@LaurenceJJones
Contributor

Could you add the debug log directive from above, error_log /data/nginx/debug.log debug;, to the immich server config and then send the logs to laurence at crowdsec.net so we can look through them?

@Infiniteez

Logs sent!

@LaurenceJJones
Contributor

Just a quick update, from the logs you sent:

2025/01/28 12:05:54 [debug] 69325#69325: *21967 send: fd:14 2345456 of 4077509
2025/01/28 12:05:54 [debug] 69325#69325: *21967 send: fd:14 -1 of 1732053
2025/01/28 12:05:54 [error] 69325#69325: *21967 send() failed (32: Broken pipe), client: 10.8.0.5, server: subdomain.domain.tld, request: "POST /api/assets HTTP/1.1", host: "subdomain.domain.tld"

So it manages to write about 2 MB to the appsec component, then the pipe is broken, so it doesn't manage to write the rest of the body. (There is another issue in appsec that I raised: if the remediation fails or cancels the request, this does not propagate to the appsec, so the appsec always runs the request, which is not intentional.) Interestingly, it always happens around the 2 MB mark 😕, but at least we can see where the broken pipe comes from. So now we have narrowed it down to nginx -> appsec.
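
The numbers in those debug lines can be checked with a short script. This sketch assumes each "send: fd:N X of Y" line means the send call returned X for a request of Y bytes, with -1 marking the failed call; that is my reading of the nginx debug output, not something confirmed in this thread. The function name summarize_sends is hypothetical.

```python
import re

# Matches the debug fragments quoted above, e.g. "send: fd:14 2345456 of 4077509".
SEND_RE = re.compile(r"send: fd:(\d+) (-?\d+) of (\d+)")


def summarize_sends(lines):
    """Return (bytes_sent, bytes_unsent) up to the first failed send."""
    sent = 0
    unsent = 0
    for line in lines:
        m = SEND_RE.search(line)
        if not m:
            continue
        result, requested = int(m.group(2)), int(m.group(3))
        if result < 0:
            # Failed call: everything requested here was never written.
            unsent = requested
            break
        sent += result
        unsent = requested - result
    return sent, unsent


DEBUG_LINES = [
    "2025/01/28 12:05:54 [debug] 69325#69325: *21967 send: fd:14 2345456 of 4077509",
    "2025/01/28 12:05:54 [debug] 69325#69325: *21967 send: fd:14 -1 of 1732053",
]

sent, unsent = summarize_sends(DEBUG_LINES)
print(f"sent {sent} bytes ({sent / 2**20:.2f} MiB), {unsent} bytes never written")
```

Under that reading, 2345456 bytes went out in the first call and the remaining 1732053 bytes (4077509 - 2345456) were what the failed call was trying to write.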
