Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gunicorn causes signed GET requests (e.g. AUTHORIZED_FETCH) to fail in Production #2844

Closed
futzle opened this issue May 9, 2023 · 32 comments
Labels
bug Something isn't working deployment

Comments

@futzle
Copy link

futzle commented May 9, 2023

Describe the bug
On production branch, dockerless, still getting Unauthorized: /user/bookwyrmuser/inbox on follow from a Mastodon instance.

To Reproduce
Steps to reproduce the behavior:

  1. Install production branch on Debian 11, Dockerless instructions.
  2. Create a regular user on bookwyrm. Allow automatic follows without approval.
  3. Try to follow this user from a Mastodon 4.0.2 (Hometown 1.1.1) instance with AUTHORIZED_FETCH turned on.
  4. Error appears in bookwyrm log: Unauthorized: /user/bookwyrmuser/inbox
  5. On Mastodon instance, follow degrades to a request and becomes "Withdraw follow request".

Expected behavior
The follow should complete, and it should be visible at both ends.

Instance
https://outside.ofa.dog/
https://old.mermaid.town/

Additional context
This issue looks just like closed issues #2734, #1964, #2626, which are all supposed to be fixed by #2613. Yet I'm pretty sure that my installation, new on 8 May 2023, includes #2613 and I'm still getting the exact same behaviour.

Users on other instances can follow me on Bookwyrm, even users on other Mastodon instances.

I can't seem to follow any Bookwyrm instances from old.mermaid.town.

I tried two users on Mastodon instance old.mermaid.town and had the same result.

Mastodon server has errors in sidekiq logs:

"{\"@context\":\"https://www.w3.org/ns/activitystreams\",\"id\":\"https://old.mermaid.town/4fb96c8a-614f-478c-b4c4-24ad957527b3\",\"type\":\"Follow\",\"actor\":\"https://old.mermaid.town/users/Kymberly\",\"object\":\"https://outside.ofa.dog/user/futzle\"}",  109376236586682428, "https://outside.ofa.dog/user/futzle/inbox"

Mastodon::UnexpectedResponseError: https://outside.ofa.dog/user/futzle/inbox returned code 401
"{\"@context\":\"https://www.w3.org/ns/activitystreams\",\"id\":\"https://old.mermaid.town/users/futzle#follows/5190/undo\",\"type\":\"Undo\",\"actor\":\"https://old.mermaid.town/users/futzle\",\"object\":{\"id\":\"https://old.mermaid.town/9d8361bd-83ca-4d42-b529-910fb6833575\",\"type\":\"Follow\",\"actor\":\"https://old.mermaid.town/users/futzle\",\"object\":\"https://outside.ofa.dog/user/futzle\"}}", 109283781929862046, "https://outside.ofa.dog/user/futzle/inbox"

Stoplight::Error::RedLight: https://outside.ofa.dog/user/futzle/inbox

I tried to follow my Bookwyrm account from a user on Mastodon 4.1 instance aus.social (which AFAIK has AUTHORIZED_FETCH turned off) and the follow succeeded at both ends.


Desktop (please complete the following information):
- OS: MacOS
- Browser Firefox
- Version 112

@futzle futzle added the bug Something isn't working label May 9, 2023
@futzle futzle changed the title Still can't follow Bookwyrm user from Mastodon instance Still can't follow Bookwyrm user from Mastodon Hometown instance, may be AUTHORIZED_FETCH related May 9, 2023
@hughrun
Copy link
Contributor

hughrun commented May 9, 2023

I can confirm you have the bookwyrm.instance.actor user which means #2613 has been applied.

@futzle
Copy link
Author

futzle commented May 9, 2023

My Mastodon (Hometown) environment:

LOCAL_DOMAIN=old.mermaid.town
SINGLE_USER_MODE=false
SECRET_KEY_BASE=xxxxx
OTP_SECRET=xxxxx
VAPID_PRIVATE_KEY=xxxxx
VAPID_PUBLIC_KEY=xxxxx
DB_HOST=/var/run/postgresql
DB_PORT=5432
DB_NAME=mastodon_production
DB_USER=mastodon
DB_PASS=
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
S3_ENABLED=true
S3_PROTOCOL=https
S3_BUCKET=st0
S3_REGION=xxxxx
S3_HOSTNAME=xxxxx
S3_ENDPOINT=xxxxx
AWS_ACCESS_KEY_ID=xxxxx
AWS_SECRET_ACCESS_KEY=xxxxx
SMTP_SERVER=localhost
SMTP_PORT=25
SMTP_AUTH_METHOD=none
SMTP_OPENSSL_VERIFY_MODE=none
SMTP_FROM_ADDRESS='Old Mermaid Town <[email protected]>'
MAX_TOOT_CHARS=1500
AUTHORIZED_FETCH=true
ES_ENABLED=true
ES_HOST=localhost
ES_PORT=9200

In particular I haven't messed with DISALLOW_UNAUTHENTICATED_API_ACCESS or LIMITED_FEDERATION_MODE.

My Bookwyrm environment:

SECRET_KEY="xxxxx"
DEBUG=false
USE_HTTPS=true
DOMAIN=outside.ofa.dog
EMAIL="[email protected]"
LANGUAGE_CODE="en-au"
DEFAULT_LANGUAGE="English"
MEDIA_ROOT=images/
PGPORT=5432
POSTGRES_PASSWORD=xxxxx
POSTGRES_USER=bookwyrm
POSTGRES_DB=bookwyrm
POSTGRES_HOST=localhost
MAX_STREAM_LENGTH=200
REDIS_ACTIVITY_HOST=localhost
REDIS_ACTIVITY_PORT=6379
REDIS_ACTIVITY_PASSWORD=
REDIS_BROKER_HOST=localhost
REDIS_BROKER_PORT=6379
REDIS_BROKER_PASSWORD=
FLOWER_PORT=8888
FLOWER_USER=admin
FLOWER_PASSWORD=changeme
EMAIL_HOST=localhost
EMAIL_PORT=25
EMAIL_HOST_USER=""
EMAIL_HOST_PASSWORD=
EMAIL_USE_TLS=false
EMAIL_USE_SSL=false
EMAIL_SENDER_NAME=editor
EMAIL_SENDER_DOMAIN=outside.ofa.dog
SEARCH_TIMEOUT=5
QUERY_TIMEOUT=5
ENABLE_THUMBNAIL_GENERATION=true
USE_S3=false
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
ENABLE_PREVIEW_IMAGES=False
PREVIEW_BG_COLOR=use_dominant_color_light
PREVIEW_TEXT_COLOR=#363636
PREVIEW_IMG_WIDTH=1200
PREVIEW_IMG_HEIGHT=630
PREVIEW_DEFAULT_COVER_COLOR=#002549
OTEL_EXPORTER_OTLP_ENDPOINT=
OTEL_EXPORTER_OTLP_HEADERS=
OTEL_SERVICE_NAME=
HTTP_X_FORWARDED_PROTO=false
TWO_FACTOR_LOGIN_VALIDITY_WINDOW=2
TWO_FACTOR_LOGIN_MAX_SECONDS=60
CSP_ADDITIONAL_HOSTS=

@hughrun
Copy link
Contributor

hughrun commented May 9, 2023

Thanks @futzle for all this detail. This is a confusing one. I'm trying to think through the possible causes here, and I may need to spin up a test mastodon server on the weekend to try to get to the bottom of it, but here's a list of possibilities I can think of:

  1. The fix in Enable communication with "authorized_fetch" Mastodon servers #2613 doesn't actually fix AUTHORIZED_FETCH/"secure mode" errors at all, but somehow we convinced ourselves that it did (unlikely, but we were working on a lot of unrelated federation problems at the time caused by a huge influx of new users, so it's possible);
  2. There is something specific to Hometown + secure mode that we didn't pick up (testing has been done on vanilla Mastodon with secure mode previously)
  3. There is something specific to non-Docker Bookwyrm that is causing this bug.
  4. There is something specific to your particular setup that is causing this bug.

Something that might be helpful to narrow things down is to test whether you can send DMs in both directions to/from non-secure Mastodon or other Bookwyrm instances. If that also throws a 401 then we have a more general problem with signed requests, rather than specifically GET requests signed by the generic server actor.

@futzle
Copy link
Author

futzle commented May 10, 2023

I caught some HTTP logs on the Mastodon Hometown instance (old.mermaid.town) and on the Bookwyrm instance (outside.ofa.dog).

Hometown instance old.mermaid.town IP: 2400:8907::f03c:93ff:fe78:43a2

Bookwyrm instance outside.ofa.dog IP: 2400:8907::f03c:93ff:fec5:bf6a

Search for https : //outside.ofa.dog/user/bookwyrm_user from old.mermaid.town:

outside.ofa.dog nginx log:

MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /user/bookwyrm_user HTTP/1.1" 200 1537 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.032 0.032
MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /.well-known/webfinger?resource=acct:[email protected] HTTP/1.1" 200 288 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.016 0.015
MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /user/bookwyrm_user/outbox HTTP/1.1" 200 295 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.000, 0.016 0.017
MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /user/bookwyrm_user/following HTTP/1.1" 200 1320 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.016 0.016
MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /user/bookwyrm_user/followers HTTP/1.1" 200 1320 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.020 0.022

Other logs have no interesting traffic.

Attempt to follow bookwyrm_user from hometown_user at UTC 00:23:25:

outside.ofa.dog nginx log:

- - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:23:25 +0000] "POST /user/bookwyrm_user/inbox HTTP/1.1" 401 0 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 8.273 8.269
- - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:23:47 +0000] "POST /user/bookwyrm_user/inbox HTTP/1.1" 401 0 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.156 0.156

outside.ofa.dog bookwyrm journal:

May 10 00:23:25 outside gunicorn[106467]: Unauthorized: /user/bookwyrm_user/inbox
May 10 00:23:47 outside gunicorn[106467]: Unauthorized: /user/bookwyrm_user/inbox

old.mermaid.town nginx_log:

2400:8907::f03c:93ff:fec5:bf6a - - [10/May/2023:10:23:25 +1000] "GET /users/hometown_user HTTP/1.1" 401 29 "-" "python-requests/2.28.2 (BookWyrm/0.6.2; +https://outside.ofa.dog/)"
2400:8907::f03c:93ff:fec5:bf6a - - [10/May/2023:10:23:25 +1000] "GET /users/hometown_user HTTP/1.1" 401 72 "-" "python-requests/2.28.2"
2400:8907::f03c:93ff:fec5:bf6a - - [10/May/2023:10:23:47 +1000] "GET /users/hometown_user HTTP/1.1" 401 29 "-" "python-requests/2.28.2 (BookWyrm/0.6.2; +https://outside.ofa.dog/)"
2400:8907::f03c:93ff:fec5:bf6a - - [10/May/2023:10:23:47 +1000] "GET /users/hometown_user HTTP/1.1" 401 72 "-" "python-requests/2.28.2"

old.mermaid.town mastodon journal:

May 10 10:23:25 old bundle[1181]: [894a5c72-fcd6-4915-8a47-ddedc4b8ce3f] method=GET path=/users/hometown_user format=json controller=AccountsController action=show status=401 duration=9.80 view=0.44 db=1.79
May 10 10:23:25 old bundle[1181]: [70f83732-716f-41d1-9538-3e33642a7a25] method=GET path=/users/hometown_user format=json controller=AccountsController action=show status=401 duration=36.49 view=0.74 db=2.44 key=https://outside.ofa.dog/user/bookwyrm.instance.actor#main-key
May 10 10:23:26 old bundle[3339097]: 2023-05-10T00:23:26.140Z pid=3339097 tid=1q3ep5 WARN: {"context":"Job raised exception","job":{"retry":16,"queue":"push","dead":false,"args":["{\"@context\":\"https://www.w3.org/ns/activitystreams\",\"id\":\"https://old.mermaid.town/bb119164-c536-4301-8785-225695ade557\",\"type\":\"Follow\",\"actor\":\"https://old.mermaid.town/users/hometown_user\",\"object\":\"https://outside.ofa.dog/user/bookwyrm_user\"}",110341296691175779,"https://outside.ofa.dog/user/bookwyrm_user/inbox"],"class":"ActivityPub::DeliveryWorker","jid":"77771c5b932916dfc1b9c8e6","created_at":1683678197.5960422,"enqueued_at":1683678197.5962195}}
May 10 10:23:26 old bundle[3339097]: 2023-05-10T00:23:26.141Z pid=3339097 tid=1q3ep5 WARN: Mastodon::UnexpectedResponseError: https://outside.ofa.dog/user/bookwyrm_user/inbox returned code 401
May 10 10:23:47 old bundle[1174]: [ba144ed9-21f3-4daa-bfc7-ad940c1b1dd8] method=GET path=/users/hometown_user format=json controller=AccountsController action=show status=401 duration=8.99 view=0.73 db=1.03
May 10 10:23:47 old bundle[1174]: [6cffd5a3-ee63-464c-a8c0-7e4526e7439e] method=GET path=/users/hometown_user format=json controller=AccountsController action=show status=401 duration=24.88 view=0.50 db=3.28 key=https://outside.ofa.dog/user/bookwyrm.instance.actor#main-key
May 10 10:23:47 old bundle[3339097]: 2023-05-10T00:23:47.404Z pid=3339097 tid=1q47px WARN: {"context":"Job raised exception","job":{"retry":16,"queue":"push","dead":false,"args":["{\"@context\":\"https://www.w3.org/ns/activitystreams\",\"id\":\"https://old.mermaid.town/bb119164-c536-4301-8785-225695ade557\",\"type\":\"Follow\",\"actor\":\"https://old.mermaid.town/users/hometown_user\",\"object\":\"https://outside.ofa.dog/user/bookwyrm_user\"}",110341296691175779,"https://outside.ofa.dog/user/bookwyrm_user/inbox"],"class":"ActivityPub::DeliveryWorker","jid":"77771c5b932916dfc1b9c8e6","created_at":1683678197.5960422,"enqueued_at":1683678227.2260702,"error_message":"https://outside.ofa.dog/user/bookwyrm_user/inbox returned code 401","error_class":"Mastodon::UnexpectedResponseError","failed_at":1683678206.1401398,"retry_count":0}}
May 10 10:23:47 old bundle[3339097]: 2023-05-10T00:23:47.404Z pid=3339097 tid=1q47px WARN: Mastodon::UnexpectedResponseError: https://outside.ofa.dog/user/bookwyrm_user/inbox returned code 401

My wild guess as to the events:

  1. old.mermaid.town sends a POST to https : //outside.ofa.dog/user/bookwyrm_user/inbox, presumably the follow request.
  2. outside.ofa.dog counters with a GET to https : //old.mermaid.town/users/hometown_user which returns 401 because AUTHORIZED_FETCH
  3. outside.ofa.dog immediately follows with another GET to https : //old.mermaid.town/users/hometown_user but with key=https : //outside.ofa.dog/user/bookwyrm.instance.actor#main-key which also returns 401. I don't know a lot about the Fediverse's protocols but this feels like it shouldn't be happening.
  4. Having been rejected twice, outside.ofa.dog returns a 401 to old.mermaid.town.
  5. old.mermaid.town queues a retry of the follow at 00:23:26, which occurs at 00:23:47. The logs repeat (and do so again, but truncated in this post).

Is it possible that Hometown is rejecting the signature, or that it considers the signed context invalid?

@futzle
Copy link
Author

futzle commented May 10, 2023

I temporarily disabled AUTHORIZED_FETCH on Mastodon Hometown instance old.mermaid.town and the follow went straight through. Now that the follow is connected, both accounts can talk to each other just fine, even after re-enabling AUTHORIZED_FETCH.

Seems that it's just the following workflow that's being interfered with.

@hughrun
Copy link
Contributor

hughrun commented May 10, 2023

Ahhh I think the problem is /user/bookwyrm_user - this is a nonexistent user. I wonder where that is coming from... 🤔

@futzle
Copy link
Author

futzle commented May 10, 2023

Ahhh I think the problem is /user/bookwyrm_user - this is a nonexistent user. I wonder where that is coming from... 🤔

Au contraire, they do exist! I created a user bookwyrm_user on outside.ofa.dog, and a user hometown_user on old.mermaid.town. For testing, and to make the logfiles more readable.

@hughrun
Copy link
Contributor

hughrun commented May 10, 2023

Aha!

is it possible that Hometown is rejecting the signature, or that it considers the signed context invalid?

Yes. Sending a signed request with the instance actor after getting a 401 on an unsigned user request is intended behaviour to deal with authorized fetch. We should not be getting a 401 from this second request, however. That looks to be where we will find the source of the problem.

@hughrun
Copy link
Contributor

hughrun commented May 10, 2023

I temporarily disabled AUTHORIZED_FETCH on Mastodon Hometown instance old.mermaid.town and the follow went straight through. Now that the follow is connected, both accounts can talk to each other just fine, even after re-enabling AUTHORIZED_FETCH.

Is it possible for you to check whether following now works for other users on old.mermaid.town without having to disable AUTHORIZED_FETCH? I'm curious whether this is a "first use" problem related to the generic instance user.

@futzle
Copy link
Author

futzle commented May 11, 2023

Is it possible for you to check whether following now works for other users on old.mermaid.town without having to disable AUTHORIZED_FETCH?

OMG it is a "first use" problem. Following from Old Mermaid Town to a different user on Outside of a Dog works perfectly, with AUTHORIZED_FETCH subsequently turned on.

Logs are much simpler now, I guess because the two instances trust each other.

old.mermaid.town nginx log:

2400:8907::f03c:93ff:fec5:bf6a - - [11/May/2023:14:10:18 +1000] "POST /users/hometown_user/inbox HTTP/1.1" 202 5 "-" "python-requests/2.28.2 (BookWyrm/0.6.2; +https://outside.ofa.dog/)"

Note that there is no GET on /users/hometown_user like the above log, which is the bit that was getting a 401 error. Is that because outside.ofa.dog doesn't need to verify this server's existence any more? So the suspected failure path has been bypassed?

old.mermaid.town Mastodon log:

May 11 14:10:18 old bundle[3578943]: [85c4d57c-c262-4ecc-85bf-a493daa0cb11] method=POST path=/users/hometown_user/inbox format=*/* controller=ActivityPub::InboxesController action=create status=202 duration=22.79 view=0.00 db=2.49 key=https://outside.ofa.dog/user/bookwyrm_user#main-key

Same request as the nginx log shows.

outside.ofa.dog nginx log:

- - 2400:8907::f03c:93ff:fe78:43a2 [11/May/2023:04:10:18 +0000] "POST /user/bookwyrm_user/inbox HTTP/1.1" 200 0 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.060 0.061

If I attempt to follow from Hometown instance old.mermaid.town a user on a different Bookwyrm server (one which has a bookwyrm.instance.actor user, so has #2613 applied), I get the originally described hanging behaviour.

@hughrun
Copy link
Contributor

hughrun commented May 11, 2023

Thanks this is very helpful.

I need to think about this more but from what you're saying yes this is something about when a whole new ActivityPub server is encountered, rather than simply a new-to-our-server user.

Note that there is no GET on /users/hometown_user like the above log, which is the bit that was getting a 401 error. Is that because outside.ofa.dog doesn't need to verify this server's existence any more?

The prior GET was to retrieve the user profile information and importantly, their public key. Once old.mermaid.town has that it makes a local copy and knows the user so there is no need to fetch it every time it receives a POST.

@hughrun
Copy link
Contributor

hughrun commented May 14, 2023

This appears to be something to do with production. Testing with the Docker setup:

The code added in #2613 appears to be the same on both branches, so it's something else. My guess is something to do with the bookwyrm.instance.actor ID url or that user's keys, but I can't work out what.

The only other thing that is different other than the branches, is that I'm using ngrok to maintain real domain names when testing: usually on main that means forwarding port 1333, but when running in production we have an inbuilt nginx I think? In any case, for production branch I have to forward port 80 instead of 1333. I can't think of any reason this matters, but leave it here for completeness.

@mouse-reeve do you have any ideas here?

@eldang
Copy link

eldang commented May 18, 2023

FWIW, I'm encountering the same issue. From my hometown account https://weirder.earth/web/@eldang I can't follow either of my Bookwyrm accounts https://outside.ofa.dog/user/eldang or https://books.theunseen.city/user/eldang

Because I have been able to follow my account on the flagship instance -- https://bookwyrm.social/user/eldang -- I had assumed that #2613 was working as intended and this had to be a config issue with books.theunseen.city. But when I joined outside.ofa.dog I found I had the same issue. The first use issue probably explains that discrepancy.

I'm afraid I have no insight into how this might be fixed, but if any permutation of my accounts are helpful for testing I'm happy to try stuff out.

@WesleyAC
Copy link
Member

WesleyAC commented Jun 3, 2023

@hughrun what is your HTTPS setup like on each of those? that's something that's different on the prod and main branches, prod has certbot by default. I presume ngrok does HTTPS termination in both cases?

@WesleyAC
Copy link
Member

WesleyAC commented Jun 3, 2023

I don't really understand how this is all supposed to work, but one question I have is — doesn't the Mastodon server need to get the key for bookwyrm.instance.actor? Why don't I see that request in the Bookwyrm logs?

@hughrun
Copy link
Contributor

hughrun commented Jun 3, 2023

Yes that's right. Mastodon should reject the initial GET from the user, then we should send it again as a signed GET from the instance actor: Mastodon should then send a request back to collect the instance actor's key.

This all works correctly on main in Docker but there appears to be something not quite right about Dockerless setups and/or production in certain circumstances.

I think I didn't set it up properly when I was trying to test production to find the problem so that could be a red herring.

@futzle is also having an issue where statuses are published twice: I suspect this is related and we also have an Issue where someone else had the same problem.

@OliverPifferi
Copy link

Having the same issue here with a working Mastodon- and BookWyrm-instance. The latter was reverted from an offline snapshot and since then, the user feed disappeared (to be specific, the status updates so far did - new ones are displayed as they should) and Mastodon wasn't able to follow the BookWyrm-user anymore.

I can request a follow (same procedure: Mastodon tells me that I am following, after refreshing and looking at the profile it seems to be "just" a follow request) but the BookWyrm-instance tells me that I've got a new follower - which is somehow odd as the account isn't displayed as being followed from Mastodon.

Using the latest commits here and everything went fine BEFORE the snapshot.

@hughrun
Copy link
Contributor

hughrun commented Jun 11, 2023

@OliverPifferi is your Bookwyrm instance using Docker, or the Dockerless setup?

@OliverPifferi
Copy link

@OliverPifferi is your Bookwyrm instance using Docker, or the Dockerless setup?

Sorry, forgot to mention that - it's deployed via Dockerless setup.

@hughrun
Copy link
Contributor

hughrun commented Aug 6, 2023

I have done further testing on this with latest Bookwyrm, and latest Mastodon+Hometown.

  1. Works correctly with Bookwyrm running main branch on Docker behind ngrok
  2. Does NOT work correctly with Bookwyrm running production branch on Docker
  3. Does NOT work correctly with Bookwyrm running production branch on Dockerless

So problem definitely seems to be with production.

@mouse-reeve is it possible the issue here is something gunicorn does differently to Django's runserver?

@hughrun hughrun added the help wanted Extra attention is needed label Aug 6, 2023
@mouse-reeve
Copy link
Member

I genuinely have no idea at this point 😬

@WesleyAC
Copy link
Member

WesleyAC commented Aug 7, 2023

@hughrun it should be possible to test that by running something like:

docker-compose up -d
docker-compose stop web
docker-compose run web /app/manage.py runserver 0.0.0.0:8000

on the production branch. or changing the command line in docker-compose.yml

@hughrun
Copy link
Contributor

hughrun commented Aug 7, 2023

@WesleyAC good thinking. Still getting the same behaviour.

OMG I just wasn't patient enough. IT WORKS!!!

So, this is caused by something in the way gunicorn is handling ...something.

@hughrun hughrun changed the title Still can't follow Bookwyrm user from Mastodon Hometown instance, may be AUTHORIZED_FETCH related Gunicorn causes signed GET requests (e.g. AUTHORIZED_FETCH) to fail in Production Aug 7, 2023
@hughrun
Copy link
Contributor

hughrun commented Aug 7, 2023

Thanks @WesleyAC I think I now have a solution for this 🎉

I don't profess to know enough about http requests and threading to understand exactly why this happens, but the answer looks to be "runserver is multi-threaded and gunicorn can be, but isn't by default".

See this Stack Overflow answer to a similar problem.

The answer refers to "workers" but they are actually talking about threads. See Gunicorn documentation on threads.

More on how to mix workers and threads in Gunicorn and how they differ.

I have successfully tested adding an extra thread in the production docker-compose.yml:

web:
  build: .
  env_file: .env
  command: gunicorn bookwyrm.wsgi:application --threads=2 --bind 0.0.0.0:8000

This fixes the problem reported here by @futzle.

If anything, adding one or more extra threads seems like it should result in improved performance in most instances, but I'd like more opinions from perf champ @WesleyAC and Fearless Leader @mouse-reeve before I log a simple PR for this, in case there are other ramifications.

@futzle
Copy link
Author

futzle commented Aug 7, 2023

!!!

I've edited /etc/systemd/system/bookwyrm.service (dockerless install) and added --threads=2 to the [Service] section:

ExecStart=/opt/bookwyrm/venv/bin/gunicorn bookwyrm.wsgi:application --threads=2 --bind 0.0.0.0:8000

Let's see how things go after that change.

Update: So it looks like this change has helped. The view from a Mastodon server now is:

  1. Mastodon user presses Follow. UI confirms that the Bookwyrm user is being followed.
  2. Mastodon user refreshes the page and sees that the follow has demoted to a follow request.
  3. Mastodon user waits 1-2 minutes and refreshes the page. The follow request has promoted back to a full follow again.

This dance seems to apply only when the first user on any given instance attempts to follow. Afterwards follow requests seem to be instant. (Needs further verification!)

I have seen the same behaviour when following a Takahē instance from a Mastodon server with AUTHORIZED_FETCH turn on, so I wonder if this is just normal for Mastodon.

@WesleyAC
Copy link
Member

WesleyAC commented Aug 7, 2023

fascinating! it's really strange that this has the effect it does.

on bookwyrm.social we're using gunicorn bookwyrm.wsgi:application --workers 12 --threads 10 --preload --max-requests 100 --max-requests-jitter 200 --bind 0.0.0.0:8000 right now, so making it default seems fine, although it does seem kinda like it's papering over what must be a real bug somewhere…

@WesleyAC
Copy link
Member

WesleyAC commented Aug 7, 2023

Ah, reading the stack overflow post you linked the deadlock makes sense, and the real fix is #2717, presumably the bookwyrm/views/follow.py ones.

@hughrun
Copy link
Contributor

hughrun commented Aug 7, 2023

This is also making me wonder if we should always send outgoing GET requests to webfinger and external ids as signed requests from bookwyrm.instance.user. That would remove a whole unnecessary request loop from this process. Increasingly signed GET requests are becoming more common in the fediverse.

@futzle
Copy link
Author

futzle commented Aug 8, 2023

Agree with everyone that it looks like there's something being unexpectedly re-entrant, and having strictly one thread is leading to something timing out because there's no thread left to handle the inner request. #2717 looks like a place to spend some effort.

After a day, with the extra --threads on Outside of a Dog, everything is still behaving. This seems to be the smallest fix sufficient to fix my reported issue.

@hughrun hughrun added deployment and removed help wanted Extra attention is needed labels Aug 11, 2023
hughrun added a commit to hughrun/bookwyrm that referenced this issue Aug 19, 2023
Various issues have been raised regarding requests that seem to be dropped.
The real fix for this in bookwyrm-social#2717 - "Stop making HTTP requests in views"

This commit is a quick fix in some circumstances - particularly bookwyrm-social#2844 - signed GET requests failing in production.
It adds an extra thread to Gunicorn to make the default configuration multi-threaded.
This will now match the multi-threaded nature of the development environment, which is why we have not been able to replicate these errors in dev.
@quisquous
Copy link

For what it's worth, I had msub2 add this on books.theunseen.city (on 0.6.4) and I am still having the same issue of being unable to follow my account (weirder.earth -> books.theunseen.city, one example the same as eldang above).

@futzle
Copy link
Author

futzle commented Jan 5, 2024

A user on Mastodon 4.2.3 instance eigenmagic.net is encountering this issue following a user on Bookwyrm 0.6.3 instance outside.ofa.dog, even with the --threads patch. Still seems there's an issue.

Edit: I don't have access to the eigenmagic.net logs but the logs on outside.ofa.dog are just the same as above.

@hughrun
Copy link
Contributor

hughrun commented Nov 29, 2024

This should be fixed by #3434

@hughrun hughrun closed this as completed Nov 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working deployment
Projects
None yet
Development

No branches or pull requests

7 participants