feat(gw): Ipfs-Gateway-Mode: path|trustless #495

Closed
lidel wants to merge 1 commit into main from feat/gateway-mode-header


lidel
Member

@lidel lidel commented Oct 20, 2023

This PR adds boxo/gateway support for an opt-in HTTP header that CLI tools like curl can send to disable the browser-specific redirect to a subdomain.

As suggested by @markg85 in https://curl.se/mail/lib-2023-10/0038.html
An IPIP and gateway conformance tests will follow.

Concerns

It assumes every HTTP cache will be aware of the user opt-in, and that is not the case.
HTTP caching is complex and there are many implementations; only a small subset of HTTP caching behavior works reliably across vendors.

What happens when an HTTP cache in front of the gateway (CDN, nginx, load balancer, etc.) caches a response produced for a client that sent Ipfs-Gateway-Mode and then returns it to clients that did not send Ipfs-Gateway-Mode?

Suggestions welcome, but unless we resolve the issues below, Ipfs-Gateway-Mode has no future.

Denial of Service

Many websites require Origin isolation and URL root to be at / and not /ip*s/name/.

A malicious actor could request popular websites over and over again with Ipfs-Gateway-Mode: path to force an invalid payload into the cache, effectively breaking them for other users.

Origin Isolation breakage and reveal of user secrets

A malicious actor could request /ipfs/cid/malicious-payload.html and /ipns/wallet.example.com with Ipfs-Gateway-Mode: path, and both responses are cached by middleware/CDN in front of a gateway.

Users who open /ipns/wallet.example.com would get a cached response that does not redirect them to the subdomain gateway, allowing /ipfs/cid/malicious-payload.html (which is now in the same shared origin) to read all cookies, private keys from local storage, etc.

Tricking user into opening /ipfs/cid/malicious-payload.html will enable exfiltration of secrets.
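
To make the shared-origin point concrete, here is a small sketch that compares origins for path and subdomain URLs. The `origin` and `sameOrigin` helpers are illustrative, the subdomain hostnames are made-up examples, and the comparison is simplified (it does not normalize default ports the way browsers do).

```go
package main

import (
	"fmt"
	"net/url"
)

// origin returns a simplified scheme://host[:port] origin for a URL.
func origin(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Scheme + "://" + u.Host, nil
}

// sameOrigin reports whether two URLs share the simplified origin.
func sameOrigin(a, b string) (bool, error) {
	oa, err := origin(a)
	if err != nil {
		return false, err
	}
	ob, err := origin(b)
	if err != nil {
		return false, err
	}
	return oa == ob, nil
}

func main() {
	// Path gateway: the attacker page and the wallet live on one origin.
	shared, _ := sameOrigin(
		"https://example.net/ipfs/cid/malicious-payload.html",
		"https://example.net/ipns/wallet.example.com",
	)
	fmt.Println("path URLs share an origin:", shared)

	// Subdomain gateway (illustrative hostnames): each root gets its own origin.
	isolated, _ := sameOrigin(
		"https://cid.ipfs.example.net/malicious-payload.html",
		"https://wallet-example-com.ipns.example.net/",
	)
	fmt.Println("subdomain URLs share an origin:", isolated)
}
```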

Explored mitigations

  • 🔴 Force content-disposition: attachment on responses when the Ipfs-Gateway-Mode header is set, so the browser will never render such a payload.

    • This fixes the secret leak, but makes denial of service even easier: a cached response with content-disposition: attachment will NEVER render.
  • 🟠 return Vary header to indicate which other HTTP headers should be used in caching decisions

    • This works on paper but not in practice. Vary header history here is a good primer on the problem space. tl;dr: we could return Vary: Ipfs-Gateway-Mode but can't assume it works reliably across the stack, which means we would be gambling with the security of the end user.

TODO

  • Understand risks and mitigations. Decide if this is feasible at all.
  • Write IPIP
  • Gateway conformance

@codecov

codecov bot commented Oct 20, 2023

Codecov Report

Merging #495 (c28eb8d) into main (40fb162) will increase coverage by 0.04%.
Report is 23 commits behind head on main.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main     #495      +/-   ##
==========================================
+ Coverage   65.90%   65.94%   +0.04%     
==========================================
  Files         205      206       +1     
  Lines       25185    25385     +200     
==========================================
+ Hits        16598    16740     +142     
- Misses       7125     7178      +53     
- Partials     1462     1467       +5     
Files Coverage Δ
gateway/handler.go 77.48% <100.00%> (+0.11%) ⬆️
gateway/hostname.go 73.63% <100.00%> (+0.24%) ⬆️

... and 13 files with indirect coverage changes

This is an opt-in HTTP header that CLI tools like curl can send to disable
the browser-specific redirect to a subdomain and/or force trustless mode,
which errors instead of returning deserialized data.

Context: https://curl.se/mail/lib-2023-10/0038.html
An IPIP and gateway conformance tests will follow.
@lidel lidel force-pushed the feat/gateway-mode-header branch from 3c40238 to c28eb8d Compare October 29, 2023 18:27
@lidel lidel changed the title feat(gw): Ipfs-Gateway-Mode: path feat(gw): Ipfs-Gateway-Mode: path|trustless Oct 29, 2023
@lidel
Member Author

lidel commented Oct 29, 2023

@markg85 I started writing tests for this, but quickly identified catastrophic security problems related to how opt-in Ipfs-Gateway-Mode could impact HTTP caching. Details at the top of this PR, in "Concerns" section.

The way I see it, this idea is dead on arrival, unless someone smarter proposes a reliable fix that works across HTTP implementations and middlewares and does not gamble with security of end user.

@lidel lidel added need/analysis Needs further analysis before proceeding status/blocked Unable to be worked further until needs are met labels Oct 29, 2023
@markg85

markg85 commented Oct 30, 2023

Oww.

I'm glad you understand the implications and how to even test for them @lidel
I don't quite get how either can occur or be abused but I just take your word for it that they can be :)

Still, just for my learning experience. Could you elaborate on how the denial of service can happen exactly. The example you posted gives me a hint but it doesn't quite snap into place for me just yet. I'd be curious to know how you can, for example, abuse this to make ipfs.tech (full url: ipns://ipfs.tech) inaccessible?

@lidel
Member Author

lidel commented Oct 30, 2023

@markg85

  1. Bad actor keeps requesting https://example.net/ipfs/cid with Ipfs-Gateway-Mode: path

    • Could be curl, could be any website the user visited (this can be executed with the fetch API): all you need is to trick people into clicking a link that prewarms one of the HTTP caches with an invalid response.
    • /ipfs/cid response is immutable, returned with cache-control: public, max-age=29030400, immutable
  2. Path response is cached. This can happen at every HTTP hop.

    • HTTP service at example.net has a CDN in front of Kubo, it caches the path response
    • A web browser caches too
  3. Result of every permutation of the above is the same: a regular user opens https://example.net/ipfs/cid in browser (without special header) and gets "path" response from cache directly from the root origin at https://example.net/ipfs/cid instead of redirect to https://cid.ipfs.example.net, breaking origin isolation.

    • bypassing origin isolation is the biggest security issue here. undoing last 5 yea
    • additional bug: if website is written in a way that requires URL root to be at /, nothing works because relative paths are invalid. – cached path version of the website is unusable (JS fails to load, missing content etc)
    • additional bug: if we try to avoid origin escape and set content-disposition: attachment on responses for Ipfs-Gateway-Mode: path, that effectively makes it impossible to render HTML, and the website is unreachable (the end user gets a Save As dialog instead)
    • additional bug: if Ipfs-Gateway-Mode: trustless was requested instead of Ipfs-Gateway-Mode: path, and the requested content type was not trustless, then the gateway returns an error stating that deserialized responses are not supported. It may not have a valid Cache-Control, but might still be cached by some CDN, just like Amazon CloudFront in front of cid.contact is caching 404s (fix: lower 404 ttl to decrease end user failures ipni/storetheindex#1344)

The number of issues is too high, and the workarounds introduce their own risks and bugs.
This is a red flag that should give everyone pause before pushing this any further.

I am happy to revisit if someone proposes a comprehensive way of dealing with issues listed above,
but I don't believe in spending time on this myself.

@markg85

markg85 commented Oct 31, 2023

@lidel ahh, now it makes more sense, thank you for elaborating!

The redirect logic is in the gateway itself. The service in front of it (nginx, CDN, etc.) that does the caching has no way of knowing whether a request is a redirect (subdomain) or a path (no redirect) request, and thus the issues you mention can pop up.

Ouch.
I have no clue about a possible solution yet. I'm open to ideas.

@BigLep BigLep mentioned this pull request Nov 9, 2023
11 tasks
@lidel
Member Author

lidel commented Nov 9, 2023

Closing for now. If anyone reading this in the future has an idea that mitigates the risks, feel free to open a new issue in https://github.com/ipfs/specs

@lidel lidel closed this Nov 9, 2023
@lidel lidel deleted the feat/gateway-mode-header branch November 9, 2023 19:58
Labels
need/analysis Needs further analysis before proceeding skip/changelog status/blocked Unable to be worked further until needs are met

3 participants