
Addressing HTTP servers over Unix domain sockets #577

Closed
rkjnsn opened this issue Feb 6, 2021 · 66 comments

@rkjnsn

rkjnsn commented Feb 6, 2021

It is often desirable to run various HTTP servers that are only locally connectable. These could be local daemons that expose an HTTP API and/or web GUI, a local dev instance of a web server, et cetera.

For these use cases, using Unix domain sockets provides two major advantages over TCP on localhost:

  1. Namespacing. If two users on a system are running the same service, TCP requires them both to pick, configure, and remember different port numbers. With Unix domain sockets, each socket can live in the respective user's runtime directory and be named after the service.
  2. Access control. Even if the service is diligent only to bind to localhost, TCP still allows any (non-sandboxed) process or user on the machine to connect. Any access control has to be implemented by the service itself, which often involves implementing (hopefully with sufficient security) its own password authentication mechanism. Unix domain sockets, on the other hand, can take advantage of the access control functionality provided by the filesystem, and thus can easily be restricted to a single user or set of users. In the event that a service wants to allow multiple users to connect and discriminate between them, several operating systems provide a means of querying the UID of the connecting process, again without requiring its own authentication scheme (see the sketch below).
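
For illustration, here is a minimal sketch of the access-control point (Python used purely as an example; SO_PEERCRED is Linux-specific and /tmp/demo.sock is a hypothetical path): filesystem permissions limit who may connect, and the listener asks the kernel for the connecting peer's UID rather than running its own authentication scheme.

  import os
  import socket
  import struct

  SOCKET_PATH = "/tmp/demo.sock"  # hypothetical path, purely for the example
  if os.path.exists(SOCKET_PATH):
      os.unlink(SOCKET_PATH)

  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
      srv.bind(SOCKET_PATH)
      os.chmod(SOCKET_PATH, 0o600)  # filesystem ACL: only this user may connect
      srv.listen(1)
      conn, _ = srv.accept()       # blocks until a client connects
      with conn:
          # Ask the kernel who connected (Linux SO_PEERCRED returns pid, uid, gid).
          creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                                  struct.calcsize("3i"))
          pid, uid, gid = struct.unpack("3i", creds)
          print(f"peer pid={pid} uid={uid} gid={gid}")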

Indeed, due to these advantages, many servers/services already provide options for listening via a Unix domain socket rather than a local TCP port. Unfortunately, there is not currently an agreed-upon way to address such a service in a URL. As a result, clients that choose to support it end up creating their own bespoke approach (e.g., a special command-line flag, or a custom URL format), while others choose not to support it so as not to bring their URL parsing out of spec (among other potential concerns).

Here are some of the various URL formats I've seen used or suggested:

  • Transport only: unix:/path/to/socket.sock. This lacks both the protocol and resource path, so it can only be used for clients that already know they'll be speaking to a specific HTTP API, and is not generally usable.
  • HTTP with socket path as the port: http://localhost:[/path/to/socket.sock]/resource. Only allowed when host is localhost. Paths containing ] could either be disallowed or URL encoded.
  • Composite scheme with socket path as URL-encoded host: http+unix://%2Fpath%2Fto%2Fsocket.sock/resource. A distinct scheme allows existing http URL parsing to stay the same. URL encoding hurts readability and typeability.
  • Combining ideas from the previous two: http+unix://[/path/to/socket.sock]/resource or just http://[/path/to/socket.sock]/resource. (The latter would require using the leading / of the socket path to disambiguate from an IPv6 address.)

References:
Archived Google+ post suggesting the socket-as-port approach:
https://web.archive.org/web/20190321081447/https://plus.google.com/110699958808389605834/posts/DyoJ6W6ufET
My request for this functionality in Firefox, which sent me here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1688774
Some previous discussion that was linked in the Firefox bug:
https://daniel.haxx.se/blog/2008/04/14/http-over-unix-domain-sockets/
https://bugs.chromium.org/p/chromium/issues/detail?id=451721

@annevk
Member

annevk commented Feb 6, 2021

It seems you don't need just addressing for this, but some kind of protocol as well. I recommend using https://wicg.io/ to see if there's interest to turn this into something more concrete.

@rkjnsn
Author

rkjnsn commented Feb 6, 2021

I'm not sure I understand why any additional protocol would be necessary. It's just HTTP over a stream socket. The server accepts connections and speaks HTTP just like it would for a TCP socket. Indeed, I can set up such a server today, and it works fine provided that the client provides a way to specify the socket, e.g., curl --unix-socket /path/to/socket.sock http://localhost/resource.
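
To make the "it's just HTTP over a stream socket" point concrete, here is a hedged sketch of the client side (Python's standard library, purely for illustration; the socket path is the one from the curl example above). The only thing that changes compared with TCP is how the stream socket gets connected:

  import http.client
  import socket

  class UnixHTTPConnection(http.client.HTTPConnection):
      """Plain HTTP/1.1, carried over an AF_UNIX stream socket."""

      def __init__(self, socket_path):
          super().__init__("localhost")  # only used for the Host header
          self._socket_path = socket_path

      def connect(self):
          self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
          self.sock.connect(self._socket_path)

  conn = UnixHTTPConnection("/path/to/socket.sock")
  conn.request("GET", "/resource")
  print(conn.getresponse().status)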

@avakar

avakar commented Jun 3, 2021

I don't even understand how this is not a thing yet. Especially now that Windows started supporting AF_UNIX sockets natively, it seems to be the best, cross-platform way to connect web and native apps without consuming a TCP port.

@annevk
Member

annevk commented Oct 20, 2021

Let me take a step back: what exactly is the ask from the URL Standard here?

@rkjnsn
Author

rkjnsn commented Oct 23, 2021

The ask is for the URL standard to specify a syntax for referring to a page served via HTTP over a UNIX domain socket. Currently, applications that want to support connecting to such an HTTP service have to pick one of the following three options:

  1. Provide a bespoke mechanism for specifying the server's socket outside of the URL, such as curl's --unix-socket command-line argument.
  2. Accept a custom URL format outside of the URL standard for addressing resources served via HTTP over UNIX domain socket.
  3. Forgo the functionality altogether if 1 is impractical and 2 is undesired.

None of these are ideal. Deciding on a standardized URL syntax allows different implementations to implement the functionality in a common, standards-compliant way.

@annevk
Member

annevk commented Oct 25, 2021

I see, https://wicg.io/ is the place for that. The URL standard defines the generic syntax. If you want to define the syntax for a particular URL scheme as well as behavior, you would do that in something that builds upon the URL standard. E.g., https://fetch.spec.whatwg.org/#data-urls for data: URLs.

@annevk annevk closed this as completed Oct 25, 2021
@rkjnsn
Author

rkjnsn commented Oct 25, 2021

Let me rephrase: the specific ask for the URL standard is to provide an allowance in the URL syntax for specifying a UNIX domain socket, either in lieu of the port (e.g., http://localhost:[/path/to/socket.sock]/resource) or in lieu of the hostname (e.g., http://[/path/to/socket.sock]/resource), both of which are currently invalid according to the URL standard.

@annevk
Member

annevk commented Oct 26, 2021

I recommend using something like unix:/path/to/socket.sock?url=http://localhost/resource. We can't change the URL syntax for each new protocol that comes along.
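
For what it's worth, that form pulls apart cleanly with existing generic-URI parsers. A minimal sketch (Python's urllib, purely for illustration; a real client would presumably percent-encode the inner URL so that any "&" or "#" it contains survives):

  from urllib.parse import urlsplit, parse_qs

  ref = "unix:/path/to/socket.sock?url=http://localhost/resource"
  parts = urlsplit(ref)
  socket_path = parts.path                     # '/path/to/socket.sock'
  inner_url = parse_qs(parts.query)["url"][0]  # 'http://localhost/resource'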

@cyanogilvie

It's the same protocol over a stream socket, just a different address (i.e. authority part). Ok, so it's a different protocol in the sense of IP, but so are IPPROTO_IP and IPPROTO_IPV6, and the URL standard doesn't treat those as different. The relevant comparison, I think, is address families for stream sockets, like AF_INET, AF_INET6 and AF_UNIX. Once the stream socket has been established (as specified by the authority part of the URL), HTTP software shouldn't care or even know how the stream is transported.

Most invented, non-standard approaches for HTTP-over-unix-sockets seem to gravitate to something like a different scheme, such as http+unix or https+unix (since, from what I can see, the authority part can't really be disambiguated from a hostname if relative socket paths are allowed), with the socket path percent-encoded into the authority part, and then everything works naturally from there.

I've also seen (and used) enclosing the socket path in [] in the authority part and keeping the scheme as http or https, but I think that namespace clashes with IPv6-style numeric addresses like [::1]:80. RFC 3986 (in section 3.2.2) kind of leaves space for this by anticipating future formats within the [], and providing a version prefix to disambiguate them. Overall I like this approach the best: it extends into the error space so it doesn't change the interpretation of any valid existing URL, lives in an extension space envisioned by the standard, minimally extends just the appropriate part of the standard (the authority part), and keeps the schemes http and https meaning "this is a resource we talk to this authority about using the http(s) protocol", so it preserves compatibility for software that uses the scheme to know what protocol to speak with the authority over the socket.

@annevk
Member

annevk commented Nov 4, 2021

Changing the syntax of URLs is not really something we're willing to do. That has a substantial cost for the overall ecosystem. The benefits would have to be tremendous.

@michael-o

michael-o commented Nov 4, 2021

Syntax in mod_proxy:

In 2.4.7 and later, support for using a Unix Domain Socket is available by using a target which prepends unix:/path/lis.sock|. For example, to proxy HTTP and target the UDS at /home/www.socket, you would use unix:/home/www.socket|http://localhost/whatever/.

@karwa
Contributor

karwa commented Nov 12, 2021

The strongest argument I can think of for this is: http(s) URLs have special parsing quirks which don't apply if the scheme is http+unix. So for a perfect 1:1 behaviour match, UDSs would need to use an actual http URL, not a custom scheme (similar to IP addresses).

That said, I'm also not a fan of adding yet another kind of host (file paths). My preference would be to use a combination of:

  • Fake hostname (localhost, example, test and invalid are all reserved and will never be allowed as a TLD, so something like uds.localhost should work), and
  • Socket address in the fragment (HTTP clients should strip it before sending the request anyway)
http://uds.localhost/some/path?some=query#/path/to/socket.sock

This is a perfectly valid HTTP URL, and should be capable of representing any HTTP request target.
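
A minimal sketch of what a client would do with that form (Python's urllib, purely for illustration): peel the socket path out of the fragment, then strip the fragment before putting the request on the wire.

  from urllib.parse import urlsplit, urlunsplit

  url = "http://uds.localhost/some/path?some=query#/path/to/socket.sock"
  parts = urlsplit(url)
  socket_path = parts.fragment                            # '/path/to/socket.sock'
  request_url = urlunsplit(parts._replace(fragment=""))   # what goes on the wire
  # request_url == 'http://uds.localhost/some/path?some=query'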

Alternatively, you could try to get uds or socket as reserved TLDs, but I'm not sure how you'd go about doing that.

(Note: this would also mean that all UDS URLs have the same origin, although that could be remedied by adding a discriminator to the fake hostname to make your own zones of trust, e.g. 123.uds.localhost)

@rkjnsn
Author

rkjnsn commented Nov 12, 2021

I'm not sure using the fragment is really tenable for these use cases (and local web dev, especially). Many web applications use the fragment for their own purposes in JavaScript, whereas the host (at least in my experience) tends to be handled more opaquely.

What would be the main drawback for allowing additional characters within [] for the host portion of an HTTP URL?

@karwa
Contributor

karwa commented Nov 12, 2021

Ah yes, you're right, it wouldn't work for local web development. I was thinking more about generic HTTP servers.

The main drawbacks IMO are:

  • Complexity. This standard has a hard enough time trying to document existing browser behaviour without inventing new things. Then again, there is a reasonable counter-argument that URLs shouldn't have to stay frozen while other aspects of technology and the web evolve to meet new use-cases. There is a counter-counter-argument that URLs are in a particularly sorry state compared to most other web technologies. Perhaps this is something for the future, once all browsers conform to this standard and things have stabilised a bit?

  • Possible loss of validation for IPv6 addresses. Unless we want to get into the business of validating local system paths (and I'm quite sure nobody is thrilled by that idea) we would basically have to accept any non-empty string within [] in the host portion of HTTP URLs. How do we know http://[::::foo]/some/path doesn't refer to a valid path on some system somewhere?

@cyanogilvie

cyanogilvie commented Nov 13, 2021

Yes, I think the place for the UDS socket is in the authority portion - that's the bit that has the responsibility for describing the endpoint of the stream socket to talk to for this resource. Putting it elsewhere feels like an abuse and likely to cause unforeseen problems (HTTP client software will certainly have the host portion of the URL available in the portion of the code that establishes the stream socket, but may not have the fragment).

I think the namespace collision with IPv6 literals and syntax validation for UDS paths can be solved by:

  • Reusing the syntax for the path portion of the URI: "/" is a separator, path elements must be percent encoded.
  • Socket paths must be absolute (start with "/" or "~"). This distinguishes them from IPv6 literals, and should be the case anyway (what would a relative path be relative to? No similar relative resolution for hostnames exists in the standard).
  • Possibly using a version prefix as envisioned by RFC 3986, putting it within the syntax anticipated in that standard, something like: http://[v1.uds:/tmp/mysock]/foo/bar (see the quick parser probe sketched just below).
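
As a quick, hedged illustration of the "extends into the error space" point, here is how one existing parser (Python's urllib.parse, purely as an example; other parsers may behave differently) treats these forms today: it cuts the authority at the first "/" inside the brackets, leaving an unbalanced "[", and rejects the URL rather than silently reinterpreting it.

  from urllib.parse import urlsplit

  for candidate in ("http://[v1.uds:/tmp/mysock]/foo/bar",
                    "http://[/path/to/socket.sock]/resource"):
      try:
          parts = urlsplit(candidate)
          print(candidate, "->", parts.netloc, parts.path)
      except ValueError as exc:
          print(candidate, "-> rejected:", exc)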

It's up to the host to decode and translate the path into whatever native scheme that OS uses (just as it is for the path portion of the URI).

For me the motivation for supporting HTTP over UDS goes way beyond web browsers (and I would see that as a minor use case for this) - for better or worse HTTP has become a lingua franca protocol for anything that wants to communicate on the Internet (consider websockets for some of the forces that drive this), and that is increasingly machine to machine. For example: we run an online marketplace that serves about 10 million requests a day over HTTP (excluding static resources offloaded to a CDN), but each of those involves several HTTP interactions with other services to construct the response: Elasticsearch queries, S3 to fetch image sources that are resized, etc., plus a whole host of REST services for shipping estimates, geocoding, ratings and reviews, federated authentication providers, etc. So, by volume, the overwhelming majority of HTTP requests our webservers are party to are between them and other servers, and aren't transporting web pages.

As the trend toward microservices and containerization continues this will only increase, and it's particularly there that I see HTTP-over-UDS being useful:

  • Communication over UDS is materially faster and lower latency than over the loopback interface because a lot of the complexities in the network stack can be skipped - packet filtering and transformation, TCP, etc. The loopback interface doesn't have network latency but it still has all these other things. Local sockets (UDS) are more or less just buffers managed by the kernel. This starts to really matter to page response times when generating the page involves many interactions with microservices.
  • The namespace for sockets is hierarchical for UDS, rather than flat as it is for ports on localhost, so there is a natural way to scope the namespace for each microservice, one that is self-describing. Compare http://localhost:1234/ with http://[/sockets/session/addrs]/ for the address of a microservice providing the address management service for the current session user.

The other trend is for UIs to be implemented in HTML rather than some OS-native widget set (Android, iOS, GTK, QT, MacOS native controls, Windows native controls, etc), even when the application is entirely local on the user's device. There are very good reasons for this:

  • HTML+Javascript is portable, greatly reducing the cost to develop the application if it has to run across platforms.
  • HTML+Javascript is much richer and more capable than those native widget sets in the types of UIs they can implement.
  • Essentially every developer these days already knows HTML and Javascript.
  • Gone are the days when users expect native OS controls. These days they expect web application style interfaces, since that's the majority of what they're exposed to (gmail, various cloud based office applications, twitter, etc.)

In this use case the hierarchical namespace issue is important and addresses a major downside to this pattern - choosing a port from the flat, system-wide shared namespace (ok, so the listening socket can specify 0 and have the OS pick a random unused port on some systems, but that's a bit ugly). Much nicer to use ~/.sockets/<app>/<pid>, and more discoverable. Another reason to use UDS in this case is that the user for the client side of the socket can be obtained from the OS in a way that only trusts the OS, solving the other issue with this pattern - knowing which user we're interacting with. If these issues were solved by HTTP-over-UDS, do you think something like Prusaslicer would use that (HTML, Javascript, webGL) rather than wxWidgets for its UI portability requirements? That would make porting to mobile devices like tablets much easier too.

Finally, consider things like headless Chrome in an automated CI/CD pipeline - the software managing the tests being run on the deployment candidate version could start a number of headless chrome instances and run tests in parallel, easily addressing the websocket each provides with a UDS path like /tmp/chrome/<pid> rather than somehow managing port assignments.

The tech already exists to make these obvious next steps in application provisioning and inter-service communication happen (even Windows supports local sockets, aka UDS), and the change required of existing HTTP client software should be small and of limited scope (the URL parsing, name resolution and stream socket establishment steps), but it can't happen unless there is a standardised way to address these sockets.

@annevk
Member

annevk commented Nov 13, 2021

What exactly is wrong with #577 (comment)? @karwa's uds.localhost can resolve locally.

@mnot
Member

mnot commented Nov 15, 2021

Alternatively, you could try to get uds or socket as reserved TLDs, but I'm not sure how you'd go about doing that.

You ask the IETF, just like .onion did. Admittedly, there are some politics involved, but it's possible, and this is a pretty clearly technical use case. The backstop would be to use a subdomain of .arpa.

Personally, I'd go with something like:

http://%2Ftmp%2Fmysock.uds/foo/bar

Yes, the escaping is ugly, but it's much cleaner than overloading IPV6 in URLs. Alternatively, you might be able to get away with:

http://tmp.mysock.uds/foo/bar
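
For the escaped variant, round-tripping the socket path through the host component is mechanical. A minimal sketch (Python's urllib, purely for illustration, with the .uds suffix proposed above):

  from urllib.parse import quote, unquote, urlsplit

  host = quote("/tmp/mysock", safe="") + ".uds"  # '%2Ftmp%2Fmysock.uds'
  url = f"http://{host}/foo/bar"

  parsed = urlsplit(url)
  assert parsed.hostname.endswith(".uds")
  socket_path = unquote(parsed.hostname[:-len(".uds")])  # '/tmp/mysock'
  resource = parsed.path                                 # '/foo/bar'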

@agowa

agowa commented Apr 23, 2022

@mnot any update on this? Was it implemented? Should this ticket be reopened? I'm also interested in this.

@mnot
Member

mnot commented Apr 23, 2022

I just left a comment with some context; I don't know that anything else has happened.

@thx1111

thx1111 commented Jul 13, 2022

I haven't read anything here that seems to justify breaking with the familiar pattern, "<protocol>://<domain>/<filepath>" or injecting a lot of special characters into the URL, or mimicking an IPv6 address. The protocol is simply "http". The domain is right there in the name, "Unix Domain Socket". Like any other top level domain - net, com, org - the domain is simply "unix". I don't know any reason that a web browser application cannot parse the domain from a URL, recognize a nonstandard domain name, and invoke a special handler for a non-network socket. The difficulty seems to be in distinguishing the path to the socket from the path to the resource file.

The "HTTP with socket path as the port" option, above, makes the most sense. And since a special handler must already be invoked for this "unix domain", I expect that colons - ":" - can continue to be used as the "port" separator for the socket path.

Altogether, that suggests a straightforward URL, as in: "http://unix:/var/run/server/ht.socket:/path/to/resource.html".

Is there any reason that those repeating ":/" character sequences would pose a problem in a URL?

This approach would not impose any limitation on the use of ":" in the resource path name, since a "unix domain" must be followed by a socket path, and that path will always be delimited by ":/". Any subsequent colons must then be part of the resource path name.

And, of course, this URL format still supports specifying any arbitrary protocol, served through a unix domain socket. And there is nothing redundant or misleading in the URL, as would be the case with any format requiring the name "localhost" or involving special parameter passing.

@michael-o

http+uds:///path/to/socket?

@rkjnsn
Author

rkjnsn commented Jul 13, 2022

@michael-o, that doesn't provide any means to specify the resource path, as it is putting the path to the socket where the resource path should go.

@randomstuff

randomstuff commented Nov 27, 2023

http://host.example.com.uds.localhost/path/to/socket//path/to/resource

What you absolutely don't want is the ability for any web server in the wild to use your browser to issue arbitrary HTTP requests to arbitrary Unix sockets.

It is already quite difficult for people to grasp the notion that LAN-only services and localhost-only services can be attacked by remote web servers (CSRF, DNS rebinding attacks on LAN services or localhost services). If a web browser were to allow arbitrary websites to issue HTTP requests to arbitrary UNIX sockets, this would open up a wide range of attack opportunities (e.g. using DNS rebinding attacks to attack UNIX-socket-bound Docker servers), including attacks based on protocol confusion.

If you wanted such a feature to be mostly safe, you would have to actively opt-in:

  • either by having the user actively map a Unix socket into an HTTP domain;
  • or by having a default location for UNIX sockets which want to be mapped to a domain name / URI (e.g. /run/user/{pid}/published/80/XXX → http://XXX).

Firefox currently allows using a SOCKS proxy over a UNIX socket (including multiple such proxies when using FoxyProxy). It would be possible to have a Unix-bound SOCKS proxy which would resolve some domain names to Unix sockets.

@agowa

agowa commented Dec 2, 2023

@randomstuff just because it is addressable doesn't mean it is reachable. And after all, websites can already contain "file:///" URLs or similar.

@karwa
Contributor

karwa commented Dec 2, 2023

You don't really want to put the UDS path in the URL's path, because somebody could write:

<a href="/help">...</a>

And that would overwrite the path to the UDS, meaning a broken link.

Instead, you really want this to be part of the hostname. Hostnames are intrinsically abstract already, so there is no fundamental reason they can't resolve to a local socket. In other words, @randomstuff 's project is doing the conceptually correct thing by providing a mapping from hostnames to sockets.

And perhaps most importantly, it shows that this need can be met without changing the URL standard.

@thx1111

thx1111 commented Dec 10, 2023

Reading back through this discussion, it has not at all been established that there is a consensus as to "where" the underlying issue should lie, and so, any "solution" offered can appear to simply "miss the point", depending upon your point of view. I find myself back-and-forth about the various approaches suggested, including my own.

I can summarize at least four alternatives proposed here to the issue of, to generalize, "Addressing Unix Domain Sockets".

  1. RFC 3986 "Uniform Resource Identifier (URI): Generic Syntax" must be modified to allow addressing unix domain sockets.

  2. The URI Schemes in BCP 35/RFC 7595 "Guidelines and Registration Procedures for URI Schemes" must define a new URI Scheme and Owner which specifically supports unix domain socket addressing.
    Review here: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

  3. The existing http/https schemes, defined in RFC 9110 "HTTP Semantics" (Sections 4.2.1 and 4.2.2), must be expanded to explicitly support addressing unix domain sockets.

  4. Ignore the URI standard RFCs and just write or modify an html display client to support unix domain socket addressing.

Without first saying which approach we are thinking about, the conversation can become kind of silly, since any solution which "works", works. Otherwise, it may be that I both enjoy, and cringe at, "bike shedding" as much as anyone else.

@randomstuff

randomstuff commented Dec 10, 2023

For context about the pitfalls of stuffing/smuggling a Unix socket path into an HTTP URI: the Node.js Requests and got libraries would allow stuffing a Unix domain socket path into an HTTP URI like so: http://unix:/var/run/docker.sock:/containers/json. It turned out this could be exploited by a remote web server to target a local Unix domain socket through an HTTP redirect. In got, this feature is now disabled by default and HTTP redirects to Unix sockets are now disabled.

I would think that the ability to address arbitrary Unix domain sockets in HTTP(S) URIs is fraught with peril. If this were part of the URI standards, client applications and libraries would be expected to implement this feature, and this would certainly end up generating a lot of vulnerabilities such as CVE-2022-33987: attacks on arbitrary Unix domain socket applications through malicious redirects or, more generally, through malicious URIs.

What might be useful is:

  • the ability for the user to map domain names to Unix domain sockets in client applications (not super user friendly);
  • maybe associating some domain name suffix to system-local bound services which are explicitly designed to be used this way (eg. *.user.alt for user services and *.system.alt for system services),
    • with some way for applications to expose themselves this way,
    • these domain names could be considered as secure contexts.

but this is really outside of the scope of the URL standard.

@kevincox

kevincox commented Dec 11, 2023

While you have a good point, it is sort of a shame to block UNIX sockets because of this. The same problems exist for local services and LAN servers (like routers), and even cloud VM metadata servers are open to vulnerabilities because of this. Really, every redirect target should be carefully considered, and every DNS lookup should have the resulting IP treated with scrutiny. Unfortunately, that isn't the world that we live in: developers are careless, and many (most?) popular HTTP libraries don't even expose the primitives to do this. I am not aware of even a single library that prevents this by default. In practice, things like Origin headers and CORS are used to ensure that requests are coming from the right place and are not tricked redirections. These hacks have worked OK, and particularly exposed software like browsers is more strict (such as preventing public sites from accessing your router's web UI in most cases).

However, while this vulnerability is not specific to UNIX sockets, it is maybe wise to avoid adding more surfaces that can be accessed via this common issue.

@kevincox

the ability for the user to map domain names to Unix domain sockets in client applications

Isn't this just security through obscurity? Or is the idea that the service hosting the domain socket needs to opt in? Presumably because it has some sort of heuristics to block misdirected requests.

@randomstuff

randomstuff commented Dec 11, 2023

Or is the idea that the service hosting the domain socket needs to opt in?

Yes.

One motivation of OP was access control:

Access control. Even if the service is diligent only to bind to localhost, TCP still allows any (non-sandboxed) process or user on the machine to connect. Any access control has to be implemented by the service itself, which often involves implementing (hopefully with sufficient security) its own password authentication mechanism.

However, in order to increase the security of some local application (reduction of the attack surface, reliance on implicit authentication through UID and filesystem access control), this might end up:

  • increasing the attack surface of already existing services;
  • undermining the implicit authentication through UID and filesystem access control of already existing services (confused deputy problem).

Some opt-in mechanism could mitigate these issues to some extent.

@kevincox

While this may increase the attack surface of some services, it will also decrease the attack surface of others, as the original message explains. So it is important to weigh the benefits as well as consider possible mitigations that can make the tradeoffs more favourable.

@thx1111

thx1111 commented Dec 11, 2023

Given the ambiguity in addressing unix domain sockets, I am still inclined to fault the basic RFC 3986. So, here is a brief review, several rants, and another suggestion for unix domain socket addressing, simply using the square bracket "hack".

Assuming the general concept of "Uniform Resource Identifier" from Section 1.1.3., the basic structure is defined in Section 3 as having 5 components: scheme, authority, path, query, and fragment. First off, then, what type of URI component is a unix domain socket (UDS) address?

The original context here is "HTTP servers", and "http" is, itself, a type of "scheme". So, UDS as "scheme" is not my first choice.

Now, RFC 3986 uses the term "resource" without much constraint, saying 'This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI.' Effectively, a "resource" is whatever the user wants it to be. Is a UDS a "resource" itself? For the purpose here, "no". The "resource" implied by an HTTP server is some other specific data delivered using HTTP.

Then, is a UDS a type of "path", "query", or "fragment"?

From Section 3.3, "The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4), serves to identify a resource within the scope of the URI's scheme and naming authority (if any)." Since the UDS is not the "resource", and, since the "path" identifies a "resource", then the UDS cannot be a "path".

Similarly, from Sections 3.4. Query and 3.5 Fragment, both of these components are also references to the "resource". So the UDS is also not either a "query" or a "fragment".

And that leads to the inference that the UDS must be a kind of "authority". RFC 3986 actually subdivides the "authority" component itself into three parts, in Section 3.2.:

 authority   = [ userinfo "@" ] host [ ":" port ]

And here, the same analysis can be applied. Is the UDS a type of "userinfo"? Section 3.2.1. says, "The userinfo subcomponent may consist of a user name and, optionally, scheme-specific information about how to gain authorization to access the resource." Hmm - "scheme-specific information about how to gain authorization to access the resource" - "how to gain authorization". Does the UDS tell "how to gain authorization"? Sort of - maybe - not really - I'd say "no".

Is the UDS a type of "host"? From Section 3.2.2., "The host subcomponent of authority is identified by an IP literal encapsulated within square brackets, an IPv4 address in dotted-decimal form, or a registered name." Is, then, the UDS a type of "IP literal", "IPv4 address", or a "registered name"? Hmm - what is an "IP literal"? Again, from Section 3.2.2.:

 IP-literal = "[" ( IPv6address / IPvFuture  ) "]"

Since a UDS is not any of an "IPv6address / IPvFuture", an "IPv4 address", or a "registered name", then "no", a UDS is also not any type of "host".

And then, using RFC 3986, there is only one interpretation remaining. Is the UDS a type of "port"? From Section 3.2.3. Port:

 The port subcomponent of authority is designated by an optional port number in decimal following the
 host and delimited from it by a single colon (":") character.

  port        = *DIGIT

Well, clearly, and as has been mentioned previously in this discussion, the UDS is not a "DIGIT". And here is where I find fault with RFC 3986, in its limited scope when defining "port". Except that, Section 3.2.3. goes on to say, "The type of port designated by the port number (e.g., TCP, UDP, SCTP) is defined by the URI scheme." And that statement suggests asking "What sort of Communication Protocol is UDS?" Of course a UDS is not itself a kind of communication protocol, but the relationship should become apparent. It may be more illuminating to ask the converse, "What sort of Sockets are TCP, UDP, and SCTP?" And then, the Unix - in this case Linux - man pages offer some guidance.

 man 7 tcp:     tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
 man 7 udp:     udp_socket = socket(AF_INET, SOCK_DGRAM, 0);
 man 7 sctp:    sctp_socket = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);
                sctp_socket = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

And generally, "What is a 'socket'"? In part:

 man 2 socket:
        Name            Purpose                         Man page
        AF_UNIX         Local communication             unix(7)
        AF_LOCAL        Synonym for AF_UNIX
        AF_INET         IPv4 Internet protocols         ip(7)

 HISTORY
        The  manifest  constants  used under 4.x BSD for protocol families are PF_UNIX, PF_INET, and so
        on, while AF_UNIX, AF_INET, and so on are used for address families.  However, already the BSD
        man page promises: "The protocol family generally is the same as the address family", and
        subsequent standards use  AF_*  everywhere.

and then:

 man 7 unix:    unix_socket = socket(AF_UNIX, type, 0);

Here is my first rant about RFC 3986. The "port" component of the defined URI has presumed an Address Family, here implying AF_INET exclusively, along with what is a merely incidental association with a port "number". There is no explanation or justification given for this presumption.

Alternatively, it might be supposed that this presumption of an Address Family is an erroneous interpretation by the reader of RFC 3986. It may instead be supposed that the "port" component of the URI is simply a general concept to be associated with any Address Family which might be included from the list given from man(2)socket.

And so, I believe that this is the interpretation, while not "official", yet, that must be taken with RFC 3986.

Then, "What is the 'port' subcomponent of authority of an Address Family AF_UNIX socket?"

Here, man(7)unix tells us, "Traditionally, UNIX domain sockets can be either unnamed, or bound to a filesystem pathname (marked as being of type socket)." In our case, we are looking for a URI, so "unnamed" is not useful. Instead, the man page offers "a filesystem pathname". That seems clear enough.

Therefore, an RFC 3986 URI "port" for an AF_UNIX socket might also be interpreted as simply "a filesystem pathname", instead of exclusively as a number.

Allowing that, then the remaining problem only involves appropriate delimiters, to allow correctly parsing the resulting URI for the AF_UNIX "port".

Referring again to Section 2.2.:

      reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

Incidentally, it may be noted that this RFC 3986 list of delimiters is missing the percent "%", from Section 2.1 Percent-Encoding, and the set of White Space characters generally. The reader is now well into the realm of "inferring", "guessing", and "interpreting", instead of specifically "defining".

Here is my second rant about RFC 3986, related to the use of delimiters. The Section 3. URI syntax explicitly defines the ":" as separating the "scheme" from the "authority". Subsequently, in Section 3.2., it says 'The authority component is preceded by a double slash ("//") and is terminated by the next slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.' Taken together, this double slash actually provides no information whatsoever in the URI and only serves to "poison" the parsing of the URI, by requiring the parser to distinguish potentially between ":///...", "://...", and ":/...". For instance, the "file" scheme, RFC 8089, supports optionally leaving out this useless "//" altogether. RFC 3986 offers no explanation or justification for this use of the double slash "//". The delimiter might as well have been defined explicitly as "://". This makes any use of the slash "/" as a delimiter in the URI potentially problematic, where it is also used as an essential component of any unix "filesystem pathname", when referring to the proposed UDS AF_UNIX "port", as well as, already, referring to an actual "resource" by pathname.

A third rant regards Section 3.2.2 Host, which says:

 A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is
 distinguished by enclosing the IP literal within square brackets ("[" and "]").  This is the only place
 where square bracket characters are allowed in the URI syntax.

The only reason that these square brackets are needed is because of the repeated and overloaded use of the colon ":" as a delimiter in the "authority", in Section 3.2 preceding the "port", and in Section 3.2.1, potentially subdividing the "userinfo". Considering that RFC 3513 defines the use of colon ":" as the field delimiter in an IPv6 address, this should have glaringly suggested that the same ":" would be a bad choice for a delimiter in the RFC 3986 "authority" component and subcomponents of the URI. And there are plenty of alternative characters to choose, from the small ASCII character set, for use as delimiters in the "authority".

The use of the square brackets, then, is a "hack", consequent of a bad choice of delimiter in the "authority" component of the URI. Be that as it may, suppose that the prohibition "This is the only place where square bracket characters are allowed in the URI syntax" is ignored. Then, this same "hack" can be applied equally to the unfortunate choice of the slash "/" as a delimiter within the URI syntax with respect to the "port" subcomponent of the "authority", as with the "host" subcomponent.

I propose now another alternative to addressing unix domain sockets. By example, using the square bracket "hack", the result would allow, for instance, all of:

http://:[/path/to/socket]/path/to/resource.html?...#...
http://localhost:[/path/to/socket]/path/to/resource.html...
http://[::1]:[/path/to/socket]/path/to/resource.html...
http://user:password@[::1]:[/path/to/socket]/path/to/resource.html...
http://unix:[/path/to/socket]/path/to/resource.html...

All of these examples otherwise strictly follow the RFC 3986 URI syntax.

That is the least intrusive "hack" to UDS addressing and merely extends an existing URI "hack". A "cleaner" revision to RFC 3986 would be to eliminate the use of either the colon ":" or the slash "/" as delimiters in the URI syntax delineating its components and subcomponents, except for the initial ":" separating the "scheme" and "authority". There are 11 other "sub-delims" defined in RFC 3986 that seem perfectly usable as delimiters in the URI "authority", which would obviate the need for using these square bracket "hacks" completely.

With reference to previous remarks about security issues, it may be noted that man(7)unix describes AF_UNIX as supporting communication "between processes on the same machine", so there would be no "remote access" possible, despite the http/https "scheme", if that constraint were followed. And, since the UDS "port" is just a Unix "filesystem pathname", there are many existing security measures available.

On the other hand, this suggested UDS AF_UNIX "port" addressing clearly does lend itself to replacing "localhost" with "some-remote-host", to access some UDS on, literally, a remote host. But then, any http/https "server" will be providing its own security measures, should it allow UDS addressing at all, so that's a different issue and not really a problem here. This does introduce another concept, access to a UDS by a local http/https server, as opposed to UDS access only by a local html display client.

There is still the question of whether the http/https schemes would need to be formally updated to acknowledge any kind of UDS AF_UNIX "port" addressing. Reading at RFC 9110, Sections 4.2.1. http URI Scheme and 4.2.2. https URI Scheme:

        The origin server for an "http[/https]" URI is identified by the authority component, which
        includes a host identifier ([URI], Section 3.2.2) and optional port number ([URI], Section
        3.2.3).

By my reading, "no". The http/https schemes simply refer to the RFC 3986 URI "optional port number" definition, and would therefore follow any update to RFC 3986 itself.

The much more difficult issue remains with any html display client, which must be taught to recognize any kind of UDS AF_UNIX "port" addressing. Again, strictly, that is a separate issue. But this does point out that the proposal here implies that there are two distinct "solution" arenas to confront: first, RFC 3986 itself, and second, the various de facto standard html display clients extant.

The Node.js security issue mentioned by @randomstuff is - well - a Node.js security issue, as was mentioned. It's not a server security issue and has nothing to do with UDS AF_UNIX "port" addressing per se. Of course, that also doesn't mean that html display client security issues go away. It's just a separate problem - though, it's still a problem. It is interesting that this raises the question of security in the "reverse" direction, from a remote "server" potentially accessing a local "client resource", through a UDS.

That is not something inherent in the original concept of http client/server communication, but a consequence of allowing the "client" to potentially act, itself, as a kind of "server", using some client facility, as with javascript, to access a local resource. The security model, then, requires simply that the client be smart enough not to do "anything stupid" at the behest of the server. Ha!

@mnot
Member

mnot commented Dec 11, 2023

Lots of different proposals have been made above:

  • Changing the URL syntax
  • Adding a new DNS TLD
  • Appending a suffix to the URL scheme
  • Defining a new URL scheme

Changing the URL syntax requires coming up with a solution for all URLs, not just HTTP. Backwards compatibility needs to be considered for a very large ecosystem, and incremental deployment needs to be considered. As Anne said above, these factors raise the bar considerably for any proposal, and so should be a last resort (there's currently an effort by IPv6 people to do a similar thing, and it's not going well for these reasons).

Creating a new TLD for one protocol isn't good architecture, and a lot of people are going to push back on it. Again, a proposal in this area is likely to hit friction from other, unrelated communities (in this case, DNS).

Appending a suffix to the URL scheme implies that the suffix makes sense for other URL schemes. This means that wider review and discussion will need to take place to get it adopted.

That makes defining a new URL scheme the approach that's most likely to succeed. Such a scheme could define itself to use an authority that is not grounded in DNS, so it could be something like:

httpu://tmp.mysock/path/to/resource?query&string

Defining it as a new scheme would also provide an opportunity to answer a lot of questions like "is HTTP/1 or HTTP/2 used?", "does it use TLS?", and so on.

But that's just my opinion.

If there's interest in solving this problem, I'd suggest that someone write a document outlining a proposal and bring it to the IETF HTTP WG - there is a larger diversity of HTTP implementers represented there that can provide feedback.

@thx1111

thx1111 commented Dec 12, 2023

@mnot:

Creating a new TLD for one protocol isn't good architecture, and a lot of people are going to push back on it.

On reflection, I'm going to totally agree with that.

Changing the URL syntax requires coming up with a solution for all URLs, not just HTTP.
...
That makes defining a new URL scheme the approach that's most likely to succeed.

There is nothing in any of my, or several other, proposals that is specific to only the http/https "schemes", as the term is defined in RFC 3986. Again, RFC 8820, Section 2.1, "URI Schemes", strongly discourages the introduction of new "schemes".

I have suggested three alternatives for - to put it generally - Address Family "port" addressing.

Extending the overloaded use of the colon ":" delimiter:

 http://:/path/to/socket:/path/to/resource.html...
 http://user@[::1]:/path/to/socket:/path/to/resource.html...

Extending the square bracket hack:

 http://:[/path/to/socket]/path/to/resource.html...
 http://user@[::1]:[/path/to/socket]/path/to/resource.html...

Using alternate delimiters, eliminating the double slash "//", the square bracket hack "["..."]", and
the overloaded use of the colon ":" delimiter, as for instance:

 http:&/path/to/socket+/path/to/resource.html...
 http:user@::1&/path/to/socket+/path/to/resource.html...

More generally, any specific delimiter between RFC 3986 "authority" and "path" would solve the URI issue raised here. To illustrate, where RFC 3986 has defined:

      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

      hier-part   = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty

      authority   = [ userinfo "@" ] host [ ":" port ]

This would instead become:

      URI = scheme ":" [ userinfo "@" ] host [ ":" port ] "your-favorite-delimiter-here" path-something [ "?" query ] [ "#" fragment ]

The essential problem for Address Family "port" addressing comes down to RFC 3986 failing to just define a specific delimiter between its "authority" and "path" components, or, stating this another way, failing to define a specific
delimiter which precedes its "path" component. And then, RFC 3986 struggles desperately to overcome this failure in Section 3.3. Path, explaining "The ABNF requires five separate rules to disambiguate these cases, only one of which will match the path substring within a given URI reference."

Section 3.3. even provides an unconvincing example of "path" while trying to "paper-over" this failure:

   A path consists of a sequence of path segments separated by a slash
   ("/") character.  A path is always defined for a URI, though the
   defined path may be empty (zero length).  Use of the slash character
   to indicate hierarchy is only required when a URI will be used as the
   context for relative references.  For example, the URI
   <mailto:fred@example.com> has a path of "fred@example.com", whereas
   the URI <foo://info.example.com?fred> has an empty path.

Why try to "shoehorn" mailto:[email protected] into an example of "path"? "[email protected]" looks like
a perfectly good example of 'userinfo "@" host' to me. There is no need to call it something else, attempting to justify the missing useless double slash "//", which otherwise requires "mailto://[email protected]".

@mnot
Member

mnot commented Dec 12, 2023

Again, RFC 8820, Section 2.1, "URI Schemes", strongly discourages the introduction of new "schemes".

I wrote that RFC. That is not what Section 2.1 says.

@thx1111

thx1111 commented Dec 12, 2023

Hmm - copying the text:
https://www.rfc-editor.org/rfc/rfc8820

Abstract
...  While it is common for schemes to further
delegate their substructure to the URI's owner, publishing independent standards that mandate particular
forms of substructure in URIs is often problematic.
...
2.1. URI Schemes
...
A Specification that defines substructure for URI schemes overall (e.g., a prefix or suffix for URI scheme
names) MUST do so by modifying [BCP35] (an exceptional circumstance).

and, https://www.rfc-editor.org/info/bcp35

Abstract
This document updates the guidelines and recommendations, as well as
the IANA registration processes, for the definition of Uniform
Resource Identifier (URI) schemes. It obsoletes RFC 4395.

Then, by "exceptional circumstance", you meant modifying, literally, the document BCP35 itself, and not the resulting list of registered "schemes" referencing BCP35? I stand corrected.

Still, there is the problem of modifying existing, or creating new, applications able to utilize any particular scheme. I don't expect that my web browser actually supports the currently 374 different registered schemes available. In fact, the trend has been for, for instance, web browsers to drop support for less commonly used schemes - no more gopher, ftp, or mailto - with some functionality being replaced by specialized scheme applications or by "groupware" suites.

I still don't agree that defining and registering a new scheme, exclusively to support html rendering from a local unix domain socket, is a good idea. Rather, that use case does serve to illuminate a deeper systemic fault in RFC 3986.

I did rather like gopher, though, ...

@agowa

agowa commented Dec 14, 2023

@mnot: A while ago I already wrote to the IETF mailing lists about such a change, but they just forwarded me here. I don't remember all the details, as the whole thing started ages ago (ok, probably more like about one year), but I could try to look for these related mails.

You already have seen my initial suggestion in another ticket?
#778 (comment)

It would be backwards compatible by allowing for default values to be omitted. It would work with everything that currently uses the URL Schema (in a standards compliant way, at least). And it would also allow for the very verbose way of specifying all the protocols down to the wire....

@mnot
Member

mnot commented Dec 20, 2023

@thx1111:

Then, by "exceptional circumstance", you meant modifying, literally, the document BCP35 itself, and not the resulting list of registered "schemes" referencing BCP35? I stand corrected.

Understand that RFC 8820 is best current practice for applications that use HTTP (what some people call "HTTP APIs" or "REST APIs") -- it's saying that it's exceptional that one of them would require a new scheme.

Still, there is the problem of modifying existing, or creating new, applications able to utilize any particular scheme. I don't expect that my web browser actually supports the currently 374 different registered schemes available. In fact, the trend has been for, for instance, web browsers to drop support for less commonly used schemes - no more gopher, ftp, or mailto - with some functionality being replaced by specialized scheme applications or by "groupware" suites.

Browsers are going to have to change if they want to support anything that happens here, so that isn't a decisive factor regarding syntax.

I still don't agree that defining and registering a new scheme, exclusively to support html rendering from a local unix domain socket, is a good idea.

HTTP isn't just for HTML.

To be clear, I don't think a new scheme is the only way to do this; it's just more straightforward than other suggestions so far.

@agowa:

You already have seen my initial suggestion in another ticket?
#778 (comment)

I hadn't, but that seems like a lot of work (and abstraction) to get to the goals here.

Normally, protocols can negotiate transitions like this (see e.g. the evolution from HTTP/1 to HTTP/3). What's different here is that unix domain sockets have a completely different authority, and a subtly different transport (as opposed to TCP).

@PHJArea217

PHJArea217 commented Sep 30, 2024

Just wanted to chime in here with my own opinion on this.

Unix domain sockets are an OS-specific transport. Windows has named pipes instead. Due to their local nature, simply embedding the Unix socket path or named pipe path would result in two different and somewhat incompatible representations for applications which must work on both Windows and Unix-like systems.

The ideal solution would be to have some kind of alternative, ideally OS-neutral, namespace, perhaps under the IPv6 link-local or another reserved range, which maps directly to OS-specific network transports like Unix domain sockets or named pipes. Note that, unlike what was stated above, it is expected that the top-level domain or IPv6 prefix under which the Unix domain sockets or named pipes will be mapped be configurable, to prevent collisions. For example, one could connect to fe8f::3:6:0 port 12345, and it would map to a Unix socket at /run/00006/00000_12345. This may look weird at first, but the resulting path can still be symlinked to the actual socket to connect to. More generally, such a namespace and mapping could be configured by the user so that an agreed-upon domain name or IP address would appear to connect to a local service consistently across different operating systems, letting that name or address be used to access the service in an OS-independent manner.

The rationale for link-local here is that, at least on Linux, such addresses fail "closed", i.e. they will not result in any actual TCP connection if they are not recognized. The 127.0.0.0/8 range can also be used, in which case a TCP connection can still leak, but the surface is still limited to the local host.

@randomstuff

Unix domain sockets are an OS-specific transport. Windows has named pipes instead.

AFAIU, AF_UNIX has come to Windows.

For example, one could connect to fe8f::3:6:0 port 12345, and it would map to a Unix socket at /run/00006/00000_12345.

One benefit of filesystem sockets is that you can skip the numeric address part and directly map human-friendly names (virtual hostnames) into (human-friendly) paths. This way you avoid the cumbersome task of managing a mapping from human-friendly virtual hostnames to numeric addresses.

@PHJArea217

PHJArea217 commented Sep 30, 2024

Unix domain sockets are an OS-specific transport. Windows has named pipes instead.

AFAIU, AF_UNIX has come to Windows.

True, I know that AF_UNIX does exist on Windows. But the idea of mapping IPv6 addressing to Unix sockets would not be limited to Unix sockets; it would also extend to other stream-based, TCP-like transports like AF_VSOCK.

For example, one could connect to fe8f::3:6:0 port 12345, and it would map to a Unix socket at /run/00006/00000_12345.

One benefit of filesystem sockets is that you can skip the numeric address part and directly map human-friendly names (virtual hostnames) into (human-friendly) paths. This way you avoid the cumbersome task of managing a mapping from human-friendly virtual hostnames to numeric addresses.

Hostnames tend to be stable; Unix domain sockets tend not to be.

The same effect could be accomplished by putting one of those IP addresses into the /etc/hosts file, resulting in a mapping from a human-friendly domain to the IP address which maps to the unix socket. This also means that there would not need to be any changes to SSL/TLS certificates either, one can continue to use DNS subject alt names.

The connection to fe8f::3:6:0 port 12345 is not an actual TCP connection, but is specially interpreted by a modification of the connect() system call in the TCP/IP socket API, causing a connection to a Unix domain socket rather than a TCP socket. This is implemented by my socketbox and u-relay-tproxy projects (see my github profile).

The advantage of this mapping is that the set of allowed Unix domain sockets that could be connected to is naturally restricted to the end-user-defined mapping of IPv6 prefixes to filesystem path prefixes. Only the unix sockets under path prefixes mentioned in a user-defined mapping would be visible to the application. (To put things another way, the file:// URL scheme could also be sandboxed to a chroot, and the view of the filesystem as observed through file:// URLs can still be totally valid.)

@randomstuff

randomstuff commented Sep 30, 2024

The same effect could be accomplished by putting one of those IP addresses into the /etc/hosts file, resulting in a mapping from a human-friendly domain to the IP address which maps to the unix socket.

That's what I was saying by "you don't have to manage numeric addresses". In your proposal, you would still have to maintain a mapping (the /etc/hosts file) from host names to IP addresses, and your filesystem is now filled with numeric Unix socket addresses which are a lot less clear than e.g. /run/user/1000/dev/app.foo.localhost.

Only the unix sockets under path prefixes mentioned in a user-defined mapping would be visible to the application.

You can achieve the same effect by directly mapping some domain names into Unix socket paths (without the intermediate IPv6 address).
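
As a sketch of what that direct approach could look like on the client side (the /run/user/&lt;uid&gt;/dev/&lt;hostname&gt; layout and the helper name are assumptions, not an existing convention):

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Derive the socket path from the virtual hostname itself, with no numeric
 * address in between, and connect to it over AF_UNIX. */
int connect_by_hostname(const char *hostname)
{
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    snprintf(sa.sun_path, sizeof sa.sun_path,
             "/run/user/%u/dev/%s", (unsigned)getuid(), hostname);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;  /* e.g. connect_by_hostname("app.foo.localhost") */
}
```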

Note: if you map host names to special IP addresses that are mapped to Unix sockets, you then have to decide what happens when you receive one of those special IPv6 addresses from the DNS. Accepting them might be a vulnerability (it could, for example, open up your service to DNS-rebinding attacks). Loopback IP addresses and private IP addresses are often filtered for this reason.

@PHJArea217

The same effect could be accomplished by putting one of those IP addresses into the /etc/hosts file, resulting in a mapping from a human-friendly domain to the IP address which maps to the unix socket.

That's what I meant by "you don't have to manage numeric addresses". In your proposal, you would still have to maintain a mapping (the /etc/hosts file) from host names to IP addresses, and your filesystem is now filled with numeric Unix socket addresses that are a lot less clear than e.g. /run/user/1000/dev/app.foo.localhost.

Only the unix sockets under path prefixes mentioned in a user-defined mapping would be visible to the application.

You can achieve the same effect by directly mapping some domain names into Unix socket paths (without the intermediate IPv6 address).

Sure, and I guess you might be kind of right about that.

The mapping of IPv6 addresses to Unix domain sockets in this manner is something that can easily be done in an LD_PRELOAD library, which ensures it also works, without modification, with applications that resolve a domain name to an IP address and connect to that IP. The mapping is also a bit more flexible, because you can map multiple domains to a single IP, which can be useful for testing name-based virtual hosting.
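
For example, a couple of hypothetical /etc/hosts entries could point several virtual hostnames at one address in the mapped range:

```
fe8f::3:6:0   app.foo.localhost
fe8f::3:6:0   api.foo.localhost
```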

Note: if you map host names to special IP addresses that are mapped to Unix sockets, you then have to decide what happens when you receive one of those special IPv6 addresses from the DNS. Accepting them might be a vulnerability (it could, for example, open up your service to DNS-rebinding attacks). Loopback IP addresses and private IP addresses are often filtered for this reason.

Any sane application will use protections like checking the Host header or rejecting an invalid SSL/TLS certificate to prevent this. Besides, the mapping doesn't need to be under fe8f::3:0:0/96; it could also be under an IPv4 loopback prefix.

@minfrin

minfrin commented Oct 2, 2024

Just wanted to chime in here with my own opinion on this.

Unix domain sockets are an OS-specific transport. Windows has named pipes instead. Due to their local nature, simply embedding the Unix socket path or named pipe path would result in two different and somewhat incompatible representations for applications which must work on both Windows and Unix-like systems.

All of this is orthogonal to the problem, which is that we need a standardised way to express an http(s) URL that points to a Unix domain socket. There is no reason why an arbitrary (and legacy) difference between two operating systems should place a limit on a standard like the URL definition.

Other types of sockets on other platforms should be discussed in a separate issue.

@minfrin

minfrin commented Oct 2, 2024

To be clear, I don't think a new scheme is the only way to do this; it's just more straightforward than other suggestions so far.

What is the next step, a draft RFC?

I keep hitting this problem at https://github.com/apache/httpd; if an RFC is the way forward, I can make some time for it.

@PHJArea217

You're right: this is the url repo on GitHub, and as such it would be the place to define a URL standard for encoding a Unix domain socket path.

The Unix socket paths would of course be interpreted in an OS-dependent manner, which is generally not a problem at all. Consider file:/// URLs on Windows, which are typically of the form file:///C:/Users/Username/Example. Even though Windows uses drive letters rather than a single root hierarchy as on Unix-like systems, it can still make use of file:/// URLs by encoding the pathname in a clear and straightforward manner, representing C:\Users\Username\Example as /C:/Users/Username/Example.

All of the proposals above are effectively means of encoding a file path string in place of the domain name or IP address of a URL. This is not much different from an "interface" in a programming language like Java or Go, which specifies what is available to a user and which functions an implementation has to provide, but generally not how they should be implemented. And as discussed above, Windows file:/// URLs are different from Unix file:/// URLs: both use the common interface of a file:/// URL, but they implement it differently, and that is not a problem at all.

Which means that we can effectively generalize this issue from "encoding a Unix socket path" to "encoding a filesystem-path-like string that acts as the equivalent of a hostname and may be interpreted in a system-dependent manner". And just as file:/// URLs can be chrooted, all of the following could theoretically be possible for the "Unix domain socket URL scheme":

  • Limit the set of available sockets to a certain directory.
  • Map certain prefixes of the filesystem path string to a multitude of directories.
  • Interpret certain prefixes of the filesystem path string as other types of sockets.
  • Apply different restrictions to different prefixes of the filesystem path string.
  • Create an LD_PRELOAD library which intercepts AF_UNIX calls to connect to sockets of other types.

So I'm quite supportive of the issue at hand, so long as it's not restricted to AF_UNIX sockets, so that if we later need to support other types of transport, we won't have to have all of this discussion again.

@PHJArea217

One thing I do need to point out in this context is that the URL standard is itself an "interface". That means we need to differentiate between attempts to modify the "interface" of a URL by changing the URL standard, and attempts to modify the "implementation" of URLs by changing individual implementations. The former is merely a means of abstractly expressing a Unix domain socket path or similar string in a URL, with few constraints on the actual implementation; the latter is the actual means of connecting to a Unix domain socket, i.e. fd = socket(AF_UNIX, SOCK_STREAM, 0) followed by connect(fd, ...) with an AF_UNIX address such as "/some_unix_path".

All of the above syntax proposals would effectively modify the "interface" of a URL to support an extra method of connecting to a Unix domain socket. In many cases, implementations of an interface are not required to implement every possible method if it is known that users of the interface will not call it, and that is certainly true in other contexts (such as the Java Collections API, where immutable or unmodifiable collections can be passed to functions that only read from the collection). The Liskov substitution principle is not really applicable here, though; otherwise every client that implements URLs would have to support every single URL scheme, and that is simply infeasible. This means that even if we do have a standard for encoding a Unix domain socket path or similar string in a URL, we cannot guarantee that every implementation of URLs (such as in browsers) will honor it.

This effectively means that many of the linked issues about missing Unix domain socket support in various clients that take in URLs might be considered wishful thinking: even if a scheme for encoding a Unix domain socket path is devised, there is no guarantee that every app will end up supporting it.

On the other hand, my and @randomstuff's proposals of proxy servers or LD_PRELOAD libraries to connect clients to Unix domain sockets are means of changing the implementation of URLs. It is similar to adding support for a new filesystem to an operating system kernel: applications can use the new filesystem transparently, by referencing paths on it through the ordinary file access APIs, without being changed, because all filesystems share the same interface. This is generally much more feasible to accomplish.

LD_PRELOAD might not be possible for Go binaries at this moment, but this is being worked on.

Ultimately, this means that the mere act of connecting to a Unix domain socket does not necessarily require changing the URL standard, if it is possible to shoehorn it into some existing interface. It may seem hacky or unsightly, but the major advantage is that client applications do not need to be changed, considering how many HTTP clients and web browsers there are in the wild.

A similar issue exists in issue #392, where there is discussion of encoding an IPv6 link-local zone identifier in a URL. The mere act of connecting to an IPv6 link-local address can likewise be done by changing the implementation: for example, interpreting subdomains of the ipv6-literal.example domain as "resolving" a string that encodes an IPv6 link-local address with a scope into a sockaddr_in6 with sin6_scope_id filled in. It is not necessary to change the interface of a URL to do so, because this reuses the existing "domain" interface.
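
As a sketch of what such a resolver shim could produce (the hostname encoding itself is left unspecified here; "fe80::1" and "eth0" are purely illustrative inputs):

```c
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>

/* Build a sockaddr_in6 for a link-local address, filling in the zone via
 * sin6_scope_id so that connect() knows which interface to use. */
int make_link_local(struct sockaddr_in6 *sa,
                    const char *addr, const char *iface)
{
    memset(sa, 0, sizeof *sa);
    sa->sin6_family = AF_INET6;
    if (inet_pton(AF_INET6, addr, &sa->sin6_addr) != 1)
        return -1;                               /* e.g. addr = "fe80::1" */
    sa->sin6_scope_id = if_nametoindex(iface);   /* e.g. iface = "eth0"   */
    return sa->sin6_scope_id ? 0 : -1;
}
```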

A more relatable example is the fact that the URL syntax did not need to be changed for connections to domain names to go over IPv6. If the URL standard did not have the square-bracket notation, it would still have been possible to reach IPv6 websites at the network layer; the only limitation would have been that doing so required a domain name. The main reason the URL standard ultimately did need to change in that case was the legitimate interest in connecting to IPv6 literals in the same way as with IPv4.
