Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow multipart param request/file size limit customization #98

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dm3
Copy link

@dm3 dm3 commented Oct 31, 2013

Commons FileUpload allows limiting uploaded file/request size (throwing exception when the size is exceeded) by counting the bytes received over the wire.

ring.middleware.multipart-params now accepts two more options:

  • max-request-size - maximum size of the whole request in bytes
  • max-file-size - maximum size of a single file in bytes

@dm3
Copy link
Author

dm3 commented Nov 10, 2013

I understand this change is somewhat misguided. Would you be willing to discuss alternative (better) means of achieving the same result?

The use-case is to hard-limit the size of files sent with multipart/form-data without having to receive their full content on the server.

Different web servers have different options for limiting, e.g. Tomcat can only limit application/x-www-form-urlencoded requests via maxPostSize attribute, while Jetty allows for limiting both form-data and urlencoded and Http-kit doesn't have any limiting capabilities (last time I checked).

@weavejester
Copy link
Member

I'm not sure whether this would be better as an option on the middleware itself, or as part of the file store.

Why do you think the change is misguided?

@dm3
Copy link
Author

dm3 commented Nov 10, 2013

I inferred that incorrectly from the lack of activity on your part :) Sorry.

I think this patch is the simplest implementation considering we're using the commons-fileupload, although it does compromise the clarity of the middleware. Also, the vanilla commons-fileupload leaks exceptions.
This I'm not sure how to deal with. Maybe the middleware should short-circuit and provide a (configurable?) error response like {:status 500 :body "File size is too large"}?

Limiting the size in the store should also work, as the stream isn't forced until then. However, custom file-store implementations would have to deal with the limiting themselves, so the options to the middleware (max-file-size) would be unenforceable, which means the options should be moved to the file store too.

To summarize:

Limiting in the FileUpload:
(+) Reuse the commons-fileupload internals
(-) Declaration of limits in code distant from the place of enforcement

Limiting in the store:
(+) Natural place for file-size limits
(-) Must be reimplemented in all custom stores

What do you think?

@weavejester
Copy link
Member

Do the commons-fileupload internals do anything special that just closing the input stream wouldn't duplicate? Since we're splitting up a single input stream into smaller input streams.

It's possible to wrap the store itself in some middleware that would limit the file size, but that would depend on whether closing the file input stream will prompt commons-fileupload to skip to the next file.

@dm3
Copy link
Author

dm3 commented Nov 10, 2013

Closing the FileItem's input stream will skip to the next file on the main request input stream.

I'm not sure I follow your approach.

@weavejester
Copy link
Member

Well, suppose we assume that limiting a stream to a specific size is something we want to do a little more generically. For instance, a web service might be accepting JSON through the body of a POST request. You might not want an attacker to be able to trigger an out-of-memory exception just by pushing through a very large request body.

As it turns out, there's a class in Commons I/O that does just that, BoundedInputStream. So let's wrap that in a function in the ring.util.io namespace for convenience:

(defn bounded-input-stream
  "Return a stream that will return bytes only up to the supplied maximum length."
  [^InputStream stream ^long max-length]
  (BoundedInputStream. stream max-length))

Next, let's create some store middleware that wraps the stream in the BoundedInputStream:

(defn bound-store
  "Bound a store function to a maximum file length."
  [store max-length]
  (fn [item]
    (store (update-in item [:stream] bounded-input-stream max-length))))

Now we can bound the length of any store function:

(-> (temp-file-store) (bound-store 8e6))

Or we could create some middleware to bound the request body as a whole.

@weavejester
Copy link
Member

By the way, apologies for the delay in replying to this. I'm not as good as I could be at getting around to integrating Ring patches. Sometimes it helps to poke me. :)

@dm3
Copy link
Author

dm3 commented Nov 10, 2013

Thanks for the clarification,

BoundedInputStream seems good, although it changes semantics by storing max bytes instead of producing an error. This raises the question - should we provide separate bound-store wrappers for:

  • saving max bytes of each file (should work with your bound-store implementation)
  • discarding the files which are larger than max, but storing the smaller ones (and adding metadata about the discarded files to the request)
  • discarding the whole request

or go with a single one?

I'd go with discarding the whole request as the default one. What do you think?

@weavejester
Copy link
Member

Well, there's a limit to what we can do unless we extend the storage interface to allow discarding of stored data. The only solution we can achieve without adding to the storage API is to cut off the file after a maximum number of bytes.

@dm3
Copy link
Author

dm3 commented Nov 10, 2013

This solution doesn't seem that useful. What would be the point in storing truncated files? To be honest, I can't think of a single use-case.
If you don't like throwing exceptions from the middleware, I'd go with extending the storage interface.

@weavejester
Copy link
Member

The solution would prevent the case where an attacker attempts to use up temporary file or memory space. If the user is malicious, it doesn't matter if the file is truncated or not.

However, that doesn't cover the cases where a legitimate user tries to upload a file that's too large.

Ideally we'd like an exception or error of some kind, but not one that's specific to Apache Commons. Should we use something like Slingshot, which would introduce a new dependency? Or should we handle it by a returned error response, which can be interpreted by other middleware yet still provide a useful error if uncaught?

@dm3
Copy link
Author

dm3 commented Nov 12, 2013

I think the way which captures the maximum information is

  1. storing files which obey the size limit
  2. providing additional entries in the request map (e.g. :multipart-errors) for files which went over the limit (with name, content-type, content-length if available).

This way upload handler will not force the client to reupload the smaller files and will be able to provide granular error messages. The responsibility to fail (e.g. when all of the files were too large) will then be moved to the file upload handler, not the ring middleware. The drawback is a more complicated API. What do you think?

@weavejester
Copy link
Member

Give me a little time to consider the options.

@hugo
Copy link

hugo commented Jan 23, 2014

Sorry to bug, @weavejester, but I need this functionality in an app I'm currently building. For what it's worth, I think the solution proposed by @dm3 would be optimal, and the added API complexity is negligible, given the extra flexibility it affords the developer.

@weavejester
Copy link
Member

The reason I haven't merged this in, is that I plan on rewriting the multipart-params API for Ring 1.3 in order to remove the dependency on servlet/servlet-api.

In the meantime, you can presumably solve your particular case with a custom multipart store, right?

@hugo
Copy link

hugo commented Jan 23, 2014

In my specific use case file uploads will ultimately be stored in CouchDB, so on further reflection, what I really want to do is just pass clj-http an InputStream of the request body and in effect have the client upload directly to Couch (can't have actual direct client uploads to Couch because the DB server isn't public facing.

@jonathanj
Copy link

What is the current state of this pull request? Conversation died about a year ago and there's no further information about possible solutions or recent changes that might have made this behaviour easier to implement.

@weavejester
Copy link
Member

There's a branch custom-multipart that contains a work-in-progress redesign of the multipart middleware for Ring 1.4.

In the meantime, this should be solvable with a custom multipart store.

@jonathanj
Copy link

@weavejester Would you mind giving some kind of brief overview of what the custom multipart store might do? You previously mentioned BoundedInputStream but that's unfortunately only really useful for thwarting malicious attempts and not for informing users. A 413 Request Entity Too Large response would be ideal but I'm not sure how to achieve this using the store API.

@ertugrulcetin
Copy link

@dm3 @weavejester I think this imp. needs error handling?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants