HTTP body content is truncated in some cases despite setting max-size to a high value #394

yudelevi · 2023-09-04T19:59:44Z

On latest main.

An example site to reproduce:

[root@bullseye zgrab2]# echo "www.gigenet.com" | ./zgrab2 http --use-https --port 443 --max-size=204800 --max-redirects=5 --cipher-suite portable > zgrab_test
[root@bullseye zgrab2]# cat zgrab_test | jq '.data.http.result.response.body' |  wc
      1    9881  598195

attached full output as zgrab_test.txt
Adjusting max-size to different values made no difference

The same via cURL

[root@bullseye zgrab2]# curl -L https://www.gigenet.com > curl_test
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   162  100   162    0     0     71      0  0:00:02  0:00:02 --:--:--    71
100  918k    0  918k    0     0   300k      0 --:--:--  0:00:03 --:--:-- 3707k
[root@bullseye zgrab2]# wc curl_test
  3875  22298 940954 curl_test

About 340K of content did not make it into the zgrab body, despite having max-size of 20MB

The text was updated successfully, but these errors were encountered:

yudelevi · 2023-09-04T20:02:54Z

To rule out a silly, re-tested curl with HTTP1.1 and TLS1.2 - it's not the culprit

mzpqnxow · 2023-09-07T00:33:58Z

A couple of ideas...

Try specifying the with-body-size bool flag. This won't fix the issue but it might be helpful to see if it's reporting short in the output (consistent with what's actually saved) or if it's reporting a value consistent with the content-length
Try increasing the global timeout, which is separate from the http module in the global options. I'm not 100% certain but I believe this may be applied not only to connect operations but globally to i/o operations on a connected socket. Perhaps it's hitting the deadline while the response is in the process of streaming? This would result in a short read regardless of what the flag says

I read through the http module and didn't see anything obvious, though I'm far from an authority

The length calculations seem to be consistent, using the standard idiom (roughly):

readLen = min(contentLength, maxReadLen)

EDIT: Thanks to your (very large) output, I think I might see the issue. The length enforcement calculation depends upon content-length, but the response is using chunked encoding:

"content_length":-1,"transfer_encoding": ["chunked"],

While you could make changes in zgrab2 code to accommodate this, it would probably be easier to instruct the server to not use chunked encoding. I don't recall off the top of my head, perhaps there is a header the client can send to inhibit that behavior from the server? Maybe not?

If this is in fact the issue, I'm a little surprised that nobody noticed this. Seems you could avoid chunked-encoding entirely by specifying HTTP/1.0 in your request as a possible test to determine if it is in fact the issue, since HTTP 1.0 doesn't support chunked encoding

yudelevi · 2023-09-07T17:44:08Z

@mzpqnxow, thanks for your response. I think you are onto something.

I tried using --with-body-size, but it didn't change anything.
I didn't find an easy method to force an HTTP 1.0 request in zgrab, it seems like 1.1 request is always sent (perhaps it makes sense to add a flag for a 1.0 request, but this is probably another item altogether).

I attempted to send the Accept-Encoding: Identity header to the server (--custom-headers-names=Accept-Encoding --custom-headers-values=identity), but it actually cut off the body even shorter.

I think this might be caused by this portion:

zgrab2/lib/http/transfer.go

Lines 477 to 499 in 97ba87c

    
           // Prepare body reader. ContentLength < 0 means chunked encoding 
        
           // or close connection when finished, since multipart is not supported yet 
        
           switch { 
        
           case chunked(t.TransferEncoding): 
        
           	if noResponseBodyExpected(t.RequestMethod) { 
        
           		t.Body = NoBody 
        
           	} else { 
        
           		t.Body = &body{src: NewChunkedReader(r), hdr: msg, r: r, closing: t.Close} 
        
           	} 
        
           case realLength == 0: 
        
           	t.Body = NoBody 
        
           case realLength > 0: 
        
           	t.Body = &body{src: io.LimitReader(r, realLength), closing: t.Close} 
        
           default: 
        
           	// realLength < 0, i.e. "Content-Length" not mentioned in header 
        
           	if t.Close { 
        
           		// Close semantics (i.e. HTTP/1.0) 
        
           		t.Body = &body{src: r, closing: t.Close} 
        
           	} else { 
        
           		// Persistent connection (i.e. HTTP/1.1) 
        
           		t.Body = NoBody 
        
           	} 
        
           }

With HTTP1.1, the connection is assumed to be persistent, and t.Body = NoBody is set for the second pass.

mzpqnxow · 2023-09-09T13:56:52Z

With HTTP1.1, the connection is assumed to be persistent, and t.Body = NoBody is set for the second pass.

I'm not sure what you mean by second pass (I'm on mobile, couldn't follow the code very easily) but that phrase makes me think of a somewhat common chunked encoding pattern, which is to send multiple chunks

Often this is because the application/reverse-proxy/server is pulling content from multiple sources, especially when one may block or have an unknown size Common example, the first chunk may be a static header from a memory cache where the size is known and no blocking will occur and the second chunk may be content that may be coming from a backend API over the network and streamed rather than buffered

When you use curl -vvvv (or the relevant trace params, or tcpdump if it's not TLS) do you see multiple chunks in the response? Wondering if only the first chunk is being processed. Maybe you already considered this and that's what you meant by "second pass"?

Perhaps you're already familiar with chunked encoding, but from what I recall of chunked-encoding, 2 chunks of sizes of 4 and 8 would look like this on the wire:

GET / HTTP 1.1
...


HTTP 200 OK
...
Encoding: chunked

00000004
XXXX

00000008
YYYYYYYY

00000000<EOF>

With tcpdump or curl trace output, you should be able to see each chunk length and chunk

If there are in-fact multiple chunks, you can check to see of the zgrab2 output is ending where the second chunk in curl/tcpdump is beginning

Either way, it sounds like the "fix" (if multiple chunks is actually the issue- possibly a big assumption on my part) would be:

Change nothing; maybe this is deliberate behavior, to reduce waiting for additional chunks. A tradeoff between performance and completeness
Implement a generic solution; ensure chunked responses are fully received, by adding new logic to the response handling code. Cease receiving chunks when the max response body length is hit or the i/o deadline/timeout is reached. Note this might not be ideal as users depending on the current behavior may see performance impact, particularly if a lot of servers currently send multiple chunks, and at least one of them blocks for a non-negligible period of time. Since you're the first to notice, current users seem to not be too bothered by the current behavior, I guess. probably they haven't noticed. To avoid impacting those users, this could be opt-in with a receive-chunks flag
Change the internal chunked logic to support a "min-response-chunks" or "max-response-chunks" flag that is applied only to chunked responses and defaults to 1. This allows users that want different/new behavior (like you, and actually probably me) to opt-in to new behavior, and also offers more fine-grained control of how many chunks will be received if desired by opt-in users
Add an HTTP1.0 flag to zgrab2 and pass it through to the http interface, preventing chinked responses entirely

Some caveats...

These are "off the cuff" ideas that make assumptions that the issue is that "multiple chunk responses are not being handled." I have not tested or reviewed the code in full (yeah, still on mobile)
It's very possible none of these ideas is acceptable. Ultimately, after confirming the cause, we would want to defer to a zmap project dev before choosing one. I'm sure @dadrian will have a clever suggestion on hiw to handle anything and/or or guidance on the preferred approach. I don't want to waste his time until we can confirm what the precise cause is, otherwise I would immediately punt this 😀

For 4, if there was any concern about this changing (breaking) things for some users who encounter servers which may require sending a chunked response (e.g. if there are some that refuse to or are unable to buffer all of the response data) it could be an opt-in behavior. It could also be "just-in-time", applied only when needed using a flag similar to how retry-https and fail-https-to-http work. At a high level, those flags abandon the first response if a "problem" is detected and then re-request with the workaround version of the request. For this case, I think the re-request would be sent only when the response is chunked and the first chunk is < the max response body length. The re-request would explicitly request HTTP 1.0, to prevent a chunked response

Assuming there is a modification made to reliably receive all chunks, the implementation could contain a short-circuit to ensure additional chunks are only received when necessary:

noMoreChunks = connIsClosed()
if noMoreChunks || chunkLen <= userMaxBodyLen:
    // the chunk is big enough to fulfill user max response
    return body
if no_more_chunks
// First chunk didn't meet

Apologies for any incoherence- I'm playing the "sorry, on mobile" card one last time

mzpqnxow · 2024-01-07T22:45:54Z

So I spent a 20 minutes today looking at what would be needed to force the golang HTTP package to send HTTP/1.0 (and potentially other values). That's the quickest blunt-instrument solution for mishandling of chunked encoding responses - since HTTP/1.0 signals to the server not to use chunked encoding in the response. What I found was not cool. It also might be neat to be able to use invalid values (e.g. 1.3, or "" !)

I went to the http package and noticed this, where Protocol.Major and Protocol.Minor are set to 1 and 1 respectively:

zgrab2/lib/http/request.go

Lines 818 to 825 in 97ba87c

    
           req := &Request{ 
        
           	Method: method, 
        
           	URL:    u, 
        
           	Protocol: Protocol{ 
        
           		Name:  "HTTP/1.1", 
        
           		Major: 1, 
        
           		Minor: 1, 
        
           	},

As a test, I made a quick change to use 1 and 0 - but I saw no change in behavior, it was still emitting "HTTP/1.1".

So I looked more closely and noticed this:

zgrab2/lib/http/request.go

Lines 576 to 579 in 97ba87c

    
           _, err = fmt.Fprintf(w, "%s %s HTTP/1.1\r\n", valueOrDefault(req.Method, "GET"), ruri) 
        
           if err != nil { 
        
           	return err 
        
           }

The Major/Minor aren't consulted, the literal "HTTP/1.1" is put in place, regardless of what's set in Protocol

I didn't look much beyond this as I barely had the time to clean up a few of my branches, but it seems at first glance that making changes would be a pretty invasive change to the HTTP code which would break other (zgrab2 internal) uses of http.NewRequest - so those would need to be fixed (IPP is one)

That's not considering the > 150 uses of http.NewRequest from inside of the http package itself (mostly tests, I guess ...)

Another option would be to make a duplicate of that function with a suffix, and instruct http/scanner.go to use that function, to avoid disturbing other callers

And/or, perhaps it would be best to have the code properly honor Protocol.Minor and Protocol.Major - I didn't dig into whether it would be too late, but it's possible to directly set the Protocol.Minor, Protocol.Major and Protocol.Name values after the request is returned by http.NewRequest

Either way, I don't have to time to go further into it right now but would like to return to it at some point. Aside from possibly addressing any issues with chunked encoding, there may be some servers out there that have interesting behavior when HTTP is not 1.1 (though I'm sure it would only be a tiny fraction if that is the case)

So basically, this update is a no-op, or some basic info if anyone wants to go ahead and spend some time making the protocol, major and minor customizable via a flag

Sorry if this is incoherent - written a bit hastily

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTTP body content is truncated in some cases despite setting max-size to a high value #394

HTTP body content is truncated in some cases despite setting max-size to a high value #394

yudelevi commented Sep 4, 2023

yudelevi commented Sep 4, 2023

mzpqnxow commented Sep 7, 2023 •

edited

Loading

yudelevi commented Sep 7, 2023 •

edited

Loading

mzpqnxow commented Sep 9, 2023

mzpqnxow commented Jan 7, 2024

HTTP body content is truncated in some cases despite setting max-size to a high value #394

HTTP body content is truncated in some cases despite setting max-size to a high value #394

Comments

yudelevi commented Sep 4, 2023

yudelevi commented Sep 4, 2023

mzpqnxow commented Sep 7, 2023 • edited Loading

yudelevi commented Sep 7, 2023 • edited Loading

mzpqnxow commented Sep 9, 2023

mzpqnxow commented Jan 7, 2024

mzpqnxow commented Sep 7, 2023 •

edited

Loading

yudelevi commented Sep 7, 2023 •

edited

Loading