Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Etag, Last-Modified, If-None-Match and If-Modified-Since http headers on request and response #111

Open
requaos opened this issue Dec 29, 2018 · 8 comments

Comments

@requaos
Copy link

requaos commented Dec 29, 2018

Expected behavior

Getting the Etag and Last-Modified header values on the Feed object

Actual behavior

These values are not inspected for or provided

@mmcdole
Copy link
Owner

mmcdole commented Dec 29, 2018

I agree, this should be something we support.

I honestly never expected people to use ParseURL() very much, and instead expected them to make their own HTTP requests and feed the reader to gofeed.

But it seems people do want to use it, so I should make this interface more robust.

@requaos
Copy link
Author

requaos commented Jan 2, 2019

I'll just leave this here:


import (
	"fmt"
	"net/http"
	"time"

	"github.com/pkg/errors"

	"github.com/mmcdole/gofeed"
)

var gmtTimeZoneLocation *time.Location

func init() {
	loc, err := time.LoadLocation("GMT")
	if err != nil {
		panic(err)
	}
	gmtTimeZoneLocation = loc
}

var ErrNotModified = errors.New("not modified")

type Reader interface {
	ReadFeed(url string, etag string, lastModified time.Time) (*Feed, error)
}

func New(client *http.Client) Reader {
	return &reader{
		feedReader: gofeed.NewParser(),
		client:     client,
	}
}

type reader struct {
	feedReader *gofeed.Parser
	client     *http.Client
}

type Feed struct {
	*gofeed.Feed

	ETag         string
	LastModified time.Time
}

func (r *reader) ReadFeed(url string, etag string, lastModified time.Time) (*Feed, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", "Gofeed/1.0")

	if etag != "" {
		req.Header.Set("If-None-Match", fmt.Sprintf(`"%s"`, etag))
	}

	req.Header.Set("If-Modified-Since", lastModified.In(gmtTimeZoneLocation).Format(time.RFC1123))

	resp, err := r.client.Do(req)

	if err != nil {
		return nil, err
	}

	if resp != nil {
		defer func() {
			ce := resp.Body.Close()
			if ce != nil {
				err = ce
			}
		}()
	}

	if resp.StatusCode == http.StatusNotModified {
		return nil, ErrNotModified
	}

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, gofeed.HTTPError{
			StatusCode: resp.StatusCode,
			Status:     resp.Status,
		}
	}

	feed := &Feed{}

	feedBody, err := r.feedReader.Parse(resp.Body)
	if err != nil {
		return nil, err
	}
	feed.Feed = feedBody

	if eTag := resp.Header.Get("Etag"); eTag != "" {
		feed.ETag = eTag
	}

	if lastModified := resp.Header.Get("Last-Modified"); lastModified != "" {
		parsed, err := time.ParseInLocation(time.RFC1123, lastModified, gmtTimeZoneLocation)
		if err == nil {
			feed.LastModified = parsed
		}
	}

	return feed, nil
}

@requaos
Copy link
Author

requaos commented Jan 2, 2019

@mmcdole I only needed to extend the struct to add Etag and LastModified to provide this functionality

@requaos
Copy link
Author

requaos commented Jan 4, 2019

I could make a PR that modifies the ParseURL function like this:
ParseURL(url string, opts ...Options)
And define an etag option and a lastmodified option. By changing the signature to accept a variadic second parameter we would be able to add this functionality without introducing a breaking change to the API.

@mmcdole
Copy link
Owner

mmcdole commented Jan 6, 2019

@requaos I like the idea of the variadic parameter. I think that would be a good idea.

Some items in Options that I know people have requested:

  • ETAG

  • LastModified

  • User-Agent

  • Timeout

@aliml92
Copy link

aliml92 commented May 8, 2023

Are there any updates on this issue yet?

@infogulch
Copy link
Contributor

By changing the signature to accept a variadic second parameter we would be able to add this functionality without introducing a breaking change to the API.

I like this idea, but how would the operator retrieve the Etag/Last-Modified header values from the ParseURL api? There's nowhere to output it, so I guess you'd have to modify the Feed struct... but then that doesn't work for cases like Parse which doesn't get the whole request just the body.

I think the only option may be a new api like the idea by requaos above. #111 (comment)

@DamonHD
Copy link

DamonHD commented Jun 11, 2024

Please also consider somehow facilitating responding well to (200 and 304) Cache-Control max-age and (429 and 503) Retry-After in each case to generate a "do-not-refetch-before" time?

See: https://www.earth.org.uk/RSS-efficiency.html

Rgds

Damon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants