Dealing with slightly malformed feeds (missing timezone information) #87

Open
yringot opened this issue Feb 25, 2018 · 2 comments

yringot commented Feb 25, 2018

action: add new feed (with missing timezone information) and set a download-from date two weeks ago to avoid downloading a lot of stuff
expected result: only podcasts published after that date get downloaded by greg
actual result: greg complains about being unable to parse time information and downloads all podcasts of that entire feed, disregarding the --downloadfrom parameter and apparently the firstsync = 10 parameter as well.

I subscribe to several podcast feeds via bitlove.org, a website that distributes podcasts via BitTorrent. Some of these feeds are malformed, which causes greg to trip up and ignore the --downloadfrom date that I had set (now) when adding them (to avoid downloading podcasts which I had already listened to before setting up greg).

One of these feeds is the following: http://bitlove.org/metaebene/freakshow-mp3/downloads.rss. When I run greg sync on this feed, greg outputs "I cannot parse the time information of this feed.I'll use your current local time instead." When I run the W3C feed checker on that feed (https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fbitlove.org%2Fmetaebene%2Ffreakshow-mp3%2Fdownloads.rss), it highlights, among other minor mistakes, the following error:

line 11, column 35: pubDate must be an RFC-822 date-time: Thu, 15 Feb 2018 11:24:50 (25 occurrences) [help]

<pubDate>Thu, 22 Feb 2018 12:53:02 </pubDate>
                                   ^

The only thing that seems to be missing is the time zone after HH:mm:ss, yet greg just throws up its arms and says that it can't parse the date. Would it not make more sense for greg to issue a warning ("malformed date") and then assume the current time zone? (This is already implied by the above error message, but doesn't seem to work.)

I understand that this problem is not greg's fault, but that the feed is strictly speaking malformed. However, given that the date is otherwise fine and that only the time zone is missing, I would like to ask whether adding some additional logic to greg so that it ignores this minor error would be possible.
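To illustrate the kind of fallback I have in mind, here is a minimal sketch. It is not greg's actual code: `parse_pubdate` and its `fallback_tz` parameter are hypothetical names, and it only handles numeric offsets plus GMT/UT. When the time zone is missing, it warns and assumes a fallback zone (the local one by default) instead of giving up on the date.

```python
import re
import warnings
from datetime import datetime, timedelta, timezone

# English month abbreviations, so parsing does not depend on the locale.
_MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Optional weekday, then "22 Feb 2018 12:53:02", then an OPTIONAL zone.
_PUBDATE = re.compile(
    r"^\s*(?:[A-Za-z]{3},\s*)?(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s+"
    r"(\d{2}):(\d{2}):(\d{2})\s*([+-]\d{4}|GMT|UT)?\s*$"
)


def parse_pubdate(text, fallback_tz=None):
    """Parse a pubDate string, tolerating a missing time zone.

    Hypothetical helper: fallback_tz is used when the feed omits the
    zone; None means "use the local time zone of this machine".
    """
    match = _PUBDATE.match(text)
    if match is None or match.group(2).title() not in _MONTHS:
        raise ValueError("unparseable pubDate: %r" % text)
    day, month, year, hh, mm, ss, zone = match.groups()

    if zone is None:
        # Malformed but recoverable: warn and assume a zone.
        warnings.warn("malformed date (no time zone): %r" % text)
        tz = fallback_tz or datetime.now().astimezone().tzinfo
    elif zone in ("GMT", "UT"):
        tz = timezone.utc
    else:
        sign = -1 if zone[0] == "-" else 1
        tz = timezone(sign * timedelta(hours=int(zone[1:3]),
                                       minutes=int(zone[3:5])))

    return datetime(int(year), _MONTHS[month.title()], int(day),
                    int(hh), int(mm), int(ss), tzinfo=tz)


# The date from the feed above, compared against a download-from date.
published = parse_pubdate("Thu, 22 Feb 2018 12:53:02 ",
                          fallback_tz=timezone.utc)
download_from = datetime(2018, 2, 11, tzinfo=timezone.utc)
wanted = published >= download_from
```

With something like this, the entry's own date would still be compared against the download-from date, rather than being replaced by the current local time.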

Here is my greg.conf: https://gist.github.com/yringot/f57d8116f4a4392a3695505c0bceeb9c

yringot changed the title from "Dealing with slightly malformed feeds" to "Dealing with slightly malformed feeds (missing timezone information)" on Feb 25, 2018

yringot commented Feb 25, 2018

(I will also inform the producer of that podcast of the validation error.)

@thomasboehm

It would also be nice to have an option to ignore the timezone altogether, so that {date} can be used as intended by the publisher of the podcast.

I subscribe to a daily podcast whose publisher is in a timezone one hour ahead of mine, and they publish the episodes at five past midnight. I save them with the date in the filename, so the files always carry the date of the day before. Not a big issue, but it would be nice to have.
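A small sketch of the effect, and of what such an option could do. The concrete offsets (+02:00 for the publisher, +01:00 for me) and the date are assumptions for illustration, and `filename_date` with its `keep_publisher_tz` flag is a hypothetical name, not an existing greg option.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets: the publisher is one hour ahead of the listener.
publisher_tz = timezone(timedelta(hours=2))
listener_tz = timezone(timedelta(hours=1))

# An episode published at five past midnight, publisher time.
published = datetime(2018, 2, 26, 0, 5, tzinfo=publisher_tz)


def filename_date(published, local_tz, keep_publisher_tz=False):
    """Return the date string that would end up in the filename.

    keep_publisher_tz is the hypothetical "ignore the timezone" option:
    when True, the date is taken as written by the publisher instead of
    being converted to the listener's zone first.
    """
    moment = published if keep_publisher_tz else published.astimezone(local_tz)
    return moment.strftime("%Y-%m-%d")


converted = filename_date(published, listener_tz)
as_published = filename_date(published, listener_tz, keep_publisher_tz=True)
print(converted)     # 2018-02-25, the day before
print(as_published)  # 2018-02-26, the date the publisher intended
```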
