Commit

Merge pull request #125 from w3c/default-xml-encoding
Remove assumption of us-ascii for text/xml
dontcallmedom authored Nov 17, 2023
2 parents df71e0a + 02d200b commit 9f68a4d
Showing 19 changed files with 29 additions and 36 deletions.
2 changes: 1 addition & 1 deletion docs-xml/error/InvalidAddrSpec.xml
@@ -4,7 +4,7 @@
<p><code>foo</code> must be an email address</p>
</div>
<div id='explanation'>
-<p>MUST conform to the "addr-spec" production in <a href="http://www.faqs.org/rfcs/rfc2822.html">RFC 2822</a></p>
+<p>MUST conform to the "addr-spec" production in <a href="http://www.rfc-editor.org/rfc/rfc2822.html">RFC 2822</a></p>
</div>
<div id='solution'>
<p>Convert the email address to a valid form. Examples of valid email
2 changes: 1 addition & 1 deletion docs-xml/error/InvalidContact.xml
@@ -4,7 +4,7 @@
<p>Invalid email address</p>
</div>
<div id='explanation'>
-<p>Email addresses must conform to <a href="http://www.faqs.org/rfcs/rfc2822.html">
+<p>Email addresses must conform to <a href="http://www.rfc-editor.org/rfc/rfc2822.html">
RFC 2822</a></p>
</div>
<div id='solution'>
2 changes: 1 addition & 1 deletion docs-xml/error/InvalidMIMEAttribute.xml
@@ -7,7 +7,7 @@
<p>The attribute value specified is not a value MIME type.</p>
</div>
<div id='solution'>
-<p>This attribute must be a valid MIME content type as defined by <a href="http://www.faqs.org/rfcs/rfc2045.html">RFC 2045</a>.</p>
+<p>This attribute must be a valid MIME content type as defined by <a href="http://www.rfc-editor.org/rfc/rfc2045.html">RFC 2045</a>.</p>

<p>This is an example of a valid MIME type: <samp>text/html</samp></p>
</div>
2 changes: 1 addition & 1 deletion docs-xml/error/InvalidMIMEType.xml
@@ -7,7 +7,7 @@
<p>This attribute is not a valid MIME type.</p>
</div>
<div id='solution'>
-<p>Valid MIME types are specified in <a href="http://www.faqs.org/rfcs/rfc2046.html">RFC 2046</a>.</p>
+<p>Valid MIME types are specified in <a href="http://www.rfc-editor.org/rfc/rfc2046.html">RFC 2046</a>.</p>

<p>Examples of valid MIME types:</p>

2 changes: 1 addition & 1 deletion docs-xml/error/InvalidRFC2822Date.xml
@@ -4,7 +4,7 @@
<p><code>element</code> must be an RFC-822 date-time</p>
</div>
<div id='explanation'>
-<p>Invalid date-time. The value specified must meet the Date and Time specifications as defined by <a href="http://www.faqs.org/rfcs/rfc822.html">RFC822</a>, with the exception that the year should be expressed as four digits.</p>
+<p>Invalid date-time. The value specified must meet the Date and Time specifications as defined by <a href="http://www.rfc-editor.org/rfc/rfc822.html">RFC822</a>, with the exception that the year should be expressed as four digits.</p>
</div>
<div id='solution'>
<p>Change the date-time format to comply with RFC822. Here are examples of valid RFC822 date-times:</p>
2 changes: 1 addition & 1 deletion docs-xml/error/InvalidRFC3339Date.xml
@@ -4,7 +4,7 @@
<p><code>foo</code> must be an RFC 3339 date-time</p>
</div>
<div id='explanation'>
-<p>The content of this element MUST conform to the "date-time" production as defined in <a href="http://www.faqs.org/rfcs/rfc3339.html">RFC 3339</a>. In addition, an uppercase "T" character MUST be used to separate date and time, and an uppercase "Z" character MUST be present in the absence of a numeric time zone offset.</p>
+<p>The content of this element MUST conform to the "date-time" production as defined in <a href="http://www.rfc-editor.org/rfc/rfc3339.html">RFC 3339</a>. In addition, an uppercase "T" character MUST be used to separate date and time, and an uppercase "Z" character MUST be present in the absence of a numeric time zone offset.</p>
</div>
<div id='solution'>
<p>Change the date-time format to comply with RFC 3339. Here are examples of valid RFC 3339 date-times:</p>
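
Note on the hunk above: the rule it quotes (the RFC 3339 "date-time" production, an uppercase "T" separator, and an uppercase "Z" when no numeric offset is given) can be approximated in a few lines. This is a rough sketch, not the validator's own code. The name `is_strict_rfc3339` is hypothetical, the numeric offset is not range-checked, and year 0000 is rejected because Python's `datetime` cannot represent it.

```python
import re
from datetime import datetime

# Date, uppercase "T", time, optional fraction, then "Z" or a numeric offset.
_RFC3339 = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})(\.[0-9]+)?"
    r"(Z|[+-][0-9]{2}:[0-9]{2})"
)


def is_strict_rfc3339(value):
    """Return True if value looks like an RFC 3339 date-time with T and Z/offset."""
    m = _RFC3339.fullmatch(value)
    if m is None:
        return False
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    try:
        # datetime rejects impossible dates; clamp 60 so leap seconds pass this step.
        datetime(year, month, day, hour, minute, min(second, 59))
    except ValueError:
        return False
    return second <= 60
```

A lowercase separator such as `2002-10-02t15:00:00z`, or a value with no zone at all, fails the pattern, which mirrors the two extra MUSTs in the explanation text.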
4 changes: 2 additions & 2 deletions docs-xml/error/InvalidURN.xml
@@ -4,7 +4,7 @@
<p><code>foo</code> is not a valid URN</p>
</div>
<div id='explanation'>
-<p>Value is not a valid URN, as defined by <a href="http://www.faqs.org/rfcs/rfc2141.html">RFC 2141</a>.</p>
+<p>Value is not a valid URN, as defined by <a href="http://www.rfc-editor.org/rfc/rfc2141.html">RFC 2141</a>.</p>
</div>
<div id='solution'>
<p>URNs have very picky syntax requirements. A common problem is trying to use domain names as namespace identifiers in URNs. For example, this is an <em>invalid URN</em>:</p>
@@ -17,7 +17,7 @@

<p>Note that the periods in the domain name have been replaced by dashes.</p>

-<p>If this is not your problem, try reading <a href="http://www.faqs.org/rfcs/rfc2141.html">RFC 2141</a>. It's quite short. Section 2.1 talks about what's allowed in namespace identifiers (immediately after the "urn:" part); section 2.2 talks about what's allowed in the rest of it.</p>
+<p>If this is not your problem, try reading <a href="http://www.rfc-editor.org/rfc/rfc2141.html">RFC 2141</a>. It's quite short. Section 2.1 talks about what's allowed in namespace identifiers (immediately after the "urn:" part); section 2.2 talks about what's allowed in the rest of it.</p>
</div>
</div>
</fvdoc>
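
Note on the file above: the point about periods in namespace identifiers is easy to check mechanically. The following is a simplified sketch of the RFC 2141 syntax, not the validator's implementation. The name `is_plausible_urn` is hypothetical, and the pattern deliberately leaves out the characters RFC 2141 lists as reserved ("/", "?", "#").

```python
import re

# "urn:" (case-insensitive), a namespace identifier of 1-32 letters, digits or
# hyphens that starts with a letter or digit, then a namespace-specific string.
_URN = re.compile(
    r"[Uu][Rr][Nn]:"
    r"(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,31}):"
    r"(?P<nss>(?:[A-Za-z0-9()+,\-.:=@;$_!*']|%[0-9A-Fa-f]{2})+)"
)


def is_plausible_urn(value):
    """Return True if value matches the simplified RFC 2141 shape above."""
    m = _URN.fullmatch(value)
    if m is None:
        return False
    # RFC 2141 reserves "urn" itself as a namespace identifier.
    return m.group("nid").lower() != "urn"
```

With this check, `urn:example.com:foo` is rejected because "." cannot appear in the namespace identifier, while `urn:example-com:foo` passes.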
2 changes: 1 addition & 1 deletion docs-xml/warning/EncodingMismatch.xml
@@ -14,7 +14,7 @@ web server's version takes preference, but many aggregators ignore this.
Note that, if you are serving content as '<code>text/*</code>', then
the default charset is US-ASCII, which is probably not what you want.
(See
-<a href="http://www.faqs.org/rfcs/rfc3023.html" title="RFC 3023 (rfc3023) - XML Media Types">RFC 3023</a> for technical details.)</p>
+<a href="http://www.rfc-editor.org/rfc/rfc3023.html" title="RFC 3023 (rfc3023) - XML Media Types">RFC 3023</a> for technical details.)</p>
<p>RSS feeds should be served as <code>application/rss+xml</code>
(RSS 1.0 is an RDF format, so it may be served as
<code>application/rdf+xml</code> instead).
2 changes: 1 addition & 1 deletion docs-xml/warning/ProblematicalRFC822Date.xml
@@ -5,7 +5,7 @@
</div>
<div id='explanation'>
<p>The specified date-time value, while technically valid, is likely to cause interoperability issues.</p>
-<p>The value specified must meet the Date and Time specifications as defined by <a href="http://www.faqs.org/rfcs/rfc822.html">RFC822</a>, with the exception that the year SHOULD be expressed as four digits.</p>
+<p>The value specified must meet the Date and Time specifications as defined by <a href="http://www.rfc-editor.org/rfc/rfc822.html">RFC822</a>, with the exception that the year SHOULD be expressed as four digits.</p>
<p>Additionally:</p>
<ul>
<li><a href="http://www.w3.org/Protocols/rfc822/3_Lexical.html#z3">RFC 822 &#167; 3.4.2:</a>
2 changes: 1 addition & 1 deletion docs/error/InvalidAddrSpec.html
@@ -24,7 +24,7 @@ <h2>Message</h2>
<h2>Explanation</h2>

<div class="docbody">
-<p>MUST conform to the "addr-spec" production in <a href="http://www.faqs.org/rfcs/rfc2822.html">RFC 2822</a></p>
+<p>MUST conform to the "addr-spec" production in <a href="http://www.rfc-editor.org/rfc/rfc2822.html">RFC 2822</a></p>
</div>
<h2>Solution</h2>
<div class="docbody">
2 changes: 1 addition & 1 deletion docs/error/InvalidContact.html
@@ -24,7 +24,7 @@ <h2>Message</h2>
<h2>Explanation</h2>

<div class="docbody">
-<p>Email addresses must conform to <a href="http://www.faqs.org/rfcs/rfc2822.html">
+<p>Email addresses must conform to <a href="http://www.rfc-editor.org/rfc/rfc2822.html">
RFC 2822</a></p>
</div>
<h2>Solution</h2>
2 changes: 1 addition & 1 deletion docs/error/InvalidMIMEAttribute.html
@@ -28,7 +28,7 @@ <h2>Explanation</h2>
</div>
<h2>Solution</h2>
<div class="docbody">
-<p>This attribute must be a valid MIME content type as defined by <a href="http://www.faqs.org/rfcs/rfc2045.html">RFC 2045</a>.</p>
+<p>This attribute must be a valid MIME content type as defined by <a href="http://www.rfc-editor.org/rfc/rfc2045.html">RFC 2045</a>.</p>

<p>This is an example of a valid MIME type: <samp>text/html</samp></p>
</div>
2 changes: 1 addition & 1 deletion docs/error/InvalidMIMEType.html
@@ -28,7 +28,7 @@ <h2>Explanation</h2>
</div>
<h2>Solution</h2>
<div class="docbody">
-<p>Valid MIME types are specified in <a href="http://www.faqs.org/rfcs/rfc2046.html">RFC 2046</a>.</p>
+<p>Valid MIME types are specified in <a href="http://www.rfc-editor.org/rfc/rfc2046.html">RFC 2046</a>.</p>

<p>Examples of valid MIME types:</p>

2 changes: 1 addition & 1 deletion docs/error/InvalidRFC2822Date.html
@@ -24,7 +24,7 @@ <h2>Message</h2>
<h2>Explanation</h2>

<div class="docbody">
-<p>Invalid date-time. The value specified must meet the Date and Time specifications as defined by <a href="http://www.faqs.org/rfcs/rfc822.html">RFC822</a>, with the exception that the year should be expressed as four digits.</p>
+<p>Invalid date-time. The value specified must meet the Date and Time specifications as defined by <a href="http://www.rfc-editor.org/rfc/rfc822.html">RFC822</a>, with the exception that the year should be expressed as four digits.</p>
</div>
<h2>Solution</h2>
<div class="docbody">
2 changes: 1 addition & 1 deletion docs/error/InvalidRFC3339Date.html
@@ -24,7 +24,7 @@ <h2>Message</h2>
<h2>Explanation</h2>

<div class="docbody">
-<p>The content of this element MUST conform to the "date-time" production as defined in <a href="http://www.faqs.org/rfcs/rfc3339.html">RFC 3339</a>. In addition, an uppercase "T" character MUST be used to separate date and time, and an uppercase "Z" character MUST be present in the absence of a numeric time zone offset.</p>
+<p>The content of this element MUST conform to the "date-time" production as defined in <a href="http://www.rfc-editor.org/rfc/rfc3339.html">RFC 3339</a>. In addition, an uppercase "T" character MUST be used to separate date and time, and an uppercase "Z" character MUST be present in the absence of a numeric time zone offset.</p>
</div>
<h2>Solution</h2>
<div class="docbody">
4 changes: 2 additions & 2 deletions docs/error/InvalidURN.html
@@ -24,7 +24,7 @@ <h2>Message</h2>
<h2>Explanation</h2>

<div class="docbody">
-<p>Value is not a valid URN, as defined by <a href="http://www.faqs.org/rfcs/rfc2141.html">RFC 2141</a>.</p>
+<p>Value is not a valid URN, as defined by <a href="http://www.rfc-editor.org/rfc/rfc2141.html">RFC 2141</a>.</p>
</div>
<h2>Solution</h2>
<div class="docbody">
@@ -38,7 +38,7 @@ <h2>Solution</h2>

<p>Note that the periods in the domain name have been replaced by dashes.</p>

-<p>If this is not your problem, try reading <a href="http://www.faqs.org/rfcs/rfc2141.html">RFC 2141</a>. It's quite short. Section 2.1 talks about what's allowed in namespace identifiers (immediately after the "urn:" part); section 2.2 talks about what's allowed in the rest of it.</p>
+<p>If this is not your problem, try reading <a href="http://www.rfc-editor.org/rfc/rfc2141.html">RFC 2141</a>. It's quite short. Section 2.1 talks about what's allowed in namespace identifiers (immediately after the "urn:" part); section 2.2 talks about what's allowed in the rest of it.</p>
</div>
<h2>Not clear? Disagree?</h2>
<div class="docbody">
2 changes: 1 addition & 1 deletion docs/warning/EncodingMismatch.html
@@ -30,7 +30,7 @@ <h2>Explanation</h2>
Note that, if you are serving content as '<code>text/*</code>', then
the default charset is US-ASCII, which is probably not what you want.
(See
-<a href="http://www.faqs.org/rfcs/rfc3023.html" title="RFC 3023 (rfc3023) - XML Media Types">RFC 3023</a> for technical details.)</p>
+<a href="http://www.rfc-editor.org/rfc/rfc3023.html" title="RFC 3023 (rfc3023) - XML Media Types">RFC 3023</a> for technical details.)</p>
<p>RSS feeds should be served as <code>application/rss+xml</code>
(RSS 1.0 is an RDF format, so it may be served as
<code>application/rdf+xml</code> instead).
2 changes: 1 addition & 1 deletion docs/warning/ProblematicalRFC822Date.html
@@ -25,7 +25,7 @@ <h2>Explanation</h2>

<div class="docbody">
<p>The specified date-time value, while technically valid, is likely to cause interoperability issues.</p>
-<p>The value specified must meet the Date and Time specifications as defined by <a href="http://www.faqs.org/rfcs/rfc822.html">RFC822</a>, with the exception that the year SHOULD be expressed as four digits.</p>
+<p>The value specified must meet the Date and Time specifications as defined by <a href="http://www.rfc-editor.org/rfc/rfc822.html">RFC822</a>, with the exception that the year SHOULD be expressed as four digits.</p>
<p>Additionally:</p>
<ul>
<li><a href="http://www.w3.org/Protocols/rfc822/3_Lexical.html#z3">RFC 822 § 3.4.2:</a>
25 changes: 9 additions & 16 deletions src/feedvalidator/xmlEncoding.py
@@ -210,26 +210,19 @@ def decode(mediaType, charset, bs, loggedEvents, fallback=None):
encoding = None

if charset and encoding and charset.lower() != encoding.lower():
-# RFC 3023 requires us to use 'charset', but a number of aggregators
-# ignore this recommendation, so we should warn.
+# Warn about discrepancies between charset param and encoding
+# See also https://datatracker.ietf.org/doc/html/rfc7303#section-3
loggedEvents.append(logging.EncodingMismatch({"charset": charset, "encoding": encoding}))

if mediaType and mediaType.startswith("text/") and charset is None:
loggedEvents.append(logging.TextXml({}))

-# RFC 3023 requires text/* to default to US-ASCII. Issue a warning
-# if this occurs, but continue validation using the detected encoding
-try:
-bs.decode("US-ASCII")
-except:
-if not encoding:
-try:
-bs.decode(fallback)
-encoding=fallback
-except:
-pass
-if encoding and encoding.lower() != 'us-ascii':
-loggedEvents.append(logging.EncodingMismatch({"charset": "US-ASCII", "encoding": encoding}))
+if not encoding:
+try:
+bs.decode(fallback)
+encoding=fallback
+except:
+pass

enc = charset or encoding
if enc is None:

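Note on the `xmlEncoding.py` hunk: the behavioural change is that a `text/*` response without a `charset` parameter is no longer test-decoded as US-ASCII and no longer triggers an `EncodingMismatch` against "US-ASCII". The sketch below models the resulting decision in isolation. It is a simplified stand-in, not the project's code: `pick_encoding` and its tuple-based warnings are hypothetical, the nesting of the fallback step under the `text/*` branch is an assumption (indentation is not visible in this view), and the `fallback` guard and narrowed exception types are additions of the sketch.

```python
def pick_encoding(media_type, charset, detected, data, fallback=None):
    """Return (encoding, warnings) for a fetched feed body (illustrative model).

    charset:  value of the HTTP charset parameter, or None
    detected: encoding found from the BOM / XML declaration, or None
    data:     raw response bytes
    """
    warnings = []
    encoding = detected

    # Still warn when the HTTP charset and the document's own encoding disagree.
    if charset and encoding and charset.lower() != encoding.lower():
        warnings.append(("EncodingMismatch", charset, encoding))

    if media_type and media_type.startswith("text/") and charset is None:
        warnings.append(("TextXml",))
        # No US-ASCII assumption here: only try the caller's fallback
        # when nothing was detected in the document itself.
        if not encoding and fallback:
            try:
                data.decode(fallback)
                encoding = fallback
            except (LookupError, UnicodeDecodeError):
                pass

    return charset or encoding, warnings
```

For example, a UTF-8 body served as `text/xml` without a charset now yields only the `TextXml` warning, where the removed branch would also have logged a mismatch against US-ASCII for any non-ASCII content.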