Filters

4poc edited this page Feb 26, 2011 · 3 revisions

Filters are applied before and after readability tries to extract the main content, and can be used to improve or correct the detection on specific sites. In general, readability's extraction algorithm works well, but sometimes some kind of pre- or post-processing is unavoidable to fix false positives and other problems.

Feedability supports five types of rules per URL pattern (a regular expression matched against the article URL). Each rule is applied, during either pre- or post-processing, to specific HTML elements. The elements are specified using jQuery selectors, which are documented in the jQuery API documentation and at w3schools. The rule types are:

replace

Within a pre or post group.

Use regular expressions to replace specific content. The replace argument supports placeholders; currently the only supported placeholder is %{URL_Base}, which is replaced with the base of the article URL.
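To illustrate the placeholder, here is a minimal sketch of how %{URL_Base} could be expanded before a replacement is applied. It is not Feedability's actual code: the helper names `urlBase` and `expandPlaceholders` are hypothetical, and it assumes that "URL base" means the scheme plus host of the article URL.

```javascript
// Assumption: the "URL base" is scheme + host, e.g. "http://sixserv.org".
function urlBase(articleUrl) {
  const u = new URL(articleUrl);
  return u.protocol + "//" + u.host;
}

// Hypothetical helper: substitute every %{URL_Base} in a replacement string.
function expandPlaceholders(replacement, articleUrl) {
  return replacement.split("%{URL_Base}").join(urlBase(articleUrl));
}

// Illustrative rule: turn root-relative image links into absolute ones.
const html = '<img src="/images/logo.png">';
const replacement = expandPlaceholders(
  'src="%{URL_Base}/',
  "http://sixserv.org/blog/post.html"
);
const fixed = html.replace(/src="\//g, replacement);
```

A rule like this is a typical use for the placeholder, since relative links stop working once the article is read outside the original site.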

remove

Within a pre or post group.

Remove rules strip specific elements from the article HTML that are known to cause false-positive matches of main content by readability.

exclusive

Within a pre or post group.

Elements selected by exclusive rules replace the body of the document. Currently it therefore only makes sense to specify one element, but this may change in the future.

prepend

Needs to be outside pre/post.

The HTML selected by these rules is prepended to the final extracted text. This is useful for headings or trailing content that readability does not include.

append

Needs to be outside pre/post.

The HTML selected by these rules is appended to the final extracted text.
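The placement requirements above (replace, remove and exclusive inside a pre or post group; prepend and append outside of it) can be summarized with a small checker. This is only a sketch: `misplacedRules` is a hypothetical helper, the selectors are made up, and the assumption that prepend and append sit directly under the URL pattern is inferred from the descriptions above.

```javascript
// Rule types allowed inside a pre/post group versus directly under the URL pattern.
const GROUPED = ["replace", "remove", "exclusive"];
const TOP_LEVEL = ["prepend", "append"];

// Hypothetical helper: list the misplaced rule types of one URL pattern entry.
function misplacedRules(entry) {
  const problems = [];
  for (const key of Object.keys(entry)) {
    if (key === "pre" || key === "post") {
      for (const type of Object.keys(entry[key])) {
        if (!GROUPED.includes(type)) problems.push(key + "." + type);
      }
    } else if (!TOP_LEVEL.includes(key)) {
      problems.push(key);
    }
  }
  return problems;
}

// Well-formed entry (selectors are illustrative only).
const good = {
  pre: { remove: ["#sidebar"] },
  prepend: ["h1.title"],
  append: [".author-info"],
};

// Misplaced: append inside pre, remove outside of pre/post.
const bad = { pre: { append: [".author-info"] }, remove: ["#sidebar"] };

const goodProblems = misplacedRules(good);
const badProblems = misplacedRules(bad);
```

Here `goodProblems` is empty, while `badProblems` names the two rule types that sit in the wrong place.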

Example Filters

If you change the filter rules, you need to remove the *.rdby cache files so that the new filters are applied to already fetched articles. Example:

"rules": {
  "sixserv.org": {
    "pre": {
      "remove": ["#sidebar", ".commentlist"],
      "exclusive": ["#content"]
    }
  }
}
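The key "sixserv.org" in the example above is the URL pattern. As a rough sketch of how such patterns might select rules, the snippet below tests each key as a regular expression against the article URL. The helper `matchingRules` is hypothetical, and whether Feedability combines all matching entries or uses only the first is not specified here, so returning every match is an assumption.

```javascript
// Hypothetical helper: collect the rule entries whose pattern matches the article URL.
// `rules` mirrors the "rules" object of the filter configuration.
function matchingRules(rules, articleUrl) {
  return Object.keys(rules)
    .filter((pattern) => new RegExp(pattern).test(articleUrl))
    .map((pattern) => rules[pattern]);
}

const rules = {
  "sixserv.org": { pre: { remove: ["#sidebar"] } },
  // Second entry is illustrative only; note the escaped dot.
  "example\\.com/blog/": { pre: { exclusive: ["#post"] } },
};

const matched = matchingRules(
  rules,
  "http://sixserv.org/2011/02/26/some-article/"
);
```

Because the pattern is a regular expression, an unescaped dot matches any character; escaping it as in the second entry makes the match stricter.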

For a short tutorial on how to create remove filter rules, read Filter-Tutorial.
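Removing the *.rdby cache files after a rule change, as described at the start of this section, could be scripted along these lines. This is a sketch under assumptions: `clearReadabilityCache` is a hypothetical helper, and the cache directory has to be supplied by you, since its location is not stated here.

```javascript
const fs = require("fs");
const path = require("path");

// Delete every *.rdby file directly inside cacheDir and return the removed names.
function clearReadabilityCache(cacheDir) {
  const removed = [];
  for (const name of fs.readdirSync(cacheDir)) {
    if (name.endsWith(".rdby")) {
      fs.unlinkSync(path.join(cacheDir, name));
      removed.push(name);
    }
  }
  return removed;
}

// Usage (the directory name is an assumption):
// clearReadabilityCache("./cache");
```

Other files in the directory are left untouched, so only the readability results are regenerated on the next fetch.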
