This repository has been archived by the owner on Aug 14, 2021. It is now read-only.

Releases: andreskrey/readability.php

v0.2.1

31 May 12:05
  • Added the normalizeEntities flag to convert UTF-8 characters to their HTML entity equivalents. Fixes bugs on HTML documents with mixed encodings.
  • Added more information to the readme.md file.
  • New way to create a backup DOM: not creating a backup. In the previous version, the system cloned the $this->dom object to keep a backup, so the algorithm could be restarted with other flags if needed. This seemed to work until I realized that the backup sometimes changes even when we are not touching it. It seems that the dom and backupdom objects are linked, and some changes to the dom object reach the backupdom object. The new approach deletes the backupdom object and recreates the dom object from scratch. This has a performance impact, but it seems to be quite small.
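
The restart strategy from the last item can be sketched roughly as follows. This is a minimal illustration, not the library's code: `buildDom` is a hypothetical helper, and `mb_encode_numericentity` is only an assumed stand-in for what the normalizeEntities flag does internally. It assumes the dom and mbstring extensions are loaded.

```php
<?php
// Hypothetical helper (not part of readability.php): parse the original HTML
// string into a brand-new DOMDocument each time it is called.
function buildDom(string $html, bool $normalizeEntities = false): DOMDocument
{
    if ($normalizeEntities) {
        // Assumption: turn every non-ASCII character into a numeric entity,
        // so the parser no longer depends on the declared encoding.
        $html = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    }

    $dom = new DOMDocument('1.0', 'UTF-8');
    $previous = libxml_use_internal_errors(true); // silence warnings on messy markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

$html = "<html><body><p>Caf\u{e9}</p></body></html>";

// First pass: a destructive change, standing in for an aggressive cleanup.
$first = buildDom($html);
$paragraph = $first->getElementsByTagName('p')->item(0);
$paragraph->parentNode->removeChild($paragraph);

// Restart: rebuild from the source string instead of relying on a clone.
$second = buildDom($html, true);
$text = $second->getElementsByTagName('p')->item(0)->textContent;
```

Because the second document is parsed from the untouched source string, nothing done to the first one can leak into it. The cost is one extra parse per restart.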

v0.2.0

10 Mar 11:11

We ARE a 100% complete port of Readability.js!

  • Every unit test passes
  • Readability.php produces exactly the same output as Readability.js
  • I'm happy :)

Fixed

  • Lots of bugs
  • Merged PR by DavidFricker to avoid exceptions while grabbing the document content

Added

  • substituteEntities flag, to avoid replacing special characters with HTML entities. There's nothing we can do about &nbsp;: that entity is replaced by libxml and there's no way to disable it.
  • Named data sets so it's easier to detect which test case is failing.
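
The &nbsp; limitation can be observed with plain DOMDocument. This is a small sketch under assumptions: it assumes the library flag is forwarded to the public `substituteEntities` property of DOMDocument, and the sample markup is made up for illustration.

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');
// Assumption: the library's substituteEntities flag ends up on this property.
$dom->substituteEntities = false;
$dom->loadHTML('<html><body><p>a&nbsp;b</p></body></html>');

$text = $dom->getElementsByTagName('p')->item(0)->textContent;

// The parsed text node holds the real character U+00A0,
// not the six-character entity from the source.
$hasRawNbsp = strpos($text, "\u{a0}") !== false;
$hasEntity  = strpos($text, '&nbsp;') !== false;
```

Even with the property set to false, the entity has already been decoded by the time the text node is read.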

Removed

  • A couple of test cases that involved broken JS. There's nothing we can do about JS spilling into the text.

v0.1.2

26 Dec 23:14

This release includes the following changes:

  • New way to get the metadata of the article.

v0.1.1

26 Dec 12:18

This release includes the following changes:

  • Small fix to clean style tags after creating the final article.

First non-alpha version

24 Dec 11:56

Happy Holidays!

I've finally managed to port 100% of the code and make most of the test cases pass! There's a lot of work left to do, but the current release behaves mostly like the original JS project.

Enjoy!

Lots of progress!

26 Nov 00:13
Pre-release

We are getting closer to being a 100% complete port of Readability.js!

  • Added prepArticle to remove junk after selecting the top candidates.
  • Added a function to restore the score after selecting top candidates. This basically works by scanning the data-readability attribute and restoring the score to the contentScore variable. This is a horrible hack and should be removed once we ditch the Element interface of html-to-markdown and start extending the DOMDocument object.
  • Switched all strlen calls to mb_strlen.
  • Fixed lots of bugs, and I'm pretty sure I introduced a bunch of new ones.
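
Two of the items above can be sketched in a few lines. This is an illustration only: `restoreScores` is a hypothetical function, not the library's implementation, and the sample markup and score values are made up. The last lines show why byte-based strlen miscounts multi-byte UTF-8 text.

```php
<?php
// Hypothetical sketch: collect the scores stored in data-readability
// attributes so they can be copied back onto in-memory candidates.
function restoreScores(DOMDocument $dom): array
{
    $restored = [];
    $xpath = new DOMXPath($dom);
    // Every element carrying the attribute, in document order.
    foreach ($xpath->query('//*[@data-readability]') as $node) {
        $restored[] = [
            'tag' => $node->tagName,
            'contentScore' => (float) $node->getAttribute('data-readability'),
        ];
    }
    return $restored;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div data-readability="12.5"><p data-readability="3">Text</p></div></body></html>');
$scores = restoreScores($dom);

// strlen counts bytes, mb_strlen counts characters.
$word  = "caf\u{e9}";               // "é" takes two bytes in UTF-8
$bytes = strlen($word);             // 5
$chars = mb_strlen($word, 'UTF-8'); // 4
```

Any length-based decision made on non-ASCII text would be skewed by the byte count, which is presumably the reason for the switch.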

Last release using master as the main development branch

13 Nov 09:22

From now on, all development will be done in the develop branch.

First version

07 Nov 12:05
Pre-release

Pre-release of the first version. Lots to do, lots to fix. But it's a nice start!