This repository has been archived by the owner on Aug 14, 2021. It is now read-only.
Releases: andreskrey/readability.php
v0.2.1
- Added normalizeEntities flag to convert UTF-8 characters to their HTML entity equivalents. Fixes bugs on HTML documents with mixed encodings.
- Added more information to the readme.md file.
- New way to create a backup DOM: not creating a backup. In the previous version, the system cloned the $this->dom object to keep it as a backup in order to restart the algorithm with other flags, if needed. This seemed to work until I realized that the backup sometimes changes even when we are not touching it. It seems that the dom and backupdom objects are linked, and some changes to the dom object reach the backupdom object. The new approach consists of deleting the backupdom object and recreating the dom object from scratch. Of course this has a performance impact, but it seems to be quite low.
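The "recreate from scratch" approach described in the last item can be sketched roughly as follows. This is only an illustration under assumptions, not the library's actual code: the helper name loadFreshDom and the sample markup are hypothetical, and the idea shown is simply to keep the original HTML string around and re-parse it whenever the algorithm needs a clean tree, instead of holding a cloned backup.

```php
<?php
// Hypothetical helper (not part of readability.php): build a brand-new
// DOMDocument from the original HTML string every time a clean tree is needed.
function loadFreshDom(string $html): DOMDocument
{
    $dom = new DOMDocument('1.0', 'UTF-8');

    // Real-world HTML is rarely valid, so collect libxml warnings
    // internally instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

// Sample input, assumed for illustration only.
$html = '<html><body><div id="content"><p>First</p><p>Second</p></div></body></html>';

// First pass: the algorithm mutates the tree destructively.
$dom = loadFreshDom($html);
$first = $dom->getElementsByTagName('p')->item(0);
$first->parentNode->removeChild($first);
$afterFirstPass = $dom->getElementsByTagName('p')->length;

// Retry with other flags: drop the mutated tree and parse the source again,
// so nothing from the first pass can leak into the second one.
$dom = loadFreshDom($html);
$afterReload = $dom->getElementsByTagName('p')->length;
```

The trade-off is the one mentioned above: every retry pays for a full parse, in exchange for a tree that cannot share state with the previous attempt.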
v0.2.0
We ARE a 100% complete port of Readability.js!
- Every unit test passes
- Readability.php produces the same exact output as Readability.js
- I'm happy :)
Fixed
- Lots of bugs
- Merged PR by DavidFricker to avoid exceptions while grabbing the document content
Added
- substituteEntities flag, to avoid replacing special characters with HTML entities. There is one entity we can't do anything about: it is replaced by libxml and there's no way to disable that.
- Named data sets, so it's easier to detect which test case is failing.
Removed
- Couple of test cases that involved broken JS. There's nothing we can do about JS spilling onto the text.
v0.1.2
This release includes the following changes:
- New way to get the metadata of the article.
v0.1.1
This release includes the following changes:
- Small fix to clean style tags after creating the final article
First non-alpha version
Happy Holidays!
I've finally managed to port 100% of the code and make most of the test cases pass! There's a lot of work to do, but the current release behaves mostly like the original JS project.
Enjoy!
Lots of progress!
We are getting closer to being a 100% complete port of Readability.js!
- Added prepArticle to remove junk after selecting the top candidates.
- Added a function to restore the score after selecting the top candidates. This basically works by scanning the data-readability attribute and restoring the score to the contentScore variable. This is a horrible hack and should be removed once we ditch the Element interface of html-to-markdown and start extending the DOMDocument object.
- Switched all strlen functions to mb_strlen
- Fixed lots of bugs, and I'm pretty sure I introduced a bunch of new ones.
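The score-restoring step in the list above could look roughly like the sketch below. It is a guess at the general shape, not the project's real implementation: the ScoredNode wrapper and the restoreScores function are hypothetical names, and the only details taken from the notes are the data-readability marker and the contentScore field it is copied back into.

```php
<?php
// Hypothetical wrapper standing in for the Element interface mentioned above.
final class ScoredNode
{
    /** @var DOMElement */
    public $node;

    /** @var float */
    public $contentScore;

    public function __construct(DOMElement $node, float $contentScore)
    {
        $this->node = $node;
        $this->contentScore = $contentScore;
    }
}

/**
 * Hypothetical helper: scan the document for data-readability attributes
 * and copy each value back into a contentScore field.
 *
 * @return ScoredNode[] in document order
 */
function restoreScores(DOMDocument $dom): array
{
    $restored = [];
    $xpath = new DOMXPath($dom);

    // Select every element that carries a persisted score.
    foreach ($xpath->query('//*[@data-readability]') as $node) {
        if ($node instanceof DOMElement) {
            $score = (float) $node->getAttribute('data-readability');
            $restored[] = new ScoredNode($node, $score);
        }
    }

    return $restored;
}

// Sample markup, assumed for illustration only.
$scoreDom = new DOMDocument();
$scoreDom->loadHTML(
    '<html><body><div data-readability="12.5"><p data-readability="3">Text</p></div>'
    . '<div>No score here</div></body></html>'
);

$restored = restoreScores($scoreDom);
```

Persisting the score in the markup is what makes this a hack: the value has to survive a round trip through a string attribute, which is why extending DOMDocument directly is the cleaner long-term fix.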
Last release where I'm using master as the main development branch
All the current development will be done in the develop branch.
First version
Pre-release of the first version. Lots to do, lots to fix. But it's a nice start!