- New features
  - Added support for Scala 2.13.
- Deprecations
  - `ProxyUtils` was deprecated in favor of setting proxy servers per `Browser` instance (see below).
- New features
  - `JsoupBrowser` and `HtmlUnitBrowser` can now be created with proxy settings that are applied only to the created instance, superseding the usage of `ProxyUtils`;
  - Added a new `table` content extractor allowing the extraction of cells from HTML tables.
- Breaking changes
  - Extracting using a CSS query string as an extractor now extracts elements instead of text. This allows easier chaining of extractors and CSS selectors and fits the current extractor model better. The old behavior can be recovered by wrapping the CSS query string in the `texts` content extractor, e.g. `doc >> texts("myQuery")`;
  - `HtmlExtractor`, `HtmlValidator` and `ElementQuery` now have an additional type parameter for the type of `Element` they work on. If you have custom instances of one of those classes, filling the missing parameter with `Element` (which is a superclass of all elements) should be enough for them to work with all source code using scala-scraper 1.x;
  - Methods for loading extractors and validators from a config were extracted to a separate module. In order to use them, users must add `scala-scraper-config` to their SBT dependencies and import `net.ruippeixotog.scalascraper.config.dsl.DSL._`;
  - The implicit conversion of `Validated`/`Either` to a `RightProjection` in order to expose `foreach`, `map` and `flatMap` in for comprehensions was moved to a separate object that is not imported together with the DSL. Either upgrade to Scala 2.12 (in which `Either` is already right-biased) or import the new `net.ruippeixotog.scalascraper.util.EitherRightBias` support object.
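The right-biased behavior mentioned in the last item can be illustrated with the standard library alone. The sketch below assumes Scala 2.12 or later; `parsePort` and `checkRange` are hypothetical stand-ins for validation steps and are not part of scala-scraper.

```scala
object EitherRightBiasSketch {
  // Hypothetical validation step: parse a port number from a string.
  def parsePort(s: String): Either[String, Int] =
    try Right(s.toInt)
    catch { case _: NumberFormatException => Left(s"not a number: $s") }

  // Hypothetical validation step: accept only ports in the valid TCP range.
  def checkRange(port: Int): Either[String, Int] =
    if (port > 0 && port < 65536) Right(port) else Left(s"out of range: $port")

  // On Scala 2.12+ Either is right-biased, so map/flatMap are available
  // directly and a for comprehension stops at the first Left.
  val ok: Either[String, Int] =
    for { p <- parsePort("8080"); v <- checkRange(p) } yield v

  val bad: Either[String, Int] =
    for { p <- parsePort("http"); v <- checkRange(p) } yield v
}
```

On Scala 2.11, `Either` has no `map` or `flatMap` of its own, so the same for comprehension does not compile without a right-biased view such as the one the `EitherRightBias` support object is described as exposing.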
- Deprecations
  - `SimpleExtractor` and `SimpleValidator` are now deprecated. The classes remain available for the time being, but DSL methods that returned those classes now return only `HtmlExtractor` and `HtmlValidator` instances;
  - The `Validated` type alias is now deprecated. Users should now use `Either`, `Right` and `Left` directly;
  - The `asDate` content parser was deprecated in favor of `asLocalDate` and `asDateTime`;
  - The DSL validation operator `~/~` was renamed to `>/~` in order to have the same precedence as the extraction operators `>>` and `>?>`;
  - The `and` DSL operator is deprecated and will be removed in future versions.
- New features
  - The concrete type of the models in scala-scraper is now passed down from the `Browser` to `Element` instances extracted from documents. This allows users to use features unique to each browser (such as modifying or interacting with elements) while still using the scala-scraper DSL to extract and query them;
  - `HtmlExtractor[E, A]` is now a proper instance of `ElementQuery[E] => A` and has `map` and `mapQuery` methods to map the extraction results and the preceding query, respectively;
  - Content extractors, which were previously just functions, are now full-fledged `HtmlExtractor` instances and can be used by themselves, e.g. `doc >> elements`, `doc >> elementList("myQuery") >> formData`;
  - A new `PolyHtmlExtractor` class was created, allowing the implementation of extractors whose return type depends on the type of the element or document being extracted;
  - Overall code cleanup and simplification of some concepts.
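As a rough illustration of the `map` method and of content extractors used as extractors in their own right, the sketch below assumes scala-scraper 2.x is on the classpath and that the usual `JsoupBrowser` and DSL imports apply. The HTML snippet and the object and value names are made up for this example.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object ExtractorMapSketch {
  // Hypothetical document, parsed from an in-memory string.
  private val html =
    "<html><body><h1>Hello</h1><ul><li>a</li><li>b</li></ul></body></html>"
  private val doc = JsoupBrowser().parseString(html)

  // `map` transforms the result of an extractor without running it first:
  // here the header text is mapped to its length.
  val headerLength: Int = doc >> text("h1").map(_.length)

  // `texts` applied to a CSS query yields the text of each matching element.
  val items: List[String] = (doc >> texts("li")).toList
}
```

Mapping on the extractor rather than on the extracted value keeps the transformation reusable across documents, which appears to be the point of making extractors proper function instances.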
- Bug fixes
  - Fix type parameter usage in three-arg `>?>` DSL operator.
- New features
  - Support for Scala 2.12;
  - New method `closeAll` in `HtmlUnitBrowser`, for closing opened windows;
  - New model `Node` representing a DOM node - in this library, either an `ElementNode` or a `TextNode`;
  - New methods `childNodes` and `siblingNodes` in `Element`.
- New features
  - New methods `clearCookies`, `parseInputStream` and `parseResource` in `Browser`;
  - New methods `hasAttr` and `siblings` in `Element`;
  - Support for SOCKS proxies.
- Bug fixes
  - Correct handling of missing name and value attributes in the `formData` extractor.
First stable version.