Support for HTML 5 #282
Maybe https://github.com/HtmlUnit/htmlunit-neko is of help here.
Because my time is limited I can't provide an implementation, but I will support this if you like...
Because neko fixes many issues with real-world documents, I was also able to open https://www.htmlunit.org/
@rbri I would like to encourage you to make a pull request that allows using the neko-htmlunit HTML parser in Flying Saucer by default, without any configuration hassle, because this HTML parser is clearly better than the current XML SAX parser. This would make it much easier to recommend Flying Saucer to the developers at the company I work for: at the moment FS is not an option for us because it only supports strict XHTML, and developers are now looking for alternatives.
@andreasrosdal The PR is there ;-) I guess we need some discussion about the right way to do it (maybe a service and a different subproject?)
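One possible shape for the "service" idea, sketched with only the JDK: a hypothetical `DocumentParser` interface discovered through `java.util.ServiceLoader`, so that putting an HTML5 parser module (for example, one wrapping htmlunit-neko) on the classpath switches parsers without any configuration, while the strict XML parser remains the fallback. `ParserLookup`, `DocumentParser` and `StrictXmlParser` are illustrative names, not existing Flying Saucer API.

```java
import java.io.StringReader;
import java.util.ServiceLoader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParserLookup {

    /** Hypothetical SPI: anything that turns markup into a W3C DOM. */
    public interface DocumentParser {
        Document parse(String markup) throws Exception;
    }

    /** Fallback used when no provider is registered: the JDK's strict XML parser. */
    static final class StrictXmlParser implements DocumentParser {
        @Override
        public Document parse(String markup) throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(markup)));
        }
    }

    /** Returns the first provider found on the classpath, otherwise the strict XML parser. */
    public static DocumentParser findParser() {
        for (DocumentParser provider : ServiceLoader.load(DocumentParser.class)) {
            return provider;
        }
        return new StrictXmlParser();
    }

    public static void main(String[] args) throws Exception {
        DocumentParser parser = findParser();
        Document doc = parser.parse("<html><body><p>Hello</p></body></html>");
        System.out.println(parser.getClass().getSimpleName() + " -> "
                + doc.getDocumentElement().getTagName());
    }
}
```

A provider module would implement `DocumentParser` and list its class name in a file named after the interface's binary name (here `META-INF/services/ParserLookup$DocumentParser`); with no such file on the classpath, the lookup silently falls back to XML parsing, which keeps existing XHTML users unaffected.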
FS should support HTML 5.
To update the `flyingsaucerproject/flyingsaucer` library for essential HTML5 support, focus on the areas with the most impact on modern web documents (ChatGPT suggestions):
HTML5 Parsing: Integrate an HTML5-compliant parser so that HTML5 documents are handled accurately. This is crucial for recognizing the new semantic elements and parsing the document structure correctly.
CSS3 Enhancements: Update the CSS rendering engine to support important CSS3 features such as flexbox for layout, media queries for responsive design, and transitions for visual effects. These are foundational for modern web design practices.
Semantic Elements Support: Specifically target the new semantic elements such as `<article>`, `<section>`, `<nav>`, `<header>`, `<footer>`, and `<figure>`. Ensuring these elements are correctly interpreted and rendered is essential for modern web documents.
Form Controls and Input Types: Enhance support for the form elements and input types introduced in HTML5, such as `email`, `date`, `range`, and `color`, which are increasingly used in web forms.
JavaScript Interface: Since HTML5 pages often rely on JavaScript for dynamic content, consider whether `flyingsaucer` should interface with JavaScript or provide hooks for external JavaScript interaction, especially for form validation and handling the new input types.
Test Suite for HTML5: Develop a targeted test suite for HTML5 features to ensure compatibility and adherence to the standards, reusing parts of the W3C HTML5 test suite for broader coverage.
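As an illustration of what "correctly interpreted" could mean in practice, here is a small sketch of two defaults from the WHATWG HTML Living Standard: its rendering section styles sectioning elements such as `article`, `nav` and `figure` as `display: block`, and an `<input>` whose `type` attribute is missing or unrecognized falls back to the Text state. The class and method names are hypothetical, and whether Flying Saucer's bundled default stylesheet already contains equivalent rules is not checked here.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class Html5Defaults {

    // HTML5 elements that the WHATWG rendering section styles as display: block.
    private static final Set<String> BLOCK_ELEMENTS = Set.of(
            "article", "aside", "figcaption", "figure", "footer",
            "header", "main", "nav", "section");

    // The <input type> keywords defined by the HTML Living Standard.
    private static final Set<String> INPUT_TYPES = Set.of(
            "hidden", "text", "search", "tel", "url", "email", "password",
            "date", "month", "week", "time", "datetime-local", "number",
            "range", "color", "checkbox", "radio", "file", "submit",
            "image", "reset", "button");

    /** Default CSS display for an element, or null if this sketch does not cover it. */
    public static String defaultDisplay(String elementName) {
        return BLOCK_ELEMENTS.contains(elementName.toLowerCase(Locale.ROOT)) ? "block" : null;
    }

    /** Missing or unknown type values fall back to "text"; matching is case-insensitive. */
    public static String normalizeInputType(String type) {
        if (type == null) {
            return "text";
        }
        String lower = type.toLowerCase(Locale.ROOT);
        return INPUT_TYPES.contains(lower) ? lower : "text";
    }

    /** One user-agent CSS rule a renderer could prepend to its default stylesheet. */
    public static String userAgentCss() {
        return String.join(", ", new TreeSet<>(BLOCK_ELEMENTS)) + " { display: block; }";
    }
}
```

For example, `normalizeInputType("datetime")` returns `"text"`, because `datetime` was dropped from the standard, while `normalizeInputType("Email")` returns `"email"`.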
Documentation and Modular Approach: Update the documentation to reflect HTML5 support and consider a modular approach to HTML5 features, allowing users to enable specific functionality as needed. This helps manage performance implications and maintain backward compatibility.
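A modular, opt-in approach could be as simple as an `EnumSet` of feature flags whose default is empty, so existing XHTML users see no change unless they ask for it. `Html5Options` and its `Feature` values are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

public class Html5Options {

    /** Hypothetical opt-in HTML5 features; the names are illustrative only. */
    public enum Feature { HTML5_PARSER, SEMANTIC_ELEMENT_DEFAULTS, FORM_INPUT_TYPES }

    private final Set<Feature> enabled;

    private Html5Options(Set<Feature> enabled) {
        this.enabled = Collections.unmodifiableSet(enabled);
    }

    /** Backward-compatible default: no HTML5 feature is switched on. */
    public static Html5Options legacy() {
        return new Html5Options(EnumSet.noneOf(Feature.class));
    }

    /** Everything on, for users who want full HTML5 behaviour. */
    public static Html5Options allHtml5() {
        return new Html5Options(EnumSet.allOf(Feature.class));
    }

    /** Explicit opt-in to a chosen subset of features. */
    public static Html5Options with(Feature first, Feature... rest) {
        return new Html5Options(EnumSet.of(first, rest));
    }

    public boolean isEnabled(Feature feature) {
        return enabled.contains(feature);
    }
}
```

Keeping the default empty is the design choice that preserves backward compatibility: a renderer would check `isEnabled(...)` before taking any HTML5-specific code path.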
By concentrating on these aspects, `flyingsaucer` can significantly improve its HTML5 support, aligning it with current web standards and making it more useful for rendering modern web documents.
Integrating an HTML5-compliant parser into the `flyingsaucerproject/flyingsaucer` library involves several steps to ensure accurate handling of HTML5 documents, which is what allows the new semantic elements to be recognized and the document structure to be parsed correctly:
Evaluate Existing Parser: Assess the capabilities and limitations of the current parsing mechanism in `flyingsaucer` to understand how it handles HTML and where it falls short with HTML5 content.
Select an HTML5 Parser: Choose an HTML5-compliant parser that can be integrated into `flyingsaucer`. Popular Java-based options such as jsoup or HtmlUnit's parser (htmlunit-neko) have strong HTML5 support and offer a good balance between performance and ease of use.
https://www.w3.org/TR/2011/WD-html5-20110405/
https://html.spec.whatwg.org/
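For the "Evaluate Existing Parser" step, a quick probe can show where strict XML parsing falls short on everyday HTML5 markup. This sketch assumes the current pipeline behaves like the JDK's built-in `DocumentBuilder` (the comments above describe Flying Saucer's parser as an XML SAX parser); `StrictParserProbe` is an illustrative name.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class StrictParserProbe {

    /** Returns true if the JDK's XML parser accepts the markup without errors. */
    public static boolean acceptedAsXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Turn parse problems into exceptions instead of printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("XHTML void element", "<!DOCTYPE html><html><body><p>a<br/>b</p></body></html>");
        samples.put("HTML5 void element", "<!DOCTYPE html><html><body><p>a<br>b</p></body></html>");
        samples.put("unquoted attribute", "<!DOCTYPE html><html><body><p class=note>x</p></body></html>");
        samples.put("omitted end tags", "<!DOCTYPE html><html><body><ul><li>one<li>two</ul></body></html>");
        samples.put("named entity", "<!DOCTYPE html><html><body><p>a&nbsp;b</p></body></html>");
        samples.forEach((name, html) ->
                System.out.printf("%-20s %s%n", name, acceptedAsXml(html) ? "ok" : "rejected"));
    }
}
```

Each sample that is rejected here is valid HTML5 that an HTML5-compliant parser would accept, which gives a concrete checklist for comparing candidate parsers.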
Possibly some implementation details can be copied from:
https://github.com/openhtmltopdf/openhtmltopdf/