@Dhruv-Maradiya
Contributor

Fixes #1090 by updating the DOM parser to handle <br> elements and insert line breaks (\n) when converting HTML content to plain text.

Initially, I was not sure that a simple condition would be a reliable solution, so I checked how Chromium handles HTML-to-text conversion and found that it takes a similar approach. Here's the link.
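For readers who want to see the idea in isolation, here is a minimal sketch of the behavior described above: collect the character data of a document and emit `\n` wherever a `<br>` appears. It is written in Python with the standard-library `html.parser` module purely for illustration. It is not the Dart code from this PR, and the names `TextExtractor` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    # Hypothetical helper: gathers text and adds a newline for each <br>.

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <BR> and <br/> both arrive as "br".
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    # Feed the whole string, then close() to flush any buffered text.
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(repr(html_to_text("<p>line one<br>line two</p>")))
```

Without the `br` branch, the two lines would be joined with nothing between them, which is the behavior the linked issue reports.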


  • I’ve reviewed the contributor guide and applied the relevant portions to this PR.
Contribution guidelines:

Note that many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.

@Dhruv-Maradiya Dhruv-Maradiya changed the title Fix(html): Handle <br> elements to insert line breaks in text Fix(html): Handle <br> elements to insert line breaks in text Dec 27, 2024
@kevmoo kevmoo requested a review from devoncarew December 28, 2024 02:55
@github-actions

github-actions bot commented Dec 30, 2024

PR Health

Breaking changes ⚠️
| Package | Change | Current Version | New Version | Needed Version | Looking good? |
| --- | --- | --- | --- | --- | --- |
| html | Breaking | 0.15.6 | 0.15.7-wip | 0.16.0 | ⚠️ Got "0.15.7-wip" expected >= "0.16.0" (breaking changes) |

This check can be disabled by tagging the PR with skip-breaking-check.

Changelog Entry ✔️
Package Changed Files

Changes to files need to be accounted for in their respective changelogs.

This check can be disabled by tagging the PR with skip-changelog-check.

Coverage ✔️
| File | Coverage |
| --- | --- |
| pkgs/html/lib/dom.dart | 💚 65 % ⬆️ 1 % |

This check for test coverage is informational (issues shown here will not fail the PR).

This check can be disabled by tagging the PR with skip-coverage-check.

API leaks ⚠️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package Leaked API symbol Leaking sources
html HtmlTokenizer html/parser.dart::HtmlParser::tokenizer
html Token tokenizer.dart::HtmlTokenizer
tokenizer.dart::HtmlTokenizer::tokenQueue
tokenizer.dart::HtmlTokenizer::currentToken
tokenizer.dart::HtmlTokenizer::currentToken
token.dart::TagToken
token.dart::DoctypeToken
token.dart::StringToken
tokenizer.dart::HtmlTokenizer::current
token.dart::StartTagToken
token.dart::CommentToken
html/parser.dart::Phase::processComment
html/parser.dart::Phase::processDoctype
token.dart::CharactersToken
html/parser.dart::Phase::processCharacters
token.dart::SpaceCharactersToken
html/parser.dart::Phase::processSpaceCharacters
html/parser.dart::Phase::processStartTag
html/parser.dart::Phase::startTagHtml
token.dart::EndTagToken
html/parser.dart::Phase::processEndTag
html/parser.dart::HtmlParser::inForeignContent::token
html/parser.dart::HtmlParser::parseRCDataRawtext::token
html/parser.dart::BeforeHeadPhase::startTagOther
html/parser.dart::BeforeHeadPhase::endTagImplyHead
html/parser.dart::InHeadPhase::startTagOther
html/parser.dart::InHeadPhase::endTagHtmlBodyBr
html/parser.dart::AfterHeadPhase::startTagOther
html/parser.dart::AfterHeadPhase::endTagHtmlBodyBr
html/parser.dart::InBodyPhase::startTagProcessInHead
html/parser.dart::InBodyPhase::startTagButton
html/parser.dart::InBodyPhase::startTagOther
html/parser.dart::InBodyPhase::endTagHtml
html/parser.dart::InTablePhase::startTagCol
html/parser.dart::InTablePhase::startTagImplyTbody
html/parser.dart::InTablePhase::startTagTable
html/parser.dart::InTablePhase::startTagStyleScript
html/parser.dart::InCaptionPhase::startTagTableElement
html/parser.dart::InCaptionPhase::startTagOther
html/parser.dart::InCaptionPhase::endTagTable
html/parser.dart::InCaptionPhase::endTagOther
html/parser.dart::InColumnGroupPhase::startTagOther
html/parser.dart::InColumnGroupPhase::endTagOther
html/parser.dart::InTableBodyPhase::startTagTableCell
html/parser.dart::InTableBodyPhase::startTagTableOther
html/parser.dart::InTableBodyPhase::startTagOther
html/parser.dart::InTableBodyPhase::endTagTable
html/parser.dart::InTableBodyPhase::endTagOther
html/parser.dart::InRowPhase::startTagTableOther
html/parser.dart::InRowPhase::startTagOther
html/parser.dart::InRowPhase::endTagTable
html/parser.dart::InRowPhase::endTagTableRowGroup
html/parser.dart::InRowPhase::endTagOther
html/parser.dart::InCellPhase::startTagTableOther
html/parser.dart::InCellPhase::startTagOther
html/parser.dart::InCellPhase::endTagImply
html/parser.dart::InCellPhase::endTagOther
html/parser.dart::InSelectPhase::startTagInput
html/parser.dart::InSelectPhase::startTagScript
html/parser.dart::InSelectPhase::startTagOther
html/parser.dart::InSelectInTablePhase::startTagTable
html/parser.dart::InSelectInTablePhase::startTagOther
html/parser.dart::InSelectInTablePhase::endTagTable
html/parser.dart::InSelectInTablePhase::endTagOther
html/parser.dart::AfterBodyPhase::startTagOther
html/parser.dart::AfterBodyPhase::endTagHtml::token
html/parser.dart::AfterBodyPhase::endTagOther
html/parser.dart::InFramesetPhase::startTagNoframes
html/parser.dart::InFramesetPhase::startTagOther
html/parser.dart::AfterFramesetPhase::startTagNoframes
html/parser.dart::AfterAfterBodyPhase::startTagOther
html/parser.dart::AfterAfterFramesetPhase::startTagNoFrames
html HtmlInputStream tokenizer.dart::HtmlTokenizer::stream
html TagToken tokenizer.dart::HtmlTokenizer::currentTagToken
token.dart::StartTagToken
token.dart::EndTagToken
html/parser.dart::InTableBodyPhase::startTagTableOther::token
html/parser.dart::InTableBodyPhase::endTagTable::token
html DoctypeToken tokenizer.dart::HtmlTokenizer::currentDoctypeToken
treebuilder.dart::TreeBuilder::insertDoctype::token
html/parser.dart::Phase::processDoctype::token
html StringToken tokenizer.dart::HtmlTokenizer::currentStringToken
token.dart::StringToken::add
treebuilder.dart::TreeBuilder::insertComment::token
token.dart::CommentToken
token.dart::CharactersToken
token.dart::SpaceCharactersToken
html/parser.dart::InBodyPhase::processSpaceCharactersDropNewline::token
html/parser.dart::InTableTextPhase::characterTokens
html TreeBuilder html/parser.dart::HtmlParser::tree
html/parser.dart::Phase::tree
html/parser.dart::HtmlParser::new::tree
html ActiveFormattingElements treebuilder.dart::TreeBuilder::activeFormattingElements
html StartTagToken treebuilder.dart::TreeBuilder::insertRoot::token
treebuilder.dart::TreeBuilder::createElement::token
treebuilder.dart::TreeBuilder::insertElement::token
treebuilder.dart::TreeBuilder::insertElementNormal::token
treebuilder.dart::TreeBuilder::insertElementTable::token
html/parser.dart::Phase::processStartTag::token
html/parser.dart::Phase::startTagHtml::token
html/parser.dart::HtmlParser::adjustMathMLAttributes::token
html/parser.dart::HtmlParser::adjustSVGAttributes::token
html/parser.dart::HtmlParser::adjustForeignAttributes::token
html/parser.dart::BeforeHeadPhase::startTagHead::token
html/parser.dart::BeforeHeadPhase::startTagOther::token
html/parser.dart::InHeadPhase::startTagHead::token
html/parser.dart::InHeadPhase::startTagBaseLinkCommand::token
html/parser.dart::InHeadPhase::startTagMeta::token
html/parser.dart::InHeadPhase::startTagTitle::token
html/parser.dart::InHeadPhase::startTagNoScriptNoFramesStyle::token
html/parser.dart::InHeadPhase::startTagScript::token
html/parser.dart::InHeadPhase::startTagOther::token
html/parser.dart::AfterHeadPhase::startTagBody::token
html/parser.dart::AfterHeadPhase::startTagFrameset::token
html/parser.dart::AfterHeadPhase::startTagFromHead::token
html/parser.dart::AfterHeadPhase::startTagHead::token
html/parser.dart::AfterHeadPhase::startTagOther::token
html/parser.dart::InBodyPhase::addFormattingElement::token
html/parser.dart::InBodyPhase::startTagProcessInHead::token
html/parser.dart::InBodyPhase::startTagBody::token
html/parser.dart::InBodyPhase::startTagFrameset::token
html/parser.dart::InBodyPhase::startTagCloseP::token
html/parser.dart::InBodyPhase::startTagPreListing::token
html/parser.dart::InBodyPhase::startTagForm::token
html/parser.dart::InBodyPhase::startTagListItem::token
html/parser.dart::InBodyPhase::startTagPlaintext::token
html/parser.dart::InBodyPhase::startTagHeading::token
html/parser.dart::InBodyPhase::startTagA::token
html/parser.dart::InBodyPhase::startTagFormatting::token
html/parser.dart::InBodyPhase::startTagNobr::token
html/parser.dart::InBodyPhase::startTagButton::token
html/parser.dart::InBodyPhase::startTagAppletMarqueeObject::token
html/parser.dart::InBodyPhase::startTagXmp::token
html/parser.dart::InBodyPhase::startTagTable::token
html/parser.dart::InBodyPhase::startTagVoidFormatting::token
html/parser.dart::InBodyPhase::startTagInput::token
html/parser.dart::InBodyPhase::startTagParamSource::token
html/parser.dart::InBodyPhase::startTagHr::token
html/parser.dart::InBodyPhase::startTagImage::token
html/parser.dart::InBodyPhase::startTagIsIndex::token
html/parser.dart::InBodyPhase::startTagTextarea::token
html/parser.dart::InBodyPhase::startTagIFrame::token
html/parser.dart::InBodyPhase::startTagRawtext::token
html/parser.dart::InBodyPhase::startTagOpt::token
html/parser.dart::InBodyPhase::startTagSelect::token
html/parser.dart::InBodyPhase::startTagRpRt::token
html/parser.dart::InBodyPhase::startTagMath::token
html/parser.dart::InBodyPhase::startTagSvg::token
html/parser.dart::InBodyPhase::startTagMisplaced::token
html/parser.dart::InBodyPhase::startTagOther::token
html/parser.dart::InTablePhase::startTagCaption::token
html/parser.dart::InTablePhase::startTagColgroup::token
html/parser.dart::InTablePhase::startTagCol::token
html/parser.dart::InTablePhase::startTagRowGroup::token
html/parser.dart::InTablePhase::startTagImplyTbody::token
html/parser.dart::InTablePhase::startTagTable::token
html/parser.dart::InTablePhase::startTagStyleScript::token
html/parser.dart::InTablePhase::startTagInput::token
html/parser.dart::InTablePhase::startTagForm::token
html/parser.dart::InTablePhase::startTagOther::token
html/parser.dart::InCaptionPhase::startTagTableElement::token
html/parser.dart::InCaptionPhase::startTagOther::token
html/parser.dart::InColumnGroupPhase::startTagCol::token
html/parser.dart::InColumnGroupPhase::startTagOther::token
html/parser.dart::InTableBodyPhase::startTagTr::token
html/parser.dart::InTableBodyPhase::startTagTableCell::token
html/parser.dart::InTableBodyPhase::startTagOther::token
html/parser.dart::InRowPhase::startTagTableCell::token
html/parser.dart::InRowPhase::startTagTableOther::token
html/parser.dart::InRowPhase::startTagOther::token
html/parser.dart::InCellPhase::startTagTableOther::token
html/parser.dart::InCellPhase::startTagOther::token
html/parser.dart::InSelectPhase::startTagOption::token
html/parser.dart::InSelectPhase::startTagOptgroup::token
html/parser.dart::InSelectPhase::startTagSelect::token
html/parser.dart::InSelectPhase::startTagInput::token
html/parser.dart::InSelectPhase::startTagScript::token
html/parser.dart::InSelectPhase::startTagOther::token
html/parser.dart::InSelectInTablePhase::startTagTable::token
html/parser.dart::InSelectInTablePhase::startTagOther::token
html/parser.dart::InForeignContentPhase::adjustSVGTagNames::token
html/parser.dart::AfterBodyPhase::startTagOther::token
html/parser.dart::InFramesetPhase::startTagFrameset::token
html/parser.dart::InFramesetPhase::startTagFrame::token
html/parser.dart::InFramesetPhase::startTagNoframes::token
html/parser.dart::InFramesetPhase::startTagOther::token
html/parser.dart::AfterFramesetPhase::startTagNoframes::token
html/parser.dart::AfterFramesetPhase::startTagOther::token
html/parser.dart::AfterAfterBodyPhase::startTagOther::token
html/parser.dart::AfterAfterFramesetPhase::startTagNoFrames::token
html/parser.dart::AfterAfterFramesetPhase::startTagOther::token
html TagAttribute token.dart::StartTagToken::attributeSpans
html CommentToken html/parser.dart::Phase::processComment::token
html CharactersToken html/parser.dart::Phase::processCharacters::token
html/parser.dart::InTablePhase::insertText::token
html SpaceCharactersToken html/parser.dart::Phase::processSpaceCharacters::token
html EndTagToken html/parser.dart::Phase::processEndTag::token
html/parser.dart::Phase::popOpenElementsUntil::token
html/parser.dart::BeforeHeadPhase::endTagImplyHead::token
html/parser.dart::BeforeHeadPhase::endTagOther::token
html/parser.dart::InHeadPhase::endTagHead::token
html/parser.dart::InHeadPhase::endTagHtmlBodyBr::token
html/parser.dart::InHeadPhase::endTagOther::token
html/parser.dart::AfterHeadPhase::endTagHtmlBodyBr::token
html/parser.dart::AfterHeadPhase::endTagOther::token
html/parser.dart::InBodyPhase::endTagP::token
html/parser.dart::InBodyPhase::endTagBody::token
html/parser.dart::InBodyPhase::endTagHtml::token
html/parser.dart::InBodyPhase::endTagBlock::token
html/parser.dart::InBodyPhase::endTagForm::token
html/parser.dart::InBodyPhase::endTagListItem::token
html/parser.dart::InBodyPhase::endTagHeading::token
html/parser.dart::InBodyPhase::endTagFormatting::token
html/parser.dart::InBodyPhase::endTagAppletMarqueeObject::token
html/parser.dart::InBodyPhase::endTagBr::token
html/parser.dart::InBodyPhase::endTagOther::token
html/parser.dart::TextPhase::endTagScript::token
html/parser.dart::TextPhase::endTagOther::token
html/parser.dart::InTablePhase::endTagTable::token
html/parser.dart::InTablePhase::endTagIgnore::token
html/parser.dart::InTablePhase::endTagOther::token
html/parser.dart::InCaptionPhase::endTagCaption::token
html/parser.dart::InCaptionPhase::endTagTable::token
html/parser.dart::InCaptionPhase::endTagIgnore::token
html/parser.dart::InCaptionPhase::endTagOther::token
html/parser.dart::InColumnGroupPhase::endTagColgroup::token
html/parser.dart::InColumnGroupPhase::endTagCol::token
html/parser.dart::InColumnGroupPhase::endTagOther::token
html/parser.dart::InTableBodyPhase::endTagTableRowGroup::token
html/parser.dart::InTableBodyPhase::endTagIgnore::token
html/parser.dart::InTableBodyPhase::endTagOther::token
html/parser.dart::InRowPhase::endTagTr::token
html/parser.dart::InRowPhase::endTagTable::token
html/parser.dart::InRowPhase::endTagTableRowGroup::token
html/parser.dart::InRowPhase::endTagIgnore::token
html/parser.dart::InRowPhase::endTagOther::token
html/parser.dart::InCellPhase::endTagTableCell::token
html/parser.dart::InCellPhase::endTagIgnore::token
html/parser.dart::InCellPhase::endTagImply::token
html/parser.dart::InCellPhase::endTagOther::token
html/parser.dart::InSelectPhase::endTagOption::token
html/parser.dart::InSelectPhase::endTagOptgroup::token
html/parser.dart::InSelectPhase::endTagSelect::token
html/parser.dart::InSelectPhase::endTagOther::token
html/parser.dart::InSelectInTablePhase::endTagTable::token
html/parser.dart::InSelectInTablePhase::endTagOther::token
html/parser.dart::AfterBodyPhase::endTagOther::token
html/parser.dart::InFramesetPhase::endTagFrameset::token
html/parser.dart::InFramesetPhase::endTagOther::token
html/parser.dart::AfterFramesetPhase::endTagHtml::token
html/parser.dart::AfterFramesetPhase::endTagOther::token

This check can be disabled by tagging the PR with skip-leaking-check.

License Headers ⚠️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files
pkgs/html/lib/dom.dart
pkgs/html/test/parser_feature_test.dart

All source files should start with a license header.

Unrelated files missing license headers
Files
pkgs/bazel_worker/benchmark/benchmark.dart
pkgs/benchmark_harness/integration_test/perf_benchmark_test.dart
pkgs/boolean_selector/example/example.dart
pkgs/clock/lib/clock.dart
pkgs/clock/lib/src/clock.dart
pkgs/clock/lib/src/default.dart
pkgs/clock/lib/src/stopwatch.dart
pkgs/clock/lib/src/utils.dart
pkgs/clock/test/clock_test.dart
pkgs/clock/test/default_test.dart
pkgs/clock/test/stopwatch_test.dart
pkgs/clock/test/utils.dart
pkgs/coverage/lib/src/coverage_options.dart
pkgs/html/example/main.dart
pkgs/html/lib/dom_parsing.dart
pkgs/html/lib/html_escape.dart
pkgs/html/lib/parser.dart
pkgs/html/lib/src/constants.dart
pkgs/html/lib/src/encoding_parser.dart
pkgs/html/lib/src/html_input_stream.dart
pkgs/html/lib/src/list_proxy.dart
pkgs/html/lib/src/query_selector.dart
pkgs/html/lib/src/token.dart
pkgs/html/lib/src/tokenizer.dart
pkgs/html/lib/src/treebuilder.dart
pkgs/html/lib/src/utils.dart
pkgs/html/test/dom_test.dart
pkgs/html/test/parser_test.dart
pkgs/html/test/query_selector_test.dart
pkgs/html/test/selectors/level1_baseline_test.dart
pkgs/html/test/selectors/level1_lib.dart
pkgs/html/test/selectors/selectors.dart
pkgs/html/test/support.dart
pkgs/html/test/tokenizer_test.dart
pkgs/html/test/trie_test.dart
pkgs/html/tool/generate_trie.dart
pkgs/pubspec_parse/test/git_uri_test.dart
pkgs/stack_trace/example/example.dart
pkgs/watcher/test/custom_watcher_factory_test.dart
pkgs/yaml_edit/example/example.dart

This check can be disabled by tagging the PR with skip-license-check.

@mosuem mosuem requested review from HosseinYousefi and removed request for devoncarew April 17, 2025 12:38
@Dhruv-Maradiya
Contributor Author

Hey, thanks for reviewing this! 🙌
It’s been a few months since I worked on it, and I was still getting familiar with the codebase at the time — so I’ll need to refresh myself on the changes.
I'll take a look as soon as I can. Appreciate your feedback!

@mosuem
Member

mosuem commented Oct 7, 2025

@Dhruv-Maradiya Just a friendly ping as I am looking through PRs - is there intention to land this?

@Dhruv-Maradiya
Contributor Author

Dhruv-Maradiya commented Oct 8, 2025

Hey @mosuem, sorry for the delay! I’ll try to wrap this up ASAP, most likely today.

@mosuem
Member

mosuem commented Oct 13, 2025

Friendly ping :) (No pressure, just happened to walk by this tab in my browser)

Implements DOM spec textContent algorithm with optional convertBRsToNewlines parameter. Adds isElementBr() helper for namespace-aware BR detection. Maintains backward compatibility with existing .text getter.
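The commit message above can be illustrated with a small sketch, again in Python rather than Dart. It assumes that the text-content computation concatenates the data of all `Text` descendants in tree order, that the optional flag defaults to off so existing callers see no change, and that BR detection compares both the namespace and the local name. The `Element` and `Text` classes are hypothetical stand-ins for DOM nodes, and `text_content` and `is_element_br` only mirror the names `convertBRsToNewlines` and `isElementBr()` from the commit message; their real signatures are not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Union

HTML_NAMESPACE = "http://www.w3.org/1999/xhtml"
SVG_NAMESPACE = "http://www.w3.org/2000/svg"


@dataclass
class Text:
    data: str


@dataclass
class Element:
    local_name: str
    namespace_uri: str = HTML_NAMESPACE
    children: List[Union["Element", Text]] = field(default_factory=list)


def is_element_br(node):
    # Namespace-aware check: a non-HTML element named "br" does not count.
    return (
        isinstance(node, Element)
        and node.namespace_uri == HTML_NAMESPACE
        and node.local_name == "br"
    )


def text_content(node, convert_brs_to_newlines=False):
    # Walk the subtree in tree order and join the data of every Text node.
    parts = []
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, Text):
            parts.append(current.data)
            continue
        if convert_brs_to_newlines and is_element_br(current):
            parts.append("\n")
        # Reverse so the first child is popped (visited) first.
        stack.extend(reversed(current.children))
    return "".join(parts)


paragraph = Element("p", children=[Text("a"), Element("br"), Text("b")])
print(repr(text_content(paragraph)))
print(repr(text_content(paragraph, convert_brs_to_newlines=True)))
```

Keeping the flag off by default is what would let an existing `.text`-style getter return the same string as before, while callers that want line breaks opt in explicitly.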
@github-actions

Package publishing

| Package | Version | Status | Publish tag (post-merge) |
| --- | --- | --- | --- |
| package:bazel_worker | 1.1.4 | already published at pub.dev | |
| package:benchmark_harness | 2.4.0-wip | WIP (no publish necessary) | |
| package:boolean_selector | 2.1.2 | already published at pub.dev | |
| package:browser_launcher | 1.1.3 | already published at pub.dev | |
| package:cli_config | 0.2.1-wip | WIP (no publish necessary) | |
| package:cli_util | 0.5.0-wip | WIP (no publish necessary) | |
| package:clock | 1.1.3-wip | WIP (no publish necessary) | |
| package:code_builder | 4.11.0 | already published at pub.dev | |
| package:coverage | 1.15.0 | already published at pub.dev | |
| package:csslib | 1.0.2 | already published at pub.dev | |
| package:extension_discovery | 2.1.0 | already published at pub.dev | |
| package:file | 7.0.2-wip | WIP (no publish necessary) | |
| package:file_testing | 3.1.0-wip | WIP (no publish necessary) | |
| package:glob | 2.1.3 | already published at pub.dev | |
| package:graphs | 2.3.3-wip | WIP (no publish necessary) | |
| package:html | 0.15.7-wip | WIP (no publish necessary) | |
| package:io | 1.1.0-wip | WIP (no publish necessary) | |
| package:json_rpc_2 | 4.0.0 | already published at pub.dev | |
| package:markdown | 7.3.1-wip | WIP (no publish necessary) | |
| package:mime | 2.0.0 | already published at pub.dev | |
| package:oauth2 | 2.0.4 | ready to publish | oauth2-v2.0.4 |
| package:package_config | 2.3.0-wip | WIP (no publish necessary) | |
| package:pool | 1.5.2 | already published at pub.dev | |
| package:process | 5.0.5 | already published at pub.dev | |
| package:pub_semver | 2.2.0 | already published at pub.dev | |
| package:pubspec_parse | 1.5.1-wip | WIP (no publish necessary) | |
| package:source_map_stack_trace | 2.1.3-wip | WIP (no publish necessary) | |
| package:source_maps | 0.10.14-wip | WIP (no publish necessary) | |
| package:source_span | 1.10.1 | already published at pub.dev | |
| package:sse | 4.1.8 | already published at pub.dev | |
| package:stack_trace | 1.12.1 | already published at pub.dev | |
| package:stream_channel | 2.1.4 | already published at pub.dev | |
| package:stream_transform | 2.1.2-wip | WIP (no publish necessary) | |
| package:string_scanner | 1.4.1 | already published at pub.dev | |
| package:term_glyph | 1.2.3-wip | WIP (no publish necessary) | |
| package:test_reflective_loader | 0.4.0 | already published at pub.dev | |
| package:timing | 1.0.2 | already published at pub.dev | |
| package:unified_analytics | 8.0.6 | ready to publish | unified_analytics-v8.0.6 |
| package:watcher | 1.1.5-wip | WIP (no publish necessary) | |
| package:yaml | 3.1.3 | already published at pub.dev | |
| package:yaml_edit | 2.2.2 | already published at pub.dev | |

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

Successfully merging this pull request may close these issues.

<br/> Tag does not product /n