Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken tables whose msgstr is a specific type of HTML #253

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

shinmili
Copy link

@shinmili shinmili commented Feb 8, 2025

If an msgstr for a table cell contains HTML that could be parsed as an HTML block if it were put in the top-level, the table is rendered incorrectly. The table row containing the cell gets interrupted in the middle and split into two (or more, if there are more such cells) incomplete rows.

Close #251.

If an msgstr for a table cell contains HTML that could be parsed as an
HTML block if it were put in the top-level, the table is rendered
incorrectly. The table row containing the cell gets interrupted in the
middle and split into two (or more, if there are more such cells)
incomplete rows.

Close google#251.
Copy link

google-cla bot commented Feb 8, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@codecov-commenter
Copy link

codecov-commenter commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 95.23810% with 3 lines in your changes missing coverage. Please review.

Project coverage is 85.37%. Comparing base (886f15e) to head (f9ae9a6).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
i18n-helpers/src/lib.rs 91.89% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #253      +/-   ##
==========================================
+ Coverage   84.74%   85.37%   +0.62%     
==========================================
  Files          15       16       +1     
  Lines        3377     3527     +150     
  Branches     3377     3527     +150     
==========================================
+ Hits         2862     3011     +149     
+ Misses        413      412       -1     
- Partials      102      104       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mgeisler mgeisler requested review from dyoo and mgeisler March 18, 2025 09:26
Comment on lines 155 to 160
Event::SoftBreak => Event::Text(" ".into()),
// Shortcut links like "[foo]" end up as "[foo]"
// in output. By changing them to a reference
// link, the link is expanded on the fly and the
// output becomes self-contained.
Event::Start(tag @ (Tag::Link { .. } | Tag::Image { .. })) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicating some existing functionality, right? I feel I remember seeing this before 😄

If so, we should pull this filtering code out into its own function so that we can call it from here as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, some parts are copied from the _ => arm below. I'll try to making such function.

@@ -137,6 +140,33 @@ pub fn extract_events<'a>(text: &'a str, state: Option<State<'a>>) -> Vec<(usize
.enumerate()
.map(|(idx, line)| (idx + 1, Event::Text(line.into())))
.collect(),
// If we're in a table cell, we put the text in a minimal table, parse the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question about the minimal table: what happens if the text contains a | character?

I am not 100% sure what the Commonmark rules are for escaping pipes in tables — can you add another test to document the behavior if it's supported? If it's not supported, I would appreciate a comment here saying that we're sure text doensn't contain | because we know we're in a table context.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it's basically supported, so I'll add a test for that.

Technically, I can't say text doesn't contain |, because translators could write arbitrary msgstrs in .po... As long as they don't introduce any non-escaped |s, it works fine.

"\
| Icon | Description |\n\
|-------------------------------------------------|-------------------|\n\
| <img src=\"some-icon.svg\" alt=\"Some icon.\"/> | Some description. |",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea: consider "translating the src attribute and dropping the alt attribute to make the example smaller. I believe one attribute should be enough to demonstrate this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I just intended to write some practical example, but it isn't really necessary for testing.

@mgeisler
Copy link
Collaborator

Hi @shinmili, thank you so much for putting this up! I think this looks good and I left some minor comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broken table with <img> tag
3 participants