Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of 8bit message content and message/global #4926

Merged
merged 33 commits into from
Jul 26, 2024
Merged

Conversation

rsto
Copy link
Member

@rsto rsto commented May 23, 2024

This updates how Cyrus handles messages with 8bit content, both in headers and body parts values. This PR consists of a number of commits, some of which are refactoring only. The main functional changes are listed below:

  • JMAP Email/set{create} now uses the same logic for serializing JMAP Blobs and JMAP EmailBodyValues to MIME. It updates how the content transfer encoding for MIME bodies is chosen and rewrites inline plain text if necessary.
  • JMAP Email/set{create} and Email/{get} now both NFC-normalize internationalized email addresses.
  • JMAP Email/{get} also NFC-normalizes header:Xxx:asText property values. JMAP Email/set{create} still q-encodes header:Xxx:asText property values but NFC-normalizes the value before encoding.
  • JMAP Email/query and squatter now support internationalized email address domains queries in both Unicode and ASCII form.
  • The minor mailbox cache now stores the content transfer encoding of body parts having type message/rfc822 or message/global. The cache version is updated to 13.

Of all changes, the ones done to the MIME serialization of JMAP Emails is the most critical: once the MIME message got generated its JMAP Email input as supplied by the client gets thrown away. Accordingly, this PR contains a bunch of test cases to trigger each special condition in the JMAP Email to MIME logic.

@rsto rsto marked this pull request as draft May 23, 2024 10:49
@rsto rsto force-pushed the cyrus_eai branch 4 times, most recently from 135b679 to 41c3c3e Compare May 27, 2024 15:11
@rsto rsto marked this pull request as ready for review May 28, 2024 13:25
@ksmurchison
Copy link
Contributor

@rsto I tried to give this a good review while on a train to Rome, but its been a long day and I couldn't focus enough to give it justice. I probably can't get back to this until my flight home on June 8.

If you need it reviewed sooner, maybe someone else on the team can take a look.

Copy link
Contributor

@elliefm elliefm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few nits and questions though.

I'm not sure my review of the Xapian changes was quite comprehensive, I don't use C++ much, so I won't notice subtle mistakes in it.

cassandane/tiny-tests/JMAPEmail/email_set_text_crlf Outdated Show resolved Hide resolved
imap/jmap_mail.c Outdated Show resolved Hide resolved
imap/jmap_mail.c Outdated Show resolved Hide resolved
imap/message.c Outdated Show resolved Hide resolved
@rsto
Copy link
Member Author

rsto commented Jul 9, 2024

This now also changes the Xapian backend to normalize any text using NFKC with CaseFolding before indexing and querying. It bumps the Xapian index version to 17. The default stemmer keeps using lowercase Cyrus Search Form.

If any of the existing indexes contains version 16 or less, then queries search for any terms that either match the NFKC-normalized term or the unnormalized term.

Copy link
Contributor

@ksmurchison ksmurchison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rsto
Copy link
Member Author

rsto commented Jul 17, 2024

This now also includes #4977

@rsto rsto force-pushed the cyrus_eai branch 2 times, most recently from f635f45 to 0116bdf Compare July 24, 2024 14:14
rsto added 3 commits July 25, 2024 12:40
Turns out that most uses of this function force a newline
before the parameter anyway, so don't require the caller
to provide the header prefix length.

Signed-off-by: Robert Stepanek <[email protected]>
It passed a non-ascii blob as text/plain content.
rsto added 25 commits July 25, 2024 12:40
Up until now the Content-Transfer-Encoding header for message/rfc822
attachments was ignored. That is fine for rfc822 but we want to treat
message/rfc822 and message/global equally, and the global subtype
does allow its content to be encoded.

This patch changes the parse code to not throw away the encoding
identifier of a message/rfc822 body part before storing it in cache.
It bumps the mailbox minor cache version to indicate this and handles
previous versions on read.
The User-Agent header is not defined for MIME messages, it only
is defined for NetNews and HTTP. Use "X-Mailer" instead which
at least is mentioned in RFC 2076.
This changes the Xapian backend to normalize any text using NFKC with
CaseFolding before indexing and querying. It bumps the Xapian index
version to 17. The default stemmer keeps using lowercase Cyrus Search Form.

If any of the existing indexes contains version 16 or less, then queries
search for any terms that either match the NFKC-normalized term or the
unnormalized term.
It really didn't belong to global.h in the first place.
Up until now, Cyrus did not convert the contents of header fields
and a couple of iCalendar properties to search form. This required
us to convert those part contents to lower case in the Xapian
wrapper, but we rather want this all proper NFKC-casefolded.

Calling charset_convert on these values before sending them to
the indexer makes sure all indexed text is in proper form.
The query runner might rewrite the search query before running
it, so report the query costs after that.
Copy link
Member

@brong brong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work, thanks! Is it at a good point for merging now?

@rsto rsto merged commit 59fef69 into master Jul 26, 2024
1 check passed
@rsto
Copy link
Member Author

rsto commented Jul 26, 2024

Let's go for it. Merged.

@rsto rsto deleted the cyrus_eai branch July 26, 2024 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants