
Carriage Return 0x0d rewritten in cell values #50

@ajtucker

Description

We've had a few source CSV files with the character sequence CR CR LF inside cells, used to denote a single line break. In dclib (i.e. when dealing with values in templates) this sequence comes out as LF LF, essentially doubling the line breaks. This becomes a problem when, for instance, users try to separate paragraphs in a description. The usual practice is to use a double line break between paragraphs, which comes out as four LF characters and often gets turned into two <br />s in the resulting HTML.

@skwlilac and I tracked this down to the CSV parser opencsv (version 2.3), which uses Java's BufferedReader underneath to iterate over lines of text. BufferedReader treats any of CR, LF, or CR LF as a line terminator, so CR CR LF counts as two line breaks: a lone CR followed by a CR LF. When a line ends in the middle of a quoted cell, the CSV parser adds back a single LF and continues reading the value, so each CR CR LF becomes LF LF.
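The line-splitting half of this can be reproduced with BufferedReader alone. In the sketch below, `CrCrLfDemo`, `readLines` and `rejoinWithLf` are hypothetical names; `rejoinWithLf` only imitates the behaviour described for opencsv 2.3 (re-adding one LF per line break inside a quoted cell) and is not opencsv's actual code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CrCrLfDemo {
    // Splits text with BufferedReader.readLine(), which treats CR, LF and
    // CR LF each as one line terminator and strips it from the result.
    public static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return lines;
    }

    // Hypothetical stand-in for the parser: inside a quoted cell, every
    // line break reported by readLine() is replaced by a single LF.
    public static String rejoinWithLf(List<String> lines) {
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // CR CR LF is split as "first" (ended by CR), "" (ended by CR LF), "second".
        List<String> lines = readLines("first\r\r\nsecond");
        System.out.println(lines.size()); // prints 3

        // Two CR CR LF sequences (a "paragraph break") come out as four LFs.
        String value = rejoinWithLf(readLines("para one\r\r\n\r\r\npara two"));
        System.out.println(value.chars().filter(c -> c == '\n').count()); // prints 4
    }
}
```

The empty middle line is what produces the extra LF: the parser cannot tell that the lone CR and the following CR LF were meant as a single break, because readLine() has already discarded the terminator characters.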

The opencsv dependency comes from lib version 2.0.0, which in turn comes from appbase 2.0.0.

opencsv appears to have had a number of forks and owners over the intervening years, but it now states that it handles CR characters in values properly.

While this character sequence shouldn't be used in the first place, I'd argue that the parser shouldn't interpret characters in cell values and should instead pass them through verbatim for processing within templates.
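As a rough illustration of the verbatim behaviour argued for above, here is a minimal sketch of a CSV reader that works character by character rather than line by line, so CR and LF inside quoted cells are kept exactly as written. `VerbatimCsv` and `parse` are hypothetical names; this is not opencsv's API and it skips many real-world concerns (streaming, strict quote validation, configurable separators).

```java
import java.util.ArrayList;
import java.util.List;

public class VerbatimCsv {
    // Parses CSV text into records of cells. Characters inside double quotes
    // are copied verbatim (including CR and LF); "" inside quotes is an
    // escaped quote. Outside quotes, a record ends at LF, CR LF or a lone CR.
    public static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true if the current record has unflushed input
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            pending = true;
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        cell.append('"');   // escaped quote
                        i += 2;
                        continue;
                    }
                    inQuotes = false;        // closing quote
                } else {
                    cell.append(c);          // verbatim, CR and LF included
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                record.add(cell.toString());
                cell.setLength(0);
            } else if (c == '\r' || c == '\n') {
                record.add(cell.toString());
                cell.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                pending = false;
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;                     // CR LF is one record terminator
                }
            } else {
                cell.append(c);
            }
            i++;
        }
        if (pending) {                       // last record had no terminator
            record.add(cell.toString());
            records.add(record);
        }
        return records;
    }
}
```

With this approach, a cell written as `"first\r\r\nsecond"` comes back with its CR CR LF intact, leaving any normalisation of line breaks to the template layer.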
