@spassarop
Collaborator

I have been modifying the bundle's properties files for error messages, saving them as UTF-8 without the explicit \uXXXX format for non-ASCII characters. However, I am not sure these changes are enough, because the messages I get printed on standard output have broken characters.

I need someone else with more practical knowledge of Java configuration and runtime behavior to point out what else is needed in this PR. I tried to follow what was discussed in #456.
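For reference, a minimal sketch of why the two file styles are meant to be interchangeable: a `\uXXXX` escape and the raw character should produce the same value once the text is decoded correctly. The key `error.demo` and the message are hypothetical, not taken from the AntiSamy bundles, and the text is fed through a `StringReader` so no byte decoding is involved.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EscapeVsUtf8Demo {

    // Loads properties text through a Reader, so no byte-to-char decoding happens here.
    static Properties load(String text) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical key and message, used only for illustration.
        // "escaped" holds a literal backslash-u sequence, as in the old file style.
        String escaped = "error.demo=Il tag non \\u00e8 valido";
        // "raw" holds the actual e-grave character, as in the new UTF-8 file style.
        String raw = "error.demo=Il tag non \u00e8 valido";

        String fromEscaped = load(escaped).getProperty("error.demo");
        String fromRaw = load(raw).getProperty("error.demo");

        // Both styles yield the same in-memory string.
        System.out.println(fromEscaped.equals(fromRaw));
    }
}
```

If this holds, any remaining broken characters come from how the bytes are decoded on load or encoded on output, not from dropping the escapes.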

@kwwall
Contributor

kwwall commented Mar 9, 2025

However I am not sure these changes are enough because of the messages I get printed on standard output with broken characters.

Can you elaborate? How are you trying to print these to stdout? By just running something like (say):

$ cat src/main/resources/AntiSamy_de_DE.properties

or via some specific JUnit test, or what? Just pure speculation, but there may be environment variables set that potentially affect the output. For example, on Linux Mint 21.3:

$ env | grep -i LANG
GDM_LANG=en_US
LANG=en_US.UTF-8
LANGUAGE=en_US

I'm not sure if any of those affect what the output looks like, but I'd guess that $LANG potentially does. If you can describe one test you are running where it's not giving the expected results, and list the expected output, I can see if I can provide any additional insight. If not, I can maybe ask Matt, as he seems to understand code points way better than I do.
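One way to check the guess about the environment is to print what the JVM itself thinks its encodings are. The following is a rough sketch, not something from this PR: the class name is made up, and `stdout.encoding` is a system property that only newer JDKs set, so it may print `null`.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StdoutEncodingDemo {

    // Returns the bytes a PrintStream using the given charset would write for the text.
    static byte[] printedBytes(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charset.name())) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // What the JVM derived from the environment (LANG on Linux, code page on Windows).
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // May be null on older JDKs.
        System.out.println("stdout.encoding = " + System.getProperty("stdout.encoding"));

        // The same character becomes different bytes depending on the charset.
        System.out.println("UTF-8 bytes:      " + printedBytes("\u00e8", StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 bytes: " + printedBytes("\u00e8", StandardCharsets.ISO_8859_1).length);

        // Wrapping System.out forces UTF-8 bytes regardless of the default charset.
        // Whether they render correctly still depends on the terminal or IDE console.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("caff\u00e8");
        utf8Out.flush();
    }
}
```

If the default charset reported here differs between two machines, that alone can explain output that looks right on one console and broken on another, even when the strings in memory are identical.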

@davewichers
Collaborator

@spassarop - are you going to try to research further/address @kwwall's comments?

@spassarop
Collaborator Author

I am using Windows :I

I saved all properties files with VSCode as UTF-8 after opening them explicitly as ISO-8859-1. Some of them required manual changes so I could visually confirm they were being saved correctly; this was not necessary for all languages.

Ended up running this with JUnit to get the characters right:

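    // Properties.load(InputStream) decodes bytes as ISO-8859-1, so the stream is
    // wrapped in a Reader that decodes the file as UTF-8 instead.
    Properties properties = new Properties();
    try (java.io.InputStreamReader reader = new java.io.InputStreamReader(
            getClass().getClassLoader().getResourceAsStream("AntiSamy_it_IT.properties"),
            java.nio.charset.StandardCharsets.UTF_8)) {
      properties.load(reader);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    // Print every key/value pair to inspect the decoded characters.
    properties.forEach((key, value) -> System.out.println(key + " = " + value));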
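To illustrate why the `Reader` matters in the snippet above, here is a small self-contained sketch using in-memory bytes rather than the real resource file. `Properties.load(InputStream)` is documented to assume ISO-8859-1, so UTF-8 bytes passed to it directly come out as two wrong characters. The key `error.demo` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesCharsetDemo {

    // Loads via the InputStream overload, which decodes bytes as ISO-8859-1.
    static String loadViaStream(byte[] bytes) {
        Properties props = new Properties();
        try (ByteArrayInputStream in = new ByteArrayInputStream(bytes)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("error.demo");
    }

    // Loads via a Reader that decodes the same bytes as UTF-8.
    static String loadViaReader(byte[] bytes) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("error.demo");
    }

    public static void main(String[] args) {
        // A UTF-8 encoded file body containing one non-ASCII character (e-grave).
        byte[] utf8File = "error.demo=\u00e8".getBytes(StandardCharsets.UTF_8);

        // One char when decoded as UTF-8, two chars when mis-decoded as ISO-8859-1.
        System.out.println("via Reader: " + loadViaReader(utf8File).length() + " char(s)");
        System.out.println("via Stream: " + loadViaStream(utf8File).length() + " char(s)");
    }
}
```

Comparing string lengths instead of printing the characters keeps the result independent of the console encoding.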

That printed OK on the IntelliJ IDEA console. But then I retried my initial approach of printing the message bundle, this time with the right naming in the parameters, and it also worked:

    messages = ResourceBundle.getBundle("AntiSamy", new Locale("zh", "CN"));
    Enumeration<String> keys = messages.getKeys();
    while (keys.hasMoreElements()) {
      String key = keys.nextElement();
      String value = messages.getString(key);
      System.out.println(key + " = " + value);
    }
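A note on the `getBundle` call above: since Java 9, `ResourceBundle.getBundle` reads `.properties` files as UTF-8 by default, while older runtimes read them as ISO-8859-1, so the same call can behave differently depending on the JDK. The thread does not say which Java versions need to be supported, so the following is only a sketch of one version-independent alternative: building a `PropertyResourceBundle` from a `Reader` with an explicit charset. The class name and key are hypothetical, and in-memory bytes stand in for the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8BundleDemo {

    // Builds a bundle from a stream, always decoding it as UTF-8.
    static ResourceBundle loadUtf8(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a UTF-8 properties file; in a real project the stream
        // would come from something like getResourceAsStream(...) instead.
        byte[] utf8File = "error.demo=\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        ResourceBundle bundle = loadUtf8(new ByteArrayInputStream(utf8File));
        String value = bundle.getString("error.demo");

        // Compare against the expected code points rather than eyeballing the console.
        System.out.println(value.equals("\u4e2d\u6587"));
    }
}
```

This bypasses the locale lookup that `getBundle` performs, so it is better suited to a test than to production code.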

I call it a win. But I don't like the whole works-on-my-machine results. Is there something better to test?
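One possible way to get away from console-dependent results is to assert on the decoded strings themselves. The sketch below is an assumption about what such a check could look like, not existing AntiSamy test code: `toAsciiDebug` renders non-ASCII code points in ASCII so the output is identical on any console, and `looksMisdecoded` is a heuristic that flags the replacement character or the two-character pattern typical of UTF-8 bytes decoded as ISO-8859-1. It does not cover Windows-1252 mis-decoding and could in principle flag legitimate text.

```java
public class MojibakeCheck {

    // Renders every non-ASCII code point as <U+XXXX> so output looks the same everywhere.
    static String toAsciiDebug(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append(String.format("<U+%04X>", cp));
            }
        });
        return sb.toString();
    }

    // Heuristic: true if the string contains U+FFFD, or a char in U+00C2..U+00F4
    // followed by a char in U+0080..U+00BF (UTF-8 bytes read as ISO-8859-1).
    static boolean looksMisdecoded(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\uFFFD') {
                return true;
            }
            if (c >= 0x00C2 && c <= 0x00F4 && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next >= 0x0080 && next <= 0x00BF) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String good = "tag non \u00e8 valido";      // correctly decoded
        String bad = "tag non \u00c3\u00a8 valido"; // same bytes decoded as ISO-8859-1

        System.out.println(toAsciiDebug(good) + " -> " + looksMisdecoded(good));
        System.out.println(toAsciiDebug(bad) + " -> " + looksMisdecoded(bad));
    }
}
```

A JUnit test could loop over every value of every bundle and fail if `looksMisdecoded` returns true, which would run the same way on Windows, Linux, or CI.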

@kwwall
Contributor

kwwall commented Mar 29, 2025

@spassarop wrote:

I call it a win. But I don't like the whole works-on-my-machine results. Is there something better to test?

Ha! That's Java for you. Write once, test everywhere.

Seriously, I think the only thing you can do is ask @GodMeowIceSun to test it with your revised code and JUnit test. Since he (?) is the one who created issue #456, who better to verify it? Another alternative, if you have a mailing list or OWASP Slack channel, is to ask someone who has their Java runtime environment configured for Chinese characters to take a crack at testing it. But beyond that, I've got nothing. I think you've already done due diligence on this.
