@Dankirk Dankirk commented Sep 13, 2025

Description

  • Sets the C runtime (CRT) locale to the system locale with the UTF-8 code page on Windows.
    This has always been the default behavior on unix, but Windows defaults to the minimal 'C' locale.

    • LC_NUMERIC is still set to "C", now on all platforms instead of just unix.
      This keeps the decimal point a dot (not a comma) for string <-> float conversions.
  • The configured CRT locale is copied to be the default std::locale for C++.
    All platforms have been using the minimal "C" locale until now. This change affects new facet and ios_base instances created without a specified locale or imbue() call.

  • The OBS Studio language setting no longer changes QLocale's default locale; the system locale is always used instead.
    This brings QLocale in line with non-Qt functions, but most importantly it is likely what the user wants as well. I.e. sorting and formatting functions should follow OS locale rules instead of the OBS Studio translation language. (Reverts c4840dd)

  • obs_get_locale() still returns the OBS language locale, which is used for the Python and Lua APIs, GDI+ text widget transformations, and the HTTP accepted languages header.

Motivation and Context

Locale-aware operations like sorting and time formatting in C are not available on Windows, but are on unix, as pointed out in PR #12577.
Fixes #11133

The C++ locale and QLocale changes make the locale-aware functions of all layers work in a similar fashion.

For example, the locales currently used on unix are: the OS locale for CRT, minimal "C" for C++ and the OBS language for QLocale.
A weekday name can come out in three different languages depending on whether you use strftime(), the std::time_get facet or QLocale.
This makes string transformations between C, C++ and Qt very tricky.
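
To make the mismatch concrete, here is a small sketch of the C and C++ layers formatting the same weekday. The helper names are hypothetical, and the C++ side uses `std::put_time` (backed by the `std::time_put` facet) for illustration. The CRT version follows whatever `setlocale()` has configured, while the C++ version follows the locale imbued on the stream, so the two can disagree unless both are set up consistently.

```cpp
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Weekday name as the C runtime sees it (follows setlocale(LC_TIME, ...)).
std::string weekday_crt(const std::tm &t)
{
	char buf[128];
	const size_t n = std::strftime(buf, sizeof(buf), "%A", &t);
	return std::string(buf, n);
}

// Weekday name as a C++ stream sees it (follows the imbued std::locale).
std::string weekday_cpp(const std::tm &t, const std::locale &loc)
{
	std::ostringstream os;
	os.imbue(loc);
	os << std::put_time(&t, "%A");
	return os.str();
}
```

With the CRT set to a Finnish locale and the stream left on the classic locale, `weekday_crt()` would be expected to return a Finnish name while `weekday_cpp()` returns an English one.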

How Has This Been Tested?

An important point is that the CRT locale settings have always been different for unix and Windows, which suggests there aren't any insurmountable problems with that change. Windows-specific functions should be tested against the new CRT locale. Changes to the C++ and QLocale defaults affect all platforms.

Searched the codebase for the following and addressed them as necessary:

  • CRT: ctype.h character classification function parameters and expected return values
  • CRT: strftime() formatting with % placeholders
  • CRT: scanf() and printf() formatting with % placeholders
  • CRT: FILE operations
  • C++: fstream operations
  • C++: facet locale usage
  • QLocale: Expected return values of formatting functions
  • QLocale: QString locale-aware methods

Some general testing with Japanese characters

  • Edited recording path with %A (weekday) variable and some Japanese characters. Recorded a video. Weekday name was localized and recording worked fine.
  • Remuxed said file. Worked fine.
  • Renamed some sources with Japanese characters and exported the scene collection, removed it from OBS and re-imported it. No problems.

I'm on Windows 11 English US version, but with Finnish locale settings (fi_FI). OBS language is English.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code has been run through clang-format.
  • I have read the contributing document.
  • My code is not on the master branch.
  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

@WizardCM WizardCM added the Enhancement Improvement to existing functionality label Sep 13, 2025

Dankirk commented Sep 15, 2025

Scouted the web and the codebase for potential issues. Here are some observations...

General stuff about setlocale() on Windows
For a list of things the C runtime locale affects, see https://cppreference.com/w/c/locale/setlocale.html

  • We can ignore all number-related things, because we use the minimal 'C' locale for LC_NUMERIC.
  • The affected string.h and time.h functions are all specifically meant for locale-aware work. Weekday names will now be localized by strftime(), and strcoll() will do a locale-aware comparison.
  • ctype.h character classification ranges are extended, i.e. isalnum() may return true for more characters, so it is worth checking that every caller of these functions is okay with that. It couldn't hurt to cast the parameters to unsigned char either, since many functions expect a value in the range 0-255, which a char holding a utf-8 byte, converted to int, might not be (char range is -128 to 127). Then again, the functions in use have worked fine on unix until now...
  • stdio.h: formatting of the % placeholders in scanf(), printf() and the like is affected. Decimal points will still be dots (controlled by LC_NUMERIC), but %s will match more. More about file operations below.
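
The unsigned char cast mentioned in the ctype.h item can be sketched like this. The wrapper name is hypothetical; the point is only that a `char` holding a UTF-8 lead or continuation byte is negative on platforms where `char` is signed, and passing a negative value other than EOF to the `<cctype>` functions is undefined behavior.

```cpp
#include <cctype>

// Hypothetical wrapper: classify a char that may hold a UTF-8 byte.
bool is_alnum_byte(char c)
{
	// Convert to unsigned char first so the argument lands in 0..255,
	// which is the range the ctype functions are defined for.
	return std::isalnum(static_cast<unsigned char>(c)) != 0;
}
```

Whether a byte above 127 counts as alphanumeric still depends on the active LC_CTYPE; the cast only guarantees that the call itself is well defined.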

multibyte <-> utf8 <-> wchar

In platform.h there are various string conversion functions. Of these, only the multibyte functions with _mbs_ in their name are affected by this change. The rest use the Windows API, which doesn't follow the C runtime locale modified by setlocale().

The _mbs_ functions are currently not used in OBS Studio itself, but are offered for external use by Python, Lua and rtmp. This means there's no change for OBS Studio itself, but external code might see different results from these conversion functions on Windows, which will now be closer to the return values on unix.

The utf8 <-> wchar functions (e.g. os_utf8_to_wcs()) use the MultiByteToWideChar() and WideCharToMultiByte() functions with the utf-8 codepage, which will work after this update. Unlike mbstowcs() and the like in the _mbs_ implementations, these functions are independent of the CRT locale.
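
For contrast, a conversion built on mbstowcs() interprets its input according to the current LC_CTYPE, which is why the _mbs_ functions are the ones affected. The sketch below is illustrative only (the helper name is hypothetical and this is not the platform.h implementation). It relies on the POSIX and MSVC behavior of returning the required length when the destination is a null pointer.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: convert a multibyte string to wide characters using
// the current CRT locale (LC_CTYPE), as mbstowcs()-based code does.
// Returns false if the input is not valid in the current locale's encoding.
bool mbs_to_wide(const std::string &in, std::wstring &out)
{
	// First pass: ask for the required length (excluding the terminator).
	const size_t len = std::mbstowcs(nullptr, in.c_str(), 0);
	if (len == static_cast<size_t>(-1))
		return false;

	// Second pass: convert exactly len wide characters into the buffer.
	out.resize(len);
	if (len > 0)
		std::mbstowcs(&out[0], in.c_str(), len);
	return true;
}
```

The same UTF-8 input can therefore succeed under a UTF-8 locale and fail or be misread under another one, whereas the Windows API calls take the code page as an explicit argument.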

Streams and file operations

C++ streams like fstream are controlled by std::locale::global() or facets, which is a separate setting from CRT setlocale(). Thus, C++ stream operations have been using the minimal "C" locale by default (on both unix and Windows). This change copies the CRT locale as the default for C++ too.

C-style FILE wide char streams pick up the locale in effect when the first I/O operation is performed and keep using it. So it is important that setlocale() is called before these streams are used.

printf() and scanf() -type functions use locale for % placeholders, as explained above.
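
Where a particular stream must keep the old behavior regardless of the new default, it can be pinned to the minimal locale explicitly with imbue(). A minimal sketch, with a hypothetical helper name:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format a number for machine-readable output.
std::string format_number_classic(double value)
{
	std::ostringstream os;
	// Pin this stream to the minimal "C" locale so its output does not
	// change when the global C++ locale does.
	os.imbue(std::locale::classic());
	os << value;
	return os.str();
}
```

The same imbue() call works for fstream objects, as long as it happens before the first read or write.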

When to setlocale() ?

The beginning of the OBSApp constructor seems like a good place for the setlocale() calls, since it is the earliest point after Qt has performed its own locale calls. Unix has also been using the constructor to reset LC_NUMERIC back to C. If it is decided that the OBS translation locale should be followed instead of the OS's, initLocale() also seems acceptable.

@Dankirk Dankirk force-pushed the locale branch 8 times, most recently from 9d20b0c to c0dbe23 Compare September 19, 2025 20:30
Sets runtime locale to system locale with UTF-8 codepage. This is already default behavior on unix, but Windows defaults to minimal 'C' locale.

OBS Studio language settings no longer change the QLocale default locale; the system locale is used instead for conformity. It is likely this is what the user wants as well. I.e. sorting and formatting functions should follow the OS locale instead of the OBS Studio language (which also lacks country information).
Cast ctype function char parameters to unsigned char to ensure they are in the correct range (0 to 255 vs -128 to 127) when used with utf-8 encoding (or extended ascii).

Fixes dstr astrcmp* functions when used with utf-8 (or extended ascii) characters, so that these characters now compare greater than base ascii and are sorted after it, not before.
Switch locale-aware timestamping for logging / crash handling to %H:%M:%S
Use CRT locale as default for C++. logfile and stdout/in/err streams still use the default minimal 'C'.
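
The astrcmp* ordering fix in the commit messages above comes down to comparing bytes as unsigned values. The function below is a hypothetical sketch of that idea, not the actual dstr code: with a signed char, a UTF-8 byte such as 0xC3 is negative and would sort before 'a', while as unsigned char it is 195 and sorts after it.

```cpp
// Hypothetical byte-wise comparison, not the actual dstr implementation.
int compare_bytes(const char *a, const char *b)
{
	// Treat null pointers as empty strings.
	if (!a)
		a = "";
	if (!b)
		b = "";

	for (;; ++a, ++b) {
		// Compare as unsigned so bytes >= 0x80 rank above base ASCII.
		const unsigned char ca = static_cast<unsigned char>(*a);
		const unsigned char cb = static_cast<unsigned char>(*b);
		if (ca != cb)
			return ca < cb ? -1 : 1;
		if (ca == 0)
			return 0;
	}
}
```

This is the same ordering that strcmp() is specified to use, so results stay consistent with the standard library.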
Successfully merging this pull request may close these issues.

Months and weekdays are not localized in filename formatting