Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix native extension build on M1 macOS (Ruby 3+) by adjusting compiler flags #193

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

restot
Copy link

@restot restot commented Feb 7, 2025

This change addresses a build failure on M1 MacBooks (macOS 15.3, build 24D60) when compiling the charlock_holmes gem with Ruby 3+. The problem was caused by an explicit hardcoded flag (-x c++) in the ext/charlock_holmes/extconf.rb file, which caused the C compiler to receive a stray "C++" argument.

# ext/charlock_holmes/extconf.rb 
compile_options = +"-x c++"

After some troubleshooting I've come up with fixes:

  • Force the use of /usr/bin/clang++ instead of the default g++ on Apple Silicon (arm64) with Ruby ≥ 3.0.0.
  • Append the ICU C++ standard flag to $CXXFLAGS (not $CPPFLAGS) to prevent C files from receiving C++ flags.
  • Strip stray -x c++ from $CPPFLAGS for affected systems.

Tested on macOS 15.3 M1 with Ruby 3.0.2 and 3.3.5 using a custom script for encoding detection/conversion.

A test script was used to verify core functionality (encoding detection and conversion) with different sample strings.

For example:
For the short sample "Caf\xe9".force_encoding("ISO-8859-1"), the detector returns low confidence (15) and defaults to UTF-8 (this is expected with such a short and ambiguous string).
For a longer sample such as
"Caf\xe9 au lait, d\xe9j\xe0 vu, cr\xe8me brûl\xe9e, voilà, à la mode".force_encoding("ISO-8859-1"), the detector correctly identifies the encoding as ISO-8859-1 with higher confidence (41) and the conversion produces a reasonably correct UTF-8 string.

I've shared the full test script source and output details below.

Error log I had before fix:

% gem install charlock_holmes-0.7.9.gem -- \
  --with-icu-dir="$(brew --prefix icu4c)" \
  --with-cxx=/usr/bin/clang++ \
  --with-cxxflags="-std=c++17"
  
Building native extensions. This could take a while...
ERROR:  Error installing charlock_holmes-0.7.9.gem:
        ERROR: Failed to build gem native extension.
checking for -licui18n... yes
checking for unicode/ucnv.h... yes
checking for -lz... yes
checking for -licuuc... yes
checking for -licudata... yes
checking for icu that requires explicit C++ version flag... yes
checking for icu that compiles with c++20 standard... yes
Static linking is disabled.
creating Makefile
clang: error: no such file or directory: 'c++'
make: *** [converter.o] Error 1

Because of the hardcoded c++ flag in extconf.rb --with-cxx flags have been ignored.
I tried maybe everything to help the compiler recognize the C++ compiler installed in the system but always got the same error.

Test script source:

#!/usr/bin/env ruby
require 'charlock_holmes'

puts "Testing charlock_holmes gem..."

# Sample string encoded in ISO-8859-1 (Latin-1)
sample_text = "Caf\xe9".force_encoding("ISO-8859-1")
puts "Original string: #{sample_text} (encoding: #{sample_text.encoding})"

# Detect the encoding
detection = CharlockHolmes::EncodingDetector.detect(sample_text)
if detection
  puts "Detected encoding: #{detection[:encoding]}"
  puts "Detection confidence: #{detection[:confidence]}"
else
  puts "Encoding detection failed."
  exit(1)
end

# Convert the string to UTF-8
begin
  converted = CharlockHolmes::Converter.convert(sample_text, detection[:encoding], "UTF-8")
  puts "Converted string: #{converted} (encoding: #{converted.encoding})"
rescue => e
  puts "Error during conversion: #{e}"
end

detection = CharlockHolmes::EncodingDetector.detect(sample_text)
if detection
  puts "Detected encoding: #{detection[:encoding]}"
  puts "Detection confidence: #{detection[:confidence]}"
else
  puts "No encoding detected."
end

sample_text = "Caf\xe9 au lait, d\xe9j\xe0 vu, cr\xe8me brûl\xe9e".force_encoding("ISO-8859-1")
puts "Original string: #{sample_text} (encoding: #{sample_text.encoding})"
# Convert the string to UTF-8 based on the detection.
begin
  converted = CharlockHolmes::Converter.convert(sample_text, detection[:encoding], "UTF-8")
  puts "Converted string: #{converted} (encoding: #{converted.encoding})"
rescue => e
  puts "Conversion error: #{e}"
end

# Create a longer, less ambiguous ISO-8859-1 sample
sample_text = "Caf\xe9 au lait, d\xe9j\xe0 vu, cr\xe8me brûl\xe9e, voilà, à la mode".force_encoding("ISO-8859-1")
puts "Original string: #{sample_text} (encoding: #{sample_text.encoding})"

detection = CharlockHolmes::EncodingDetector.detect(sample_text)
if detection
  puts "Detected encoding: #{detection[:encoding]}"
  puts "Detection confidence: #{detection[:confidence]}"
else
  puts "No encoding detected."
end

converted = CharlockHolmes::Converter.convert(sample_text, detection[:encoding], "UTF-8")
puts "Converted string: #{converted} (encoding: #{converted.encoding})"

Test script output:

Testing charlock_holmes gem...
Original string: Caf� (encoding: ISO-8859-1)
Detected encoding: UTF-8
Detection confidence: 15
Converted string: Caf� (encoding: UTF-8)
Detected encoding: UTF-8
Detection confidence: 15
Original string: Caf� au lait, d�j� vu, cr�me brûl�e (encoding: ISO-8859-1)
Converted string: Caf� au lait, d�j� vu, cr�me brûl�e (encoding: UTF-8)
Original string: Caf� au lait, d�j� vu, cr�me brûl�e, voilà, à la mode (encoding: ISO-8859-1)
Detected encoding: ISO-8859-1
Detection confidence: 41
Converted string: Café au lait, déjà vu, crème brûlée, voilà, à la mode (encoding: UTF-8)

Thank you for your attention!

@restot
Copy link
Author

restot commented Feb 7, 2025

Please take a look: @tenderlove, @stanhu

Copy link
Collaborator

@tenderlove tenderlove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to add extra flags anyway? It looks like icu4c installs pkg-config settings, so we should just use the flags stored in pkg config rather than messing with them.

@@ -1,3 +1,3 @@
module CharlockHolmes
VERSION = "0.7.9"
VERSION = "0.7.10"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change the version. Thanks!

Copy link
Author

@restot restot Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes addressed. If you are talking about compile_options = +"-x c++" I have no idea why it even exists. As you rightly mentioned we are using pkg-config settings. Also, I've changed the code to ignore just this flag and was able to successfully compile the gem with specifying the std version
gem install ./charlock_holmes-0.7.9.gem -- --with-cxxflags="-std=c++17"

@stanhu
Copy link
Contributor

stanhu commented Feb 10, 2025

I wasn't able to reproduce the problem, but maybe CFLAGS would make more sense than CPPFLAGS?

diff --git a/ext/charlock_holmes/extconf.rb b/ext/charlock_holmes/extconf.rb
index 903c297..e33ff81 100644
--- a/ext/charlock_holmes/extconf.rb
+++ b/ext/charlock_holmes/extconf.rb
@@ -67,7 +67,7 @@ if icu_requires_version_flag
     checking_for("icu that compiles with #{std} standard") do
       flags = compile_options + " -std=#{std}"
       if try_compile(minimal_program, flags)
-        $CPPFLAGS << flags
+        $CFLAGS << flags
 
         true
       end

tenderlove and others added 2 commits February 9, 2025 22:43
…ing clang++ and adjusting C++ flags

revert version change
@eli-barkov-talogy eli-barkov-talogy force-pushed the v0.7.10-compatibillity-for-m1 branch from b2202f6 to 92a20eb Compare February 10, 2025 03:43
@restot
Copy link
Author

restot commented Feb 10, 2025

@stanhu

I couldn't reproduce the problem, but maybe CFLAGS would make more sense than CPPFLAGS?

diff --git a/ext/charlock_holmes/extconf.rb b/ext/charlock_holmes/extconf.rb
index 903c297..e33ff81 100644
--- a/ext/charlock_holmes/extconf.rb
+++ b/ext/charlock_holmes/extconf.rb
@@ -67,7 +67,7 @@ if icu_requires_version_flag
     checking_for("ICU that compiles with #{std} standard") do
       flags = compile_options + " -std=#{std}"
       if try_compile(minimal_program, flags)
-        $CPPFLAGS << flags
+        $CFLAGS << flags
 
         true
       end

Thank you, yes $CFLAGS helped to get rid of defining $CXXFLAGS and $CPPFLAGS but haven't helped to solve the root cause for my case.

My initial problem was in this explicit flag

compile_options = +"-x c++"

So I solved it with conditional application to preserve backward compatibility because I don't have enough competence to justify or disproof its necessity

Without an explicit "-x c++" flag for the compiler, I was able to compile source code.

@stanhu
Copy link
Contributor

stanhu commented Feb 10, 2025

I'm not sue what's going on your system. I have hundreds of team members who have had to compile and install this gem on all sorts of macOS and clang versions. You might want to inspect the output Makefile to see how this stray argument is being used.

@tenderlove
Copy link
Collaborator

  • Force the use of /usr/bin/clang++ instead of the default g++ on Apple Silicon (arm64) with Ruby ≥ 3.0.0

When I compile, it uses clang++. Do you know why your system is using g++? It's not default for me (and I'm on Apple Silicon). You should be able to check by doing:

ruby -r rbconfig -e'pp RbConfig::CONFIG["CXX"]'

Do we really need to add extra flags anyway? It looks like icu4c installs pkg-config settings, so we should just use the flags stored in pkg config rather than messing with them.

To answer my own question, ICU requires certain versions of C++. In order to convince the C or C++ compiler to use that version, we have to specify a -std=c++17 (or whatever it wants).

For some reason, pkg-config doesn't specify that flag:

$ pkg-config --cflags icu-i18n
-I/opt/homebrew/Cellar/icu4c@76/76.1_1/include

Strangely, the .pc file knows about the flag:

$ cat /opt/homebrew/opt/icu4c@76/lib/pkgconfig/icu-i18n.pc | grep FLAGS
# CFLAGS contains only anything end users should set
CFLAGS =  -std=c11
# CXXFLAGS contains only anything end users should set
CXXFLAGS =  -std=c++17
# DEFS only contains those UCONFIG_CPPFLAGS which are not auto-set by platform.h
#SHAREDLIBCFLAGS=-fPIC

But it doesn't add CFLAGS in the Cflags section of the .pc file:

$ cat /opt/homebrew/opt/icu4c@76/lib/pkgconfig/icu-i18n.pc | grep Cflags
Cflags: -I${includedir}

The Cflags line is the only way pkg-config knows what to return when you run the pkg-config command according to this.

It seems like whatever is generating the .pc file is not including the right stuff in the Cflags line. I'm not sure who generates that file though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants