Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cargo causes GpioController to hang on Raspberry Pi 5 Bookworm #2332

Closed
skipfire opened this issue Jul 14, 2024 · 31 comments · Fixed by #2335
Closed

Cargo causes GpioController to hang on Raspberry Pi 5 Bookworm #2332

skipfire opened this issue Jul 14, 2024 · 31 comments · Fixed by #2335
Labels
bug Something isn't working Priority:0 Work that we can't release without Status: In PR

Comments

@skipfire
Copy link
Contributor

skipfire commented Jul 14, 2024

Describe the bug

On a Raspberry Pi 5 with a .NET project that uses GpioController, new GpioController() hangs after installing cargo. The same image does not have a problem on other versions of Pi hardware. I found removing llvm-14-tools restores the functionality, but installing that package by itself does not cause the problem.

Steps to reproduce

# Install .NET
wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh
bash ./dotnet-install.sh
echo 'export PATH=$PATH:$HOME/.dotnet' >> ~/.bashrc
echo 'export DOTNET_ROOT=$HOME/.dotnet' >> ~/.bashrc
source ~/.bashrc

# Make test project
dotnet new console -n cargotest
cd cargotest
dotnet add package System.Device.Gpio --version 3.2.0
nano Program.cs

Put the following content into Program.cs

using System.Device.Gpio;
Console.WriteLine("before new");
var controller = new GpioController(PinNumberingScheme.Logical);
Console.WriteLine("after new");
controller.Dispose();
Console.WriteLine("exiting");

Test the behaviors

# do the testing
# it should work first try
dotnet run .
sudo apt install -y cargo
# it should fail on this try, cancel once you see it is not moving forward
dotnet run .
sudo apt remove -y llvm-14-tools
# it should work on this last try
dotnet run .

Expected behavior

new GpioController() should not hang with cargo installed (or any non-dotnet package), in the example all 3 Console.WriteLine calls should execute.

Actual behavior

Only the first Console.WriteLine before new GpioController() executes and then it hangs without continuing or error.

Versions used

  • Version of System.Device.Gpio package 3.2.0
  • dotnet --info on the machine where app is being built and run
    .NET SDK:
    Version: 8.0.303
    Commit: 29ab8e3268
    Workload version: 8.0.300-manifests.c915c39d
    MSBuild version: 17.10.4+10fbfbf2e

Runtime Environment:
OS Name: debian
OS Version: 12
OS Platform: Linux
RID: linux-arm64
Base Path: /home/genmonpi/.dotnet/sdk/8.0.303/

.NET workloads installed:
There are no installed workloads to display.

Host:
Version: 8.0.7
Architecture: arm64
Commit: 2aade6beb0

.NET SDKs installed:
8.0.303 [/home/genmonpi/.dotnet/sdk]

.NET runtimes installed:
Microsoft.AspNetCore.App 8.0.7 [/home/genmonpi/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 8.0.7 [/home/genmonpi/.dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
None

Environment variables:
DOTNET_ROOT [/home/genmonpi/.dotnet]

global.json file:
Not found

Learn more:
https://aka.ms/dotnet/info

Download .NET:
https://aka.ms/dotnet/download

@skipfire skipfire added the bug Something isn't working label Jul 14, 2024
@krwq
Copy link
Member

krwq commented Jul 18, 2024

[Triage] Can you try explicitly using libgpiod (new GpioController(PinNumberingScheme.Logical, new LibGpiodDriver())) and sysfs driver (new GpioController(PinNumberingScheme.Logical, new SysFsDriver())) and see if that repros for both? That seems very weird - another thing you can try is reinstalling dotnet and see if maybe that helps - please keep us updated - it would be good to figure out what's causing the hang

@krwq krwq added the Needs: Author Feedback We are waiting for author to react to feedback (action required) label Jul 18, 2024
@skipfire
Copy link
Contributor Author

I can try the libgpiod later today. I have tried on multiple fresh installs with dotnet both before and after the cargo install and it made no difference so this isn't a single install issue. The repro steps I provided I built with a fresh image to make it as simple as possible for someone else to be able to reproduce.

@dotnet-policy-service dotnet-policy-service bot removed the Needs: Author Feedback We are waiting for author to react to feedback (action required) label Jul 18, 2024
@skipfire
Copy link
Contributor Author

skipfire commented Jul 18, 2024

@krwq gave it a shot and LibGpiodDriver failed the same as no driver, however SysFsDriver did pass the constructor. I haven't tried it with any of the rest of the functionality I would need, but the constructor didn't hang up.

Note for anyone trying to replicate, add using System.Device.Gpio.Drivers; to the initial repro code.

@pgrawehr
Copy link
Contributor

Try with new LibGpiodDriver(4), that's the correct setting for Rpi5.

@skipfire
Copy link
Contributor Author

@pgrawehr LibGpioDriver(4) fails as well.

@skipfire
Copy link
Contributor Author

@pgrawehr @krwq I have found the problem, there are a pair of circular symbolic links in an installed library and it's getting stuck in an infinite loop in LibGpiodDriverFactory.GetFiles where it callsdirectoriesToProcess.Push(subdirectory); as part of looking for libgpiod.so.*. I'm sure that ignoring all symbolic links would cause other problems and DirectoryInfo.LinkTarget is not available with netstandard2.0. With .NET 8.0 both were showing a LinkTarget of "..".

sudo find -L /lib -type l
find: File system loop detected; ‘/lib/llvm-14/build/Release’ is part of the same file system loop as ‘/lib/llvm-14’.
find: File system loop detected; ‘/lib/llvm-14/build/Debug+Asserts’ is part of the same file system loop as ‘/lib/llvm-14’.

Any suggestions?

@pgrawehr
Copy link
Contributor

Does that also happen if you force it to Libgpiod v1? There should be a special constructor for this.

@skipfire
Copy link
Contributor Author

Can you provide a bit of code of what you want me to try because from what I see this is before the version matters? This error comes from the first line in LibGpiodDriver when it needs LibGpiodDriverFactory.Instance in the call LibGpiodDriverFactory.VersionedLibgpiodDriver versionedLibgpiodDriver = LibGpiodDriverFactory.Instance.Create(gpioChip);.

@pgrawehr
Copy link
Contributor

@skipfire Sorry, I was on my phone only. I was thinking of using the public LibGpiodDriver(int gpioChip, LibGpiodDriverVersion driverVersion) constructor, but it seems this is calling the problematic factory constructor as well, thus getting the same issue. Looks like you need to either use SysFs or manually remove that circular link until this issue is resolved (which might not be so easy).
Thanks for finding that problem.

@pgrawehr pgrawehr added Priority:0 Work that we can't release without and removed untriaged labels Jul 22, 2024
@joperezr
Copy link
Member

FYI @huesla as this is a bug of on the code that is trying to find the right version of libgpiod from different folders.

@skipfire Thank you so much for the great report and for following up and finding the culprit. We discussed during triage with @raffaeler and @pgrawehr and we think that this code shouldn't be there to begin with, as ideally we should be relying on NativeLibrary probing paths as opposed to us doing the probing ourselves. We will consider replacing this code in the next major version. In the meantime, a potential workaround would be to add a HashSet to mark all of the directories that have been visited (or pushed to directoriesToProcess) and to do a quick check before each time we are going to push to ensure that this directory is not in the visited (in order to prevent loops). This would work as a temporary mitigation of the issue. @skipfire (or @huesla) does that solution sound good to you? If so, would you be willing to contribute a PR with it? Thanks again for all of your help with this!

@skipfire
Copy link
Contributor Author

@joperezr I am willing to contribute and I tried your suggestion already, but the directory keeps extending so it isn't seen as the same.
Given /lib/myfolder/mysubfolder -> /lib/myfolder
Will will get /lib/myfolder/mysubfolder/myfolder/mysubfolder/myfolder etc and I could not find a way to determine that I had already visited that folder with .NETStandard 2.0 compatibility. If .NET Standard 2.0 compatibility wasn't a factor then with .NET 5+ the LinkTarget is available and would make it possible.

@raffaeler
Copy link
Contributor

@skipfire My guess is that the path is always computed as relative instead of keeping a root path and adding the relative path for each visited library.
Since you are already debugging it, you should be able to print a debug trace for each symbolic link and see how the path is resolved.
If it sounds confusing, you may probably post here a verbose log of the path being visited, the symbolic link and the new path being constructed.

Thanks!

@skipfire
Copy link
Contributor Author

@raffaeler I understand that, however the symbolic link detail is not available in .NET Standard 2.0 so the data is not available to the runtime. Trying to access the symbolic data generates a compile exception because of the .NET Standard 2.0 build requirement of the project.

@raffaeler
Copy link
Contributor

@skipfire I understand that, but you can still use reflection (or dyanamic) for debugging purposes ...

@skipfire
Copy link
Contributor Author

Which I did in a test harness and it keeps appending folders with no visible reference to the real path.

@skipfire
Copy link
Contributor Author

Here is one example, /lib/llvm-14/build/Release/ is /lib/llvm-14/

/lib/llvm-14/build $ ls -al
total 16
drwxr-xr-x 4 root root 4096 Jul 19 21:01 .
drwxr-xr-x 7 root root 4096 Jul 19 21:01 ..
lrwxrwxrwx 1 root root    2 Feb 17  2023 Debug+Asserts -> ..
lrwxrwxrwx 1 root root   10 Feb 17  2023 include -> ../include
lrwxrwxrwx 1 root root    6 Feb 17  2023 lib -> ../lib
lrwxrwxrwx 1 root root    2 Feb 17  2023 Release -> ..
lrwxrwxrwx 1 root root    8 Feb 17  2023 share -> ../share
drwxr-xr-x 2 root root 4096 Feb 17  2023 unittests
drwxr-xr-x 3 root root 4096 Jul 19 20:58 utils

DirectoryInfo for /build/llvm-14/build/Release with .NET 8.0 provides some data, but I cannot find any property, even in the debug view, that says /lib/llvm-14/.

{/lib/llvm-14/build/Release}
    Attributes: ReadOnly | Directory | ReparsePoint
    CreationTime: {17/02/2023 05:57:29}
    CreationTimeUtc: {17/02/2023 11:57:29}
    Exists: true
    Extension: ""
    FullName: "/lib/llvm-14/build/Release"
    FullPath: "/lib/llvm-14/build/Release"
    LastAccessTime: {19/07/2024 21:00:29}
    LastAccessTimeUtc: {20/07/2024 02:00:29}
    LastWriteTime: {17/02/2023 05:57:29}
    LastWriteTimeUtc: {17/02/2023 11:57:29}
    LinkTarget: ".."
    Name: "Release"
    OriginalPath: "/lib/llvm-14/build/Release"
    Parent: {/lib/llvm-14/build}
    Root: {/}
    UnixFileMode: OtherExecute | OtherRead | GroupExecute | GroupRead | UserExecute | UserWrite | UserRead

@raffaeler
Copy link
Contributor

@skipfire A quick workaround could be to declare the pinvoke for realpath in libc which return the full canonicalized path name

@raffaeler
Copy link
Contributor

@pgrawehr
Copy link
Contributor

I'm confused, as I cannot reproduce the issue.

pi@raspberrypi:/lib $ sudo find -L /lib -type l
/lib/aarch64-linux-gnu/qt-default/qtchooser/default.conf
find: Dateisystemschleife erkannt; �/lib/chromium-browser/libs� ist ein Teil der gleichen Schleife wie �/lib/chromium-browser�.

So there's a loop and I've added traces to see how the path builds up, but it appears to abort automatically, maybe after the length reaches a certain limit.

@skipfire
Copy link
Contributor Author

skipfire commented Jul 25, 2024

@pgrawehr Is that after installing cargo and running on a Pi 5 with Bookworm? This does not happen on other Pi models or before Bookworm.

@pgrawehr
Copy link
Contributor

In my case, the loop is apparently generated by chromium, but this shouldn't make a difference, should it? I'm running on a Pi4 with Bookworm, which also should behave the same here.

@skipfire
Copy link
Contributor Author

The Chromium loop wasn't breaking for me either even though it existed. The problem is specific to a Pi 5, I tried on a Pi 3A+, 3B, 3B+, Pi 4, Zero 2, and Pi 5 and Pi 5 was the only spot it didn't work.

@pgrawehr
Copy link
Contributor

That is really confusing then (and makes no sense). Can we just limit the recursion depth to maybe 6 or so levels? The files we're looking for shouldn't be so deeply nested. As noted above, we just need a quick-and-dirty hack for now and we should remove this entire search method later.

@huesla
Copy link
Contributor

huesla commented Jul 25, 2024

Hi, I have just read this topic and hope to find some time on the weekend to look into this. Since you are that deep into the code, you might just skip the factory and explicitly instantiate the desired versioned driver temporarily until there is a fix.
I hope this piece of code gets replaced soon...

@huesla
Copy link
Contributor

huesla commented Jul 25, 2024

Are there any valid reasons though for symlink loops to exist?

@skipfire
Copy link
Contributor Author

@huesla specifying a driver doesn't skip the problem code, it's in the factory to get the right chipset which is called even if you pass a specific driver.

@huesla
Copy link
Contributor

huesla commented Jul 25, 2024

Sorry I meant in the constructor of LibGpioDriver.cs
Instead of calling the factory, directly instantiate the versioned driver (depending on the libgpiod version installed on your system)

For example

...
_driver = new LibGpiodV1Driver(chipNumber);
...

@skipfire
Copy link
Contributor Author

@huesla that was one of the recommendations that was tried and it doesn't avoid the code. The problem is in GetFiles which is called by GetInstalledLibraries which is called by LibGpiodDriverFactory which is called by Instance which is called by LibGpiodDriver(int gpioChip) regardless of chip version.

@huesla
Copy link
Contributor

huesla commented Jul 26, 2024

I think the previous recommendation was to pass the LibGpiodDriverVersion enum value to:

public LibGpiodDriver(int gpioChip, LibGpiodDriverVersion driverVersion)
{
    LibGpiodDriverFactory.VersionedLibgpiodDriver versionedLibgpiodDriver = LibGpiodDriverFactory.Instance.Create(gpioChip, driverVersion);
    _driver = versionedLibgpiodDriver.LibGpiodDriver;
    Version = versionedLibgpiodDriver.DriverVersion;
}

which would call the problematic factory, as you correctly stated.

What I meant was for example:

public LibGpiodDriver(int gpioChip, LibGpiodDriverVersion driverVersion)
{
    // LibGpiodDriverFactory.VersionedLibgpiodDriver versionedLibgpiodDriver = LibGpiodDriverFactory.Instance.Create(gpioChip, driverVersion);
    _driver = new LibGpiodV1Driver(gpioChip);
    Version = LibGpiodDriverVersion.V1;
}

This entirely skips the factory, which is only needed to automatically choose the appropriate LibGpiodDriver version depending on the system. @joperezr please contact me if you need support in switching to NativeLibrary or so...

@pgrawehr
Copy link
Contributor

Are there any valid reasons though for symlink loops to exist?

@huesla I can't think of any, but unfortunately, that's how it is. These loops are created by external packages, even well-known ones such as the chromium browser. We can't change when some third-party packages do stupid things.

@skipfire
Copy link
Contributor Author

Posted PR #2335 to address this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Priority:0 Work that we can't release without Status: In PR
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants