Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix Android support #3396

Merged
merged 5 commits into from
Apr 22, 2022
Merged

fix Android support #3396

merged 5 commits into from
Apr 22, 2022

Conversation

jtracey
Copy link
Contributor

@jtracey jtracey commented Apr 13, 2022

This PR is to fix and improve Android support. There are some final points that still need discussion that I'll add in another comment, but it's basically done.

I've broken the PR into four commits, each of which can, if need be, be broken out into their own PRs, but make the most sense together.

  • Commit 1 fixes a potential null pointer deref from an assumption that the getpw* family of syscalls always returns valid C strings or errors, when null pointer strings are also a possibility. So far, Android is the only platform I know of that does this.
  • Commit 2 is the result of going through and finding everywhere Android diverges from Linux, figuring out why, and patching to accommodate those differences or lack thereof. This includes fixing everything preventing Android builds (modulo Commit 3), and enabling/modifying tests to take Android into account. There are a few tests that don't pass yet because some design decisions need to be made, which is the main point that needs discussion.
  • Commit 3 performs some fixes that are specific to libc/nix with Android on 32 bit arches, which have some weird edge cases not found on other platforms (it's possible some are upstream bugs, but for now they're easy to work around).
  • Commit 4 adds Android to the CICD to make sure it stays supported into the future. It won't be able to check that Android maintains parity with Linux, so e.g. someone may add a Linux-specific performance enhancement they forgot to also enable on Android, but it will make sure that builds continue to work and all the enabled tests pass.

In case anyone reading isn't familiar: Android is built on the Linux kernel, but is set up very differently from most Unix environments. Perhaps most notably, each application is installed as and runs under its own Unix user, with very few permissions. Access control is supposed to be done through Android's permissions framework, not the classic file mode bits or capabilities. That said, there is a shell environment, with access to a Unix-style interface, but a lot of things one might expect to work, don't. E.g., hard links are disabled for everyone but root (and even then, won't work in shared directories), there is no privilege escalation mechanism like sudo/su by default, parts of the standard Unix directory layout aren't accessible (e.g., /proc) or don't exist at all (e.g., /tmp).

The last commit is somewhat hacky. There isn't a lot of precedence for CI of terminal-based applications on an Android emulator. The biggest problem is the limitations of the default shell, which would be difficult to set up an environment to run the tests in. Instead, we install the Termux app, which provides something more closely approaching a typical Linux environment (including the GNU coreutils). One of the most useful things Termux provides is a package manager, which allows us to install dependencies, as well as rust/cargo itself, so we can test building on the emulated device (rustup doesn't seem to work). However, there is no mechanism for using the Termux shell directly, so we use the debug bridge to enter keyboard strokes, then query for files to tell when the commands terminate, what their exit codes were, etc.. One adjustment to this flow we could make would be adding a step setting up sshd in Termux, which would allow us to run commands in a more typical manner for a remote host. The reason I haven't done so is we would still need the key strokes + query style of interaction to set up sshd, and keeping to one method of interaction means less code to debug (preventing problems if future changes to Termux, the emulator, or the runner require reconfiguring ssh or something). However, if it is preferred, I can rewrite that commit to use that instead, and the key strokes + query stuff would be contained to the cache setup step. Which, on that note, the CI step is set up to cache the emulator image after installing Termux, Rust, and dependencies. This means we don't have to wait for that every time, we reduce downloads from the package repos, and we reduce noise from changing installed package versions. Any time we want to update the cache, we just need to change the cache key string, e.g. by using a different Termux version, or just adding something like "-v2" to the end.

The action we use to run the Android emulator is popular, with a high install rate (69th most popular action of 12,708 as of this writing) and a good list of notable projects relying on it (including projects from Google and the Android team itself). I have, however, found it to be a bit temperamental in getting it properly configured. Android versions that work on my local machine don't work with it, I couldn't get either ARM architecture working on it, and the shutdown step seems to hang most runs when performing the caching step. However, the configuration in the commit seems to work consistently (with the hang addressed by an unfortunate but effective pkill -9). I suspect the configuration won't need changing often, and the project is still actively maintained, so hopefully some of these issues are worked out by the time it does.

I tried to make sure to leave code comments on anything potentially confusing, but feel free to request more anywhere, or to ask questions here why something was necessary. Stay tuned for a comment on the minimum that still needs to be addressed before merging...

@jtracey
Copy link
Contributor Author

jtracey commented Apr 13, 2022

Misc:

  • The Android CI job currently depends on fewer jobs than it should for consistency with the other platforms. I'll fix this once we're closer to merging, right now it's just convenient for the faster turnaround time.

Failing CI tests:

First, there's the quick and dirty blanket solution to discuss: just disable all of these on Android, at least for now. Simply not worrying about the fact that something could be tested on Android, but isn't, until someone actually encounters a bug as a result seems like a possibly acceptable solution, given how broken things are now anyway. The rest of this discussion assumes that's not the tack we're taking; I've listed "don't test on Android" as an option for the failing tests where I think that could make sense on an individual basis.

Second, there's the matter of root on Android. Some of the tests below could pass by only running them as the root user. We should make a decision on what our policy is on this -- namely, do we support rooted Android? Root shells are somewhat rare on Android, used almost exclusively by enthusiasts who want to tweak their ROMs in various ways. On the one hand, it seems excessive to cater support to such a small minority of a use case, but on the other, such enthusiasts are perhaps the most likely to need a well-tested set of coreutils on Android. Were we to decide to run these tests when root, we would also need to decide whether that should be accommodated in the CICD. I haven't yet looked at what it takes to get a root Termux shell in the emulator, but I suspect it would at least be possible.

  • Upstream bug in xattr: test_cp::test_cp_archive, test_cp::test_cp_archive_recursive: The xattr library intentionally fails at runtime for any platform not explicitly supported, and Android is not on the list of supported platforms. This is a bit of a tough call; the library hasn't been touched since 2018, and the only link in the README is broken, so I suspect it's just not maintained. Our options are:
    • try to patch the upstream library: add Android as supported platform Stebalien/xattr#23
    • remove the dependency, and implement xattr support ourselves
    • don't support xattrs on Android
    • don't test on Android for now, file a new PR to enable the tests if/when a new xattr release is available
  • Seccomp restrictions: test_date::test_date_set_permissions_error, test_chroot::test_enter_chroot_fails: These rely on syscalls blocked by Android's seccomp filter, triggering an unhandled SIGSYS. Our options are:
    • handle SIGSYS signals and do something with them (most likely, print an error)
    • remove the seccomp filter in the CICD emulator (would still fail on any device with the filter enabled, which is basically all of them)
    • don't test on Android for now, leave the first option for a later PR
    • don't test on Android ever, the SIGSYS crash is enough for a user to figure out the problem
  • Hard links: test_du::test_du_hard_link, test_link::test_link_existing_file, test_cp::test_cp_arg_link, test_cp::test_cp_link_backup: Android only supports creating hard links by the root user, and on non-shared directories (e.g., not /sdcard). Our options are:
    • require that the test run as root on Android (and in a non-shared directory)
    • don't test on Android
  • test_cp_overriding_arguments: Similar to "Hard links", but the test also tests options aside from ones involving links. Our options are:
    • make --symbolic-link --link argument in this test conditional on not Android
    • require that the test run as root on Android (and in a non-shared directory)
    • don't test at all on Android
  • test_ln::test_backup_force: The test currently uses a hard link (see "Hard links" above), but I think if a symlink was used instead, it would still test what was intended -- the fact it's using a symlink then a hard link just made me cautious enough to double check first. Our options are:
    • make the test use symlinks for both commands
    • require that the test run as root on Android (and in a non-shared directory)
    • don't test on Android
  • test_touch::test_touch_no_dereference: Comes from a bug in the filetime crate. I've filed a PR, which has already been merged. Our options are:
    • wait for upstream to publish a new release, then bump in Cargo.toml
    • don't test on Android for now, and file another PR once upstream is ready
  • Use of /dev/stdout: test_touch::test_touch_changes_time_of_file_in_stdout, test_ls::test_ls_devices: /dev/stdout doesn't exist or isn't accessible on some versions of Android (I think older versions, the API 28 emulator doesn't allow it but my API 30 device does). Our options are:
    • track down the code accessing /dev/stdout and make it use /proc/self/fd/1 on Android (maybe Linux too)
    • only support Android versions that have /dev/stdout (would mean figuring out why the more recent Android emulators aren't working in the Github environment and fixing it)
    • check at runtime for /dev/stdout before testing
    • don't test on Android
  • Cleared env in tests: test_install::test_install_and_strip, test_install::test_install_and_strip_with_program: install -s invokes an external call to "strip". Part of the process of setting up the test environment is to clear all environment variables. This means that $PATH becomes unset. I'm not sure how it gets reset before the tests are run, but whatever is resetting it seems to be giving the $PATH of the Android root shell, since strace shows a bunch of failed execve searches for "strip" in paths we don't have access to (e.g., /apex/com.android.runtime/bin/strip). I'd have to investigate further, but I suspect something similar is causing the BSD failure mentioned in the "FixME"s. Some possible solutions are:
    • preserve the $PATH environment variable in the test environment
    • preserve the $LD_PRELOAD environment variable in the test environment (Termux populates it with a library that fixes these sorts of things)
    • either of the above on Android specifically

Failing non-CI tests:

These are tests that pass on the CI environment, but fail on at least one emulator or device. They're included here mostly for completion's sake, I don't think it should be necessary to resolve them before merging, and can be broken out into their own issues. If you'd like me to change anything to fix them now though, feel free to say so.

@sylvestre
Copy link
Contributor

wahou, terrific, bravo :)

@sylvestre
Copy link
Contributor

I think you can ignore "root test" :)

@jtracey
Copy link
Contributor Author

jtracey commented Apr 14, 2022

@sylvestre Noted, thanks. :) Could you also confirm that the test_ln::test_backup_force test is fine using symlinks for both commands? Here's the original commit.

@sylvestre
Copy link
Contributor

sylvestre commented Apr 14, 2022 via email

@jtracey
Copy link
Contributor Author

jtracey commented Apr 15, 2022

I've filed a PR to fix the xattr problem upstream: Stebalien/xattr#23

@jtracey jtracey force-pushed the android2 branch 4 times, most recently from a700910 to 2fdf388 Compare April 16, 2022 07:10
@jtracey jtracey marked this pull request as ready for review April 16, 2022 07:46
@jtracey
Copy link
Contributor Author

jtracey commented Apr 16, 2022

It's alive! \o/

I went ahead and picked what I thought were reasonable options for the remaining blockers (and also fixed one more problem uncovered: on x86, only 32 bit ELF files can be read by strip by default; I swapped out the relevant test file, since x86-64 can do both). Feel free to request changes to any of those decisions, or anything else.

@sylvestre
Copy link
Contributor

I will have look asap.
Would it be possible to add code coverage in the ci too? Thanks

@jtracey
Copy link
Contributor Author

jtracey commented Apr 16, 2022

Unfortunately, I don't think code coverage is possible yet. As I mentioned before, rustup doesn't seem to work on Android right now, so we're using what's available in the package manager. It's kept up to date, e.g. it has rust 1.60.0, but that means we can't use nightly features/flags. Rust 1.60.0 did add support for code coverage profiling, so I was hoping we could at least get something, but it didn't work:

$ RUSTFLAGS="-C instrument-coverage" cargo build
   Compiling libc v0.2.121
   Compiling cfg-if v1.0.0
error[E0463]: can't find crate for `profiler_builtins`
  |
  = note: the compiler may have been built without the profiler runtime

For more information about this error, try `rustc --explain E0463`.
error: could not compile `cfg-if` due to previous error
warning: build failed, waiting for other jobs to finish...
error: build failed

It sounds like this is somewhat expected, based on the documentation on the profiler. So we'll need to wait for (or fix ourselves) either rustup support, or a rustc with profiling enabled in the package manager.

@sylvestre
Copy link
Contributor

Do you want to land this now and iterate from it? thanks

jtracey added 5 commits April 20, 2022 08:44
The code for creating a Passwd from the fields of the raw syscall result
assumed that the syscall would return valid C strings in all non-error
cases. This is not true, and at least one platform (Android) will
populate the fields with null pointers where they are not supported.

To fix this and prevent the error from happening again, this commit
changes `cstr2string(ptr)` to check for a null pointer, and return an
`Option<String>`, with `None` being the null pointer case. While
arguably it should be the caller's job to check for a null pointer
before calling (since the safety precondition is that the pointer is to
a valid C string), relying on the type checker to force remembering this
edge case is safer in the long run.
@jtracey
Copy link
Contributor Author

jtracey commented Apr 20, 2022

If it looks good to you, then yeah I think that's the way to go, so we don't have to keep looking out for new breakage.

I think I almost have code coverage working, I'll just file a new PR for that in the near future.

@jtracey jtracey deleted the android2 branch October 10, 2024 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants