Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add which function for finding executables in PATH #2440

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ dirs = "5.0.1"
dotenvy = "0.15"
edit-distance = "2.0.0"
heck = "0.5.0"
is_executable = "1.0.4"
lexiclean = "0.0.1"
libc = "0.2.0"
num_cpus = "1.15.0"
Expand Down
16 changes: 16 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -1597,6 +1597,22 @@ $ just
name `key`, returning `default` if it is not present.
- `env(key)`<sup>1.15.0</sup> — Alias for `env_var(key)`.
- `env(key, default)`<sup>1.15.0</sup> — Alias for `env_var_or_default(key, default)`.
- `which(exe)`<sup>master</sup> — Retrieves the full path of `exe` according
to the `PATH`. Returns an empty string if no executable named `exe` exists.

```just
bash := which("bash")
nexist := which("does-not-exist")

@test:
echo "bash: '{{bash}}'"
echo "nexist: '{{nexist}}'"
```

```console
bash: '/bin/bash'
nexist: ''
```

#### Invocation Information

Expand Down
56 changes: 56 additions & 0 deletions src/function.rs
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,7 @@ pub(crate) fn get(name: &str) -> Option<Function> {
"uppercase" => Unary(uppercase),
"uuid" => Nullary(uuid),
"without_extension" => Unary(without_extension),
"which" => Unary(which),
_ => return None,
};
Some(function)
Expand Down Expand Up @@ -667,6 +668,61 @@ fn uuid(_context: Context) -> FunctionResult {
Ok(uuid::Uuid::new_v4().to_string())
}

fn which(context: Context, s: &str) -> FunctionResult {
0xzhzh marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about this version? I found the original logic a little hard to follow:

(Also included some comments which don't need to go into the final PR.)

fn which(context: Context, command: &str) -> FunctionResult {
  use std::path::Component;

  let command = Path::new(command);

  let relative = match command.components().next() {
    None => return Err("empty command".into()),
    // Any path that starts with `.` or `..` can't be joined to elements of `$PATH` and should be considered on its own. (Is this actually true? What about `C:foo` on windows? Is that a thing?
    Some(Component::CurDir) | Some(Component::ParentDir) => vec![command.into()],
    _ => {
      let paths = env::var_os("PATH").ok_or("`PATH` environment variable not set")?;

      env::split_paths(&paths)
        .map(|path| path.join(command))
        .collect()
    }
  };

  let working_directory = context.evaluator.context.working_directory();

  let absolute = relative
    .into_iter()
    // note that an path.join(absolute_path) winds up being absolute_path
    // lexiclean is hear to remove unnecessary `.` and `..`
    .map(|relative| working_directory.join(relative).lexiclean())
    .collect::<Vec<PathBuf>>();

  for candidate in absolute {
    if is_executable::is_executable(&candidate) {
      return candidate
        .to_str()
        .map(str::to_string)
        .ok_or_else(|| format!("Executable path not unicode: {}", candidate.display()));
    }
  }

  Ok(String::new())
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find my implementation easier to understand, but that might only be because I wrote it.

The main difference between our implementations seems to be how we determine whether to resolve via PATH. Mine checks command.components.count(): anything with more than one component is considered a relative path and skips PATH resolution. Meanwhile, yours checks command.components.next(): anything whose first element is ., .., or / is considered a relative path and skips PATH resolution.

Crucially, they differ on how they handle something like which("dir/cmd"), which has more than one component, but the first component is a Component::Normal("dir"). Thus, my implementation would consider it the same as ./dir/cmd (i.e., resolved relative to CWD), while yours would resolve it using the PATH variable.

I wrote a small test, and (at least on macOS with sh, bash, and zsh) it seems like my implementation has the correct behavior in this instance:

#!/bin/sh

set -e
tmpdir="$(mktemp -d)"
cd "$tmpdir"

mkdir -p path/dir dir

printf "#!%s\necho resolved via PATH" "$SHELL" > path/dir/cmd
chmod +x path/dir/cmd

printf "#!%s\necho resolved via CWD" "$SHELL" > dir/cmd
chmod +x dir/cmd

PATH="$(realpath path)" dir/cmd   # prints 'resolved via CWD'

rm -rf "$tmpdir"

The other differences are:

  • You use Path::new(command) rather than PathBuf::from(command). I've adopted this change since it's more efficient (and also
  • You use lexiclean() to remove extraneous .. or . components. However, at least the which implementation on my system (macOS) does not do this.
  • You handle absolute paths (those beginning with /) implicitly, through the behavior of Path::join() (TIL about this behavior!). But I personally think it's clearer to handle this case explicitly.
  • You make two passes through the candidates, one to prepend the working directory, and one to check whether it is an executable. I didn't do this to avoid making another copy.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, you're right, in addition to your test, I looked at the [POSIX shell standard](https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap02.html_ (search for "command search and execution"), and it says that for any path with a, the PATH variable isn't consulted.

So I think command.components.count() is best. (Which, if we're doing that, it makes sense to use that to check for an empty command, and not .is_empty())

GNU which does actually remove ..:

$ gwhich ../just/bin/forbid
/Users/rodarmor/src/just/bin/forbid

So I'm in favor of using lexiclean, since it makes the returned paths easier to read, in case they return .. or ..

I personally like using the relative path behavior to do joining, but I think that's kind of a wash, since it's a little weird.

Copy link
Author

@0xzhzh 0xzhzh Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNU which does actually remove .. [...] So I'm in favor of using lexiclean, since it makes the returned paths easier to read, in case they return .. or ..

I wasn't aware of that, thanks for finding that behavior! But if someone really wants to clean up the path, can't they do something like clean(which("some/../funny/path"))? Or even canonicalize(which("some/../funny/path"))?

The argument against cleaning the path is that doing so removes information that can no longer be recovered. I'm wary of over-normalization for weird edge cases like the example mentioned in the Rust Path::components() docs.

Admittedly, I can't personally foresee any scenario where this relative path information might actually be useful, so I'm strongly against calling lexiclean() here. But since the user can opt into normalization with other Just functions like clean and canonicalize, shouldn't we leave this to the user?

use is_executable::IsExecutable;

let cmd = PathBuf::from(s);

let path_var;
let candidates = match cmd.components().count() {
0 => Err("empty command string".to_string())?,
1 => {
// cmd is a regular command
path_var = env::var_os("PATH").ok_or("Environment variable `PATH` is not set")?;
env::split_paths(&path_var).map(|path| path.join(cmd.clone())).collect()
}
_ => {
// cmd contains a path separator, treat it as a path
vec![cmd]
}
};

for mut candidate in candidates {
if candidate.is_relative() {
// This candidate is a relative path, either because the user invoked `which("./rel/path")`,
// or because there was a relative path in `PATH`. Resolve it to an absolute path.
let cwd = context
0xzhzh marked this conversation as resolved.
Show resolved Hide resolved
.evaluator
.context
.search
.justfile
.parent()
.ok_or_else(|| {
format!(
"Could not resolve absolute path from `{}` relative to the justfile directory. Justfile `{}` had no parent.",
candidate.display(),
context.evaluator.context.search.justfile.display()
)
})?;
let mut cwd = PathBuf::from(cwd);
cwd.push(candidate);
candidate = cwd;
}

if candidate.is_executable() {
return candidate.to_str().map(str::to_string).ok_or_else(|| {
format!(
"Executable path is not valid unicode: {}",
candidate.display()
)
});
}
}

// No viable candidates; return an empty string
Ok(String::new())
}

fn without_extension(_context: Context, path: &str) -> FunctionResult {
let parent = Utf8Path::new(path)
.parent()
Expand Down
1 change: 1 addition & 0 deletions tests/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,7 @@ mod timestamps;
mod undefined_variables;
mod unexport;
mod unstable;
mod which_exec;
0xzhzh marked this conversation as resolved.
Show resolved Hide resolved
#[cfg(windows)]
mod windows;
#[cfg(target_family = "windows")]
Expand Down
170 changes: 170 additions & 0 deletions tests/which_exec.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
use super::*;

trait TempDirExt {
fn executable(self, file: impl AsRef<Path>) -> Self;
}

impl TempDirExt for TempDir {
fn executable(self, file: impl AsRef<Path>) -> Self {
let file = self.path().join(file.as_ref());

// Make sure it exists first, as a sanity check.
assert!(
file.exists(),
"executable file does not exist: {}",
file.display()
);

// Windows uses file extensions to determine whether a file is executable.
// Other systems don't care. To keep these tests cross-platform, just make
// sure all executables end with ".exe" suffix.
assert!(
file.extension() == Some("exe".as_ref()),
"executable file does not end with .exe: {}",
file.display()
);

#[cfg(not(windows))]
{
let perms = std::os::unix::fs::PermissionsExt::from_mode(0o755);
fs::set_permissions(file, perms).unwrap();
}

self
}
}

#[test]
fn finds_executable() {
let tmp = temptree! {
0xzhzh marked this conversation as resolved.
Show resolved Hide resolved
"hello.exe": "#!/usr/bin/env bash\necho hello\n",
}
.executable("hello.exe");

Test::new()
.justfile(r#"p := which("hello.exe")"#)
.env("PATH", tmp.path().to_str().unwrap())
.args(["--evaluate", "p"])
.stdout(format!("{}", tmp.path().join("hello.exe").display()))
.run();
}

#[test]
fn prints_empty_string_for_missing_executable() {
let tmp = temptree! {
"hello.exe": "#!/usr/bin/env bash\necho hello\n",
}
.executable("hello.exe");

Test::new()
.justfile(r#"p := which("goodbye.exe")"#)
.env("PATH", tmp.path().to_str().unwrap())
.args(["--evaluate", "p"])
.stdout("")
.run();
}

#[test]
fn skips_non_executable_files() {
let tmp = temptree! {
"hello.exe": "#!/usr/bin/env bash\necho hello\n",
"hi": "just some regular file",
}
.executable("hello.exe");

Test::new()
.justfile(r#"p := which("hi")"#)
0xzhzh marked this conversation as resolved.
Show resolved Hide resolved
.env("PATH", tmp.path().to_str().unwrap())
.args(["--evaluate", "p"])
.stdout("")
.run();
}

#[test]
fn supports_multiple_paths() {
let tmp1 = temptree! {
"hello1.exe": "#!/usr/bin/env bash\necho hello\n",
}
.executable("hello1.exe");

let tmp2 = temptree! {
"hello2.exe": "#!/usr/bin/env bash\necho hello\n",
}
.executable("hello2.exe");

let path =
env::join_paths([tmp1.path().to_str().unwrap(), tmp2.path().to_str().unwrap()]).unwrap();

Test::new()
.justfile(r#"p := which("hello1.exe")"#)
.env("PATH", path.to_str().unwrap())
.args(["--evaluate", "p"])
.stdout(format!("{}", tmp1.path().join("hello1.exe").display()))
.run();

Test::new()
.justfile(r#"p := which("hello2.exe")"#)
.env("PATH", path.to_str().unwrap())
.args(["--evaluate", "p"])
.stdout(format!("{}", tmp2.path().join("hello2.exe").display()))
.run();
}

#[test]
fn supports_shadowed_executables() {
let tmp1 = temptree! {
"shadowed.exe": "#!/usr/bin/env bash\necho hello\n",
}
.executable("shadowed.exe");

let tmp2 = temptree! {
"shadowed.exe": "#!/usr/bin/env bash\necho hello\n",
}
.executable("shadowed.exe");

// which should never resolve to this directory, no matter where or how many
// times it appears in PATH, because the "shadowed" file is not executable.
let dummy = if cfg!(windows) {
temptree! {
"shadowed": "#!/usr/bin/env bash\necho hello\n",
}
} else {
temptree! {
"shadowed.exe": "#!/usr/bin/env bash\necho hello\n",
}
};

// This PATH should give priority to tmp1/shadowed.exe
let tmp1_path = env::join_paths([
dummy.path().to_str().unwrap(),
tmp1.path().to_str().unwrap(),
dummy.path().to_str().unwrap(),
tmp2.path().to_str().unwrap(),
dummy.path().to_str().unwrap(),
])
.unwrap();

// This PATH should give priority to tmp2/shadowed.exe
let tmp2_path = env::join_paths([
dummy.path().to_str().unwrap(),
tmp2.path().to_str().unwrap(),
dummy.path().to_str().unwrap(),
tmp1.path().to_str().unwrap(),
dummy.path().to_str().unwrap(),
])
.unwrap();

Test::new()
.justfile(r#"p := which("shadowed.exe")"#)
.env("PATH", tmp1_path.to_str().unwrap())
.args(["--evaluate", "p"])
.stdout(format!("{}", tmp1.path().join("shadowed.exe").display()))
.run();

Test::new()
.justfile(r#"p := which("shadowed.exe")"#)
.env("PATH", tmp2_path.to_str().unwrap())
.args(["--evaluate", "p"])
.stdout(format!("{}", tmp2.path().join("shadowed.exe").display()))
.run();
}