Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite script to clean up dev-desktops #390

Open
jdno opened this issue Feb 26, 2024 · 9 comments
Open

Rewrite script to clean up dev-desktops #390

jdno opened this issue Feb 26, 2024 · 9 comments

Comments

@jdno
Copy link
Member

jdno commented Feb 26, 2024

In #389, we created a cronjob that runs a Bash script to periodically clean up unused cache directories on the dev-desktops. While the script itself is relatively simple, it uses some advanced arguments for find and a lot of loops to go through all directories inside /home. Given that the script performs destructive actions on the machine, we should make it more robust and easier to maintain by rewriting it in Rust.

We specifically want to get these benefits from rewriting it:

  • Make it easier to review and understand the code
  • Test the script thoroughly with unit tests

Prior Art

I've considered rewriting the script as part of #389, but the priority wasn't high enough to dedicate the required time to the task. But the code snippet below can be used as inspiration or a starting point.

ansible/roles/dev-desktop/files/free_disk_space/src/main.rs
use std::path::{Path, PathBuf};

use anyhow::Error;
use clap::Parser;
use walkdir::WalkDir;

/// Clean up unused projects
///
/// This CLI finds all projects that users have checked out on the dev-desktops and deletes
/// temporary files if the project has not been modified in a certain number of days.
///
/// Specifically, the CLI will look for checkouts of `rust-lang/rust` and delete the `build`
/// directory. And it will find unused crates and delete the `target` directory.
#[derive(Parser)]
struct Cli {
    /// The root directory to search for projects
    #[arg(short, long = "root-directory", default_value = "/home")]
    root_directory: PathBuf,

    /// The maximum age of a project in days
    ///
    /// The CLI will only clean projects that have not been updated in the last `max-age` days.
    #[arg(short, long = "max-age", default_value_t = 60)]
    max_age: u32,

    /// Perform a dry run without cleaning any files
    ///
    /// When this flag is set, the CLI will only print the directories that would be removed.
    #[arg(long = "dry-run", default_value_t = false)]
    dry_run: bool,
}

fn main() -> Result<(), Error> {
    let cli = Cli::parse();

    let all_artifact_directories = find_artifact_directories(&cli.root_directory)?;

    Ok(())
}

fn find_artifact_directories(root_directory: &Path) -> Result<Vec<PathBuf>, Error> {
    WalkDir::new(root_directory)
        .into_iter()
        .filter_entry(|entry| is_rust_checkout(entry.path()) || is_cargo_crate(entry.path()))
        .map(|entry| entry.map(|e| e.into_path()).map_err(|e| e.into()))
        .collect()
}

fn is_rust_checkout(path: &Path) -> bool {
    path.join("x.py").is_file() && path.join("build").is_dir()
}

fn is_cargo_crate(path: &Path) -> bool {
    path.join("Cargo.toml").is_file() && path.join("target").is_dir()
}

#[cfg(test)]
mod tests {
    use std::fs::{create_dir, create_dir_all, File};

    use tempfile::TempDir;

    use super::*;

    fn cargo_crate(parent: Option<&Path>) -> TempDir {
        let krate = parent
            .map(TempDir::new_in)
            .unwrap_or_else(TempDir::new)
            .expect("failed to create temporary crate");

        File::create(krate.path().join("Cargo.toml")).expect("failed to create fake Cargo.toml");
        create_dir(krate.path().join("target")).expect("failed to create fake target directory");

        krate
    }

    fn rust_checkout(parent: Option<&Path>) -> TempDir {
        let checkout = parent
            .map(TempDir::new_in)
            .unwrap_or_else(TempDir::new)
            .expect("failed to create temporary checkout");

        File::create(checkout.path().join("x.py")).expect("failed to create fake x.py");
        create_dir(checkout.path().join("build")).expect("failed to create fake build directory");

        checkout
    }

    #[test]
    fn find_artifact_directories_in_root() {
        let rust_checkout = rust_checkout(None);

        let artifact_directories = find_artifact_directories(rust_checkout.path())
            .expect("failed to find artifact directories");

        assert_eq!(artifact_directories, vec![rust_checkout.path()]);
    }

    #[test]
    fn find_artifact_directories_recursively() {
        let root_directory = TempDir::new().expect("failed to create temporary directory");

        let rust_checkout = rust_checkout(Some(root_directory.path()));
        let cargo_crate = cargo_crate(Some(root_directory.path()));

        let other = root_directory.path().join("other").join("build");
        create_dir_all(other).expect("failed to create fake directory");

        let artifact_directories = find_artifact_directories(root_directory.path())
            .expect("failed to find artifact directories");

        assert_eq!(
            artifact_directories,
            vec![rust_checkout.path(), cargo_crate.path()]
        );
    }

    #[test]
    fn is_rust_checkout_returns_true_for_rust_checkout() {
        let checkout = rust_checkout(None);

        assert!(is_rust_checkout(checkout.path()));
    }

    #[test]
    fn is_rust_checkout_returns_false_for_cargo_crate() {
        let root_directory = cargo_crate(None);

        assert!(!is_rust_checkout(root_directory.path()));
    }

    #[test]
    fn is_rust_checkout_returns_false_for_random_directory() {
        let root_directory = TempDir::new().expect("failed to create temporary directory");

        // Create a fake build directory but no x.py
        create_dir(root_directory.path().join("build"))
            .expect("failed to create fake build directory");

        assert!(!is_rust_checkout(root_directory.path()));
    }

    #[test]
    fn is_cargo_crate_returns_true_for_cargo_crate() {
        let root_directory = cargo_crate(None);

        assert!(is_cargo_crate(root_directory.path()));
    }

    #[test]
    fn is_cargo_crate_returns_false_for_rust_checkout() {
        let root_directory = rust_checkout(None);

        assert!(!is_cargo_crate(root_directory.path()));
    }

    #[test]
    fn is_cargo_crate_returns_false_for_random_directory() {
        let root_directory = TempDir::new().expect("failed to create temporary directory");

        // Create Cargo.toml but no target directory
        File::create(root_directory.path().join("Cargo.toml"))
            .expect("failed to create fake Cargo.toml");

        assert!(!is_cargo_crate(root_directory.path()));
    }
}

Resources

@jdno jdno added this to infra-team Feb 26, 2024
@jdno jdno moved this from Backlog to Ready in infra-team Feb 26, 2024
@github-project-automation github-project-automation bot moved this to Backlog in infra-team Feb 26, 2024
@jdno
Copy link
Member Author

jdno commented Mar 11, 2024

In #389 (comment), it was pointed out that the developer guide for rustc suggests creating a build-rust-analyzer directory. We might want to clean that up in the future, too.

@nain-F49FF806
Copy link
Contributor

nain-F49FF806 commented Oct 9, 2024

Hi, I would like to tackle this issue, as a start in contributing to Rust-lang.
I followed to this page from the eurorust implRoom issue.
But want to mention that I will not to be able to attend eurorust in person or virtually : (

I have a few questions / ideas that may be of use here regardless:

  • Could we use the CACHEDIR.TAG to make the detection of the cache dirs more robust?
    I am not knowledgeable about the x.py build dir, but target of rust crates usually have a CACHEDIR.TAG
    created by cargo. The presence of which would be a sign that the directory is indeed safe to clean out.
  • Should we use cargo clean in place of deleting the target directory of crates directly?
    This will take care of cachedir checking, and future changes to target directory.

@nain-F49FF806
Copy link
Contributor

Also possibly relevant cleanup step: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#gc

@jieyouxu
Copy link
Member

I have a few questions / ideas that may be of use here regardless:

  • Could we use the CACHEDIR.TAG to make the detection of the cache dirs more robust?
    I am not knowledgeable about the x.py build dir, but target of rust crates usually have a CACHEDIR.TAG
    created by cargo. The presence of which would be a sign that the directory is indeed safe to clean out.

This depends on the particular checkout.

  • For rust-lang/rust checkouts (i.e. what rustc/std/rustdoc developers work with), bootstrap (rust-lang/rust build system) maintains a separate build directory which is not trivially managed by cargo and they can include other artifacts. For rust-lang/rust checkouts, it's actually easier to run ./x clean which should handle trimming unneeded build artifacts. Note that via git worktree, some rustc/std/rustdoc devs have multiple checkouts.
    • The default contributor build directory is called build, but you can customize it.
    • Be careful to not accidentally delete a source directory named build/, that can be awkward.
    • This build/ directory does not have a CACHEDIR.TAG you can look at.
  • ... but that alone is not sufficient. Commonly, rust-lang/rust contributors will setup rust-analyzer and use a separate build directory. build-rust-analyzer is a common alternative build dir name for r-a, but the contributor can feed r-a with any arbitrary directory name, i.e. it doesn't need to be exactly named build-rust-analyzer.
  • Outside of rust-lang/rust checkouts contributors might also have usual cargo projects for experimenting and tools and what not, and also cargo/rustfmt contributors's checkouts might be "regular" cargo projects.

If you can implement a heuristic to detect if a directory is a cargo-managed project (this probably should not fire on cargo dirs inside rust-lang/rust checkouts because bootstrap has additional logic on top, as you don't want to delete bootstrap's self build via cargo, then you have to build bootstrap to run ./x clean), then yes I think that can make sense. You might want to use nightly cargo to run cargo clean though because nightly cargo might support newer variations of build structure compared to stable cargo versions.


Additional space-saving techniques:

  • Remove specifically-dated nightly rustup toolchains. This can come from using cargo-bisect-rustc and retaining some specific toolchains. But please do not delete stable-<target-triple> or beta-<target-triple> or nightly-<target-triple> toolchains.

@jdno
Copy link
Member Author

jdno commented Oct 10, 2024

As a first step, I'd like to rewrite the script that we have now and make sure that the new implementation is well-tested and well-documented. It'll be much easier afterwards to add more features or improve its heuristics. 🙂

@jdno
Copy link
Member Author

jdno commented Oct 10, 2024

@nain-F49FF806 I'm more than happy to help with mentoring, reviews, or anything else that you might need. Feel free to reach out on Zulip as well. If we can break the implementation into small PRs that'd great!

@klaus82
Copy link

klaus82 commented Dec 3, 2024

Hi @jdno is this task still open and not assigned? I would like to start to work on this as a start in contributing.

@klaus82
Copy link

klaus82 commented Dec 16, 2024

@nain-F49FF806 are you working on this?

@nain-F49FF806
Copy link
Contributor

@klaus82 Not atm. Not actively. I had intended to find some time to get to this but haven't been able to.
Please go ahead and feel free to pitch in.

Cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Ready
Development

No branches or pull requests

5 participants