
Make travis jobs faster #410

Open · JohanLorenzo opened this issue Apr 22, 2016 · 4 comments

@JohanLorenzo (Contributor)

With the dependencies being in the same repo, Travis results can take an hour overall. That time splits into 2 parts:

  • Time for Travis to find a free VM: during peak hours (7am to 10am Pacific Time), it can take up to 30 minutes.
  • Time to actually run the job: about 15 minutes for the best job (OS X) and 30 minutes for the worst one (Linux with all steps).

I see 2 ways of tackling these issues.
#1. Prevent multiple builds in the same job

In the Linux job particularly, we build foxbox or one of its dependencies many times:

  1. when the first cargo build is run
  2. when cargo test is executed in foxbox
    1. each time cargo test runs in a subcrate => we build every dependency one more time. This is a missing feature in cargo. See RFC: Add workspaces to Cargo rust-lang/rfcs#1525
  3. when we run cargo test with dead code => every single dependency gets recompiled with no dead-code optimization.

In the cross-compile job, we also build twice: once in release mode, the other time in debug.

I would recommend putting each of these different builds in self-contained jobs. Moreover, if subcrates were compiled and tested in their own jobs, that would also let us handle publishing to crates.io in those same jobs.
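As a sketch of what that split could look like, here's a hypothetical `.travis.yml` build matrix. The job names and the dispatch script are illustrative, not the repo's actual layout:

```yaml
# Hypothetical matrix: each build variant becomes its own self-contained job.
matrix:
  include:
    - os: linux
      env: JOB=build            # plain `cargo build`
    - os: linux
      env: JOB=test-foxbox      # `cargo test` in foxbox only
    - os: linux
      env: JOB=test-subcrates   # `cargo test` per subcrate (publishing to crates.io could live here too)
    - os: linux
      env: JOB=coverage         # the dead-code / coverage build
    - os: osx
      env: JOB=build
script: ./tools/ci-run.sh "$JOB"   # illustrative dispatch script, one build per job
```

Each matrix entry gets its own VM, so no job rebuilds what another job already built, and failures show up per variant instead of one long log.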
#2. Run on the container-based infrastructure whenever we can

That infrastructure is faster to spawn and has more machines available than the VM infra. Using it would be ideal.

However, most of our cases require us to run in a VM.

Hence, the only case where I see we could use containers is cargo test in the subcrates.
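For the jobs that can move over, opting into the container-based infrastructure is, at the time of this issue, a matter of dropping sudo. A minimal config sketch (the apt package is just an example):

```yaml
# Container-based infra: no `sudo` available, but much faster to boot.
sudo: false
# Packages can no longer be installed with `sudo apt-get`;
# they have to be declared through the apt addon instead:
addons:
  apt:
    packages:
      - libssl-dev   # illustrative package
```

This is also why the subcrate jobs are the natural candidates: they have the fewest system-level dependencies.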

@JohanLorenzo (Contributor Author)

After setting up the solution described above, I am not convinced it will give us faster feedback.

Here's a set of jobs as an example. While I worked on the patch, I noticed results similar to that one.
I pushed that commit at 1:47pm UTC. It finished at 3:30pm UTC. Most of that time was actually spent waiting for a machine to become available. Even though that run happened at the worst time of the day, it's not an improvement over what we currently have.

Moreover, I realized we can't use Docker instances for the subcrate jobs like I proposed. The Docker instances run Ubuntu Precise, and the openzwave-adapter requires a version of gcc that supports -std=c++11. Docker instances on Trusty won't happen for a few months.

Hence, I wonder if we shouldn't use a different CI platform.

@samgiles proposed setting up a buildbot like Rust's bors. Setting up a production-ready buildbot will likely take some time. I wonder if using taskcluster-github wouldn't be easier to set up and scale. Moreover, combined with Treeherder, it would let us display 3-dimensional matrices of jobs (platform/suite/job), whereas Travis restricts us to a sorted linear list.

What do you think @fabricedesre ?

@fabricedesre (Collaborator)

We don't have time now to set up a whole new buildbot infra. My biggest gripe was the intermittent failures which are now fixed.

@JohanLorenzo (Contributor Author)

I agree. As it would take too much time, let's leave this issue aside for now.

@JohanLorenzo (Contributor Author)

JohanLorenzo commented Apr 28, 2016

Another approach suggested by @fabricedesre: don't run coverage on PRs when we know it's not going to change.

Cases we could be smarter on

I think we can also extend that reasoning to: run only the tests that are necessary. For instance:

| if a patch contains only a change in... | there is no need to run any... |
| --- | --- |
| src/main.rs | tests under components/* |
| a component like dns_challenge | test in taxonomy or thinkerbell |
| a JS integration test | rust test |
| the README | build or test at all |

Exclusion vs inclusion

At first it sounds safer to define exclusion rules (as described in the table above). If we went the other way (aka "when this change happens, run only these steps"), we'd have to declare what steps to run every time we add or change a piece of code. That can easily be missed. I believe a test is better run too often than never.

PRs vs master

The exclusion rules might be buggy. For that reason, I would recommend keeping every test running on master, and narrowing down only the PRs.

How could we proceed?

One way to define these exclusion rules would be files similar to .gitignore: one per directory, defining the rules for that directory and every subdirectory.

Then a script will:

  1. parse the patch to get the list of files changed
  2. find the exclusion rules that apply
  3. reduce the list of steps according to every rule found above
  4. generate the steps to run
  5. execute them
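A minimal Python sketch of those five steps, assuming a hypothetical per-directory rule file named `.ci-exclude` whose lines are `<glob> <step-to-skip>`. The step names, rule-file format, and runner script are all made up for illustration:

```python
import fnmatch
import subprocess
from pathlib import Path

# Hypothetical step names -- illustrative only.
ALL_STEPS = ["build", "test-foxbox", "test-subcrates", "coverage", "js-integration"]

def changed_files(base="origin/master"):
    """Step 1: list the files touched by the patch."""
    out = subprocess.check_output(["git", "diff", "--name-only", base], text=True)
    return [line for line in out.splitlines() if line]

def rules_for(path):
    """Step 2: walk up from a changed file, collecting exclusion rules
    from every `.ci-exclude` found in its parent directories."""
    skipped = set()
    for parent in Path(path).parents:
        rule_file = parent / ".ci-exclude"
        if rule_file.is_file():
            for line in rule_file.read_text().splitlines():
                pattern, _, step = line.partition(" ")
                if step and fnmatch.fnmatch(path, pattern):
                    skipped.add(step.strip())
    return skipped

def steps_to_run(files):
    """Steps 3-4: a step is skipped only if *every* changed file excludes it."""
    if not files:
        return list(ALL_STEPS)
    skipped = set(ALL_STEPS)
    for f in files:
        skipped &= rules_for(f)
    return [s for s in ALL_STEPS if s not in skipped]

def run_ci(base="origin/master"):
    """Step 5: execute whatever survived the exclusion rules."""
    for step in steps_to_run(changed_files(base)):
        subprocess.check_call(["./tools/ci-step.sh", step])  # illustrative runner
```

Intersecting the skip-sets (rather than unioning them) errs on the side of running too much: a step is only dropped when every changed file agrees it can be, which matches the "better run too often than never" principle above.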

Going further?

For rust code, we can probably avoid manually defining what should not be tested. We could parse Cargo.toml to find out the dependencies between the rust components. I think we can handle that after the manual case is implemented.
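A rough sketch of that Cargo.toml idea: extract the in-repo (`path = "..."`) dependencies of a component, so a change in a subcrate can be mapped to the components that depend on it. This hand-rolls a minimal parse of the `[dependencies]` sections; a real implementation would use a proper TOML parser:

```python
import re

def path_dependencies(cargo_toml_text):
    """Extract dependencies declared with `path = "..."` from a Cargo.toml,
    i.e. the in-repo subcrates a component depends on."""
    deps = {}
    section = None
    for line in cargo_toml_text.splitlines():
        line = line.strip()
        header = re.match(r"\[(.+)\]$", line)
        if header:
            section = header.group(1)
            continue
        # Covers [dependencies], [dev-dependencies], [build-dependencies].
        if section and "dependencies" in section:
            m = re.match(r'([\w-]+)\s*=\s*\{.*?path\s*=\s*"([^"]+)"', line)
            if m:
                deps[m.group(1)] = m.group(2)
    return deps
```

Running this over every component's Cargo.toml gives a dependency graph; inverting it tells us which components' tests to run when a given subcrate changes.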
