
Make travis jobs faster #410

Open · JohanLorenzo opened this issue Apr 22, 2016 · 4 comments

@JohanLorenzo (Contributor)

With the dependencies being in the same repo, Travis results can take an hour overall. That time splits into 2 parts:

  • Time for Travis to find a free VM: during peak hours (7am to 10am Pacific Time), it can take up to 30 minutes.
  • Time to actually run the job: about 15 minutes for the best job (OS X) and 30 minutes for the worst one (Linux with all steps).

I see 2 ways of tackling these issues.
#1. Prevent multiple builds in the same job

In the Linux job particularly, we build foxbox or one of its dependencies many times:

  1. when the first cargo build is run
  2. when cargo test is executed in foxbox
    1. each time cargo test runs in a subcrate => we build every dependency one more time. This is a missing feature in cargo. See RFC: Add workspaces to Cargo rust-lang/rfcs#1525
  3. when we run cargo test with dead code => every single dependency gets recompiled with no dead-code optimization.

In the cross-compile job, we also build twice: once in release mode, the other time in debug.

I would recommend putting each of these different builds in self-contained jobs. Moreover, if subcrates were compiled and tested in their own jobs, that would also let us handle publishing to crates.io in those same jobs.
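As a sketch of what that split could look like, here's a hypothetical `.travis.yml` build matrix. The job names and the dispatch script are illustrative, not the repo's actual layout:

```yaml
# Hypothetical matrix: each build variant becomes its own self-contained job.
matrix:
  include:
    - os: linux
      env: JOB=build            # plain `cargo build`
    - os: linux
      env: JOB=test-foxbox      # `cargo test` in foxbox only
    - os: linux
      env: JOB=test-subcrates   # `cargo test` per subcrate (publishing to crates.io could live here too)
    - os: linux
      env: JOB=coverage         # the dead-code / coverage build
    - os: osx
      env: JOB=build
script: ./tools/ci-run.sh "$JOB"   # illustrative dispatch script, one build per job
```

Each matrix entry gets its own VM, so no job rebuilds what another job already built, and failures show up per variant instead of one long log.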
#2. Run on the container-based infrastructure whenever we can

That infrastructure is faster to spawn and has more machines available than the VM infra. Using it would be ideal.

However, most of our cases require us to run in a VM.

Hence, the only case where I see we could use containers is cargo test in the subcrates.
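For the jobs that can move over, opting into the container-based infrastructure is, at the time of this issue, a matter of dropping sudo. A minimal config sketch (the apt package is just an example):

```yaml
# Container-based infra: no `sudo` available, but much faster to boot.
sudo: false
# Packages can no longer be installed with `sudo apt-get`;
# they have to be declared through the apt addon instead:
addons:
  apt:
    packages:
      - libssl-dev   # illustrative package
```

This is also why the subcrate jobs are the natural candidates: they have the fewest system-level dependencies.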

@JohanLorenzo (Contributor Author)

After setting up the solution described above, I am not convinced it will give us faster feedback.

Here's a set of jobs as an example. While I worked on the patch, I noticed results similar to that one.
I pushed that commit at 1:47pm UTC. It finished at 3:30pm UTC. Most of that time was actually spent waiting for a machine to become available. Even though that run happened at the worst time of the day, it's not an improvement over what we currently have.

Moreover, I realized we can't use Docker instances for the subcrate jobs like I proposed. The Docker instances run Ubuntu Precise, and the openzwave-adapter requires a version of gcc that supports -std=c++11. Docker instances on Trusty won't happen for a few months.

Hence, I wonder if we shouldn't use a different CI platform.

@samgiles proposed setting up a buildbot like Rust's bors. Setting up a production-ready buildbot will likely take some time. I wonder if using taskcluster-github wouldn't be easier to set up and scale. Moreover, combined with Treeherder, it would let us display 3-dimensional matrices of jobs (platform/suite/job), whereas Travis restricts us to a sorted linear list.

What do you think @fabricedesre ?

@fabricedesre (Collaborator)

We don't have time now to set up a whole new buildbot infra. My biggest gripe was the intermittent failures which are now fixed.

@JohanLorenzo (Contributor Author)

I agree. As it would take too much time, let's leave this issue aside for now.

@JohanLorenzo (Contributor Author)

JohanLorenzo commented Apr 28, 2016

Another approach suggested by @fabricedesre: don't run coverage on PRs when we know it's not going to change.

Cases we could be smarter on

I think we can also extend that reasoning to: run only the tests that are necessary. For instance:

| if a patch contains only a change in... | there is no need to run any... |
| --- | --- |
| src/main.rs | tests under components/* |
| a component like dns_challenge | test in taxonomy or thinkerbell |
| a JS integration test | rust test |
| the README | build or test at all |

Exclusion vs inclusion

At first it sounds safer to define exclusion rules (as described in the table above). If we went the other way (aka "when this change happens, run only these steps"), we'd have to declare what steps to run every time we add or change a piece of code. That can easily be missed. I believe a test is better run too often than never.

PRs vs master

The exclusion rules might be buggy. For that reason, I would recommend keeping every test running on master, and narrowing down only the PRs.

How could we proceed?

One way to define these exclusion rules would be files similar to .gitignore: one per directory, defining the rules for that directory and every subdirectory.

Then a script will:

  1. parse the patch to get the list of files changed
  2. find the exclusion rules that apply
  3. reduce the list of steps according to every rule found above
  4. generate the steps to run
  5. execute them
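A minimal Python sketch of those five steps, assuming a hypothetical per-directory rule file named `.ci-exclude` whose lines are `<glob> <step-to-skip>`. The step names, rule-file format, and runner script are all made up for illustration:

```python
import fnmatch
import subprocess
from pathlib import Path

# Hypothetical step names -- illustrative only.
ALL_STEPS = ["build", "test-foxbox", "test-subcrates", "coverage", "js-integration"]

def changed_files(base="origin/master"):
    """Step 1: list the files touched by the patch."""
    out = subprocess.check_output(["git", "diff", "--name-only", base], text=True)
    return [line for line in out.splitlines() if line]

def rules_for(path):
    """Step 2: walk up from a changed file, collecting exclusion rules
    from every `.ci-exclude` found in its parent directories."""
    skipped = set()
    for parent in Path(path).parents:
        rule_file = parent / ".ci-exclude"
        if rule_file.is_file():
            for line in rule_file.read_text().splitlines():
                pattern, _, step = line.partition(" ")
                if step and fnmatch.fnmatch(path, pattern):
                    skipped.add(step.strip())
    return skipped

def steps_to_run(files):
    """Steps 3-4: a step is skipped only if *every* changed file excludes it."""
    if not files:
        return list(ALL_STEPS)
    skipped = set(ALL_STEPS)
    for f in files:
        skipped &= rules_for(f)
    return [s for s in ALL_STEPS if s not in skipped]

def run_ci(base="origin/master"):
    """Step 5: execute whatever survived the exclusion rules."""
    for step in steps_to_run(changed_files(base)):
        subprocess.check_call(["./tools/ci-step.sh", step])  # illustrative runner
```

Intersecting the skip-sets (rather than unioning them) errs on the side of running too much: a step is only dropped when every changed file agrees it can be, which matches the "better run too often than never" principle above.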

Going further?

For rust code, we can probably avoid manually defining what should not be tested. We could parse Cargo.toml to find out the dependencies between the rust components. I think we can handle that after the manual case is implemented.
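A rough sketch of that Cargo.toml idea: extract the in-repo (`path = "..."`) dependencies of a component, so a change in a subcrate can be mapped to the components that depend on it. This hand-rolls a minimal parse of the `[dependencies]` sections; a real implementation would use a proper TOML parser:

```python
import re

def path_dependencies(cargo_toml_text):
    """Extract dependencies declared with `path = "..."` from a Cargo.toml,
    i.e. the in-repo subcrates a component depends on."""
    deps = {}
    section = None
    for line in cargo_toml_text.splitlines():
        line = line.strip()
        header = re.match(r"\[(.+)\]$", line)
        if header:
            section = header.group(1)
            continue
        # Covers [dependencies], [dev-dependencies], [build-dependencies].
        if section and "dependencies" in section:
            m = re.match(r'([\w-]+)\s*=\s*\{.*?path\s*=\s*"([^"]+)"', line)
            if m:
                deps[m.group(1)] = m.group(2)
    return deps
```

Running this over every component's Cargo.toml gives a dependency graph; inverting it tells us which components' tests to run when a given subcrate changes.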
