build-and-provide does not update cowbuilder environment #158

Open
mark0n opened this issue Jun 17, 2016 · 4 comments · May be fixed by #159

Comments

@mark0n

mark0n commented Jun 17, 2016

I'm running into issues with build-and-provide not updating the cowbuilder environment. This seems to be caused by a lock file that must have been left behind on an earlier run:

<jenkins5:~ >ls /var/run/lock/jessie-amd64.update -l
-rw-r--r-- 1 jenkins-slave jenkins-slave 0 Jun  1 10:20 /var/run/lock/jessie-amd64.update

I'm not sure why these lock files are left behind, but this has happened multiple times on one of our build slaves. I guess the why doesn't matter too much (it could just be the result of a crashed executor). Anyway, build-and-provide detects the issue correctly:

15:45:30 + echo '*** Update run already taking place, skipping ***'
15:45:30 *** Update run already taking place, skipping ***

Unfortunately, it seems to handle this case in the wrong way: it just prints the warning, skips the cowbuilder update, and proceeds. As a consequence, I end up with a cowbuilder environment that has not seen an apt-get update run for weeks, so builds run against old versions of my libraries.

@mika: I'm still a bit confused by some details of the locking code (the 9> part in particular) but overall it makes sense to me - except for the following line: 9c37df2#diff-8d65555d96716579f851f18d113e9d79R331 Can you please elaborate a bit on your intentions? I believe we should at least sleep until the lock file goes away. We definitely shouldn't proceed with the build if we aren't 100% sure that our cowbuilder environment has actually successfully (and completely!) been updated. Even if another cowbuilder --update run is underway we should wait until it's done and then start our own update run. This would make sure we never miss the latest version of our build dependencies.

@mark0n
Author

mark0n commented Jun 17, 2016

Ok, I spent some time tracking down what triggered this problem in the first place. As it turns out, a user aborted a job while it was updating the cowbuilder environment. Other events that could leave the lock file behind are a crashed executor or a loss of power.

@linuxmaniac
Contributor

Maybe check the date of the lock file in order to verify that it is a stale one, and remove it?

@mark0n
Author

mark0n commented Jun 20, 2016

How about waiting until either
a) the lock file is removed by some other process or
b) the lock file is more than 10 minutes old?

For (a) we can just continue, for (b) we need to remove the lock file first.
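
To make the proposal concrete, here is a rough sketch of such a wait loop. The function name wait_for_lockfile, the poll interval, and the use of GNU stat are my assumptions and not part of build-and-provide. Note that file age alone cannot prove that no process is still using the lock, so this only illustrates the idea.

```shell
#!/bin/bash
# Hypothetical helper (not from build-and-provide): wait until a lock file
# disappears, or remove it once it is older than max_age seconds.

wait_for_lockfile() {
  local lockfile="$1"
  local max_age="${2:-600}"  # seconds; 600 = the 10 minutes proposed above
  local poll="${3:-5}"       # seconds between checks (assumed value)
  local now mtime

  while [ -e "$lockfile" ]; do
    now=$(date +%s)
    # GNU stat: %Y is the modification time in seconds since the epoch.
    # If stat fails, the file vanished in the meantime: case (a).
    mtime=$(stat -c %Y "$lockfile" 2>/dev/null) || break
    if [ $(( now - mtime )) -ge "$max_age" ]; then
      # Case (b): treat the file as stale and remove it before continuing.
      echo "*** Removing stale lock file ${lockfile} ***"
      rm -f "$lockfile"
      break
    fi
    sleep "$poll"
  done
  return 0
}
```

It would be called as wait_for_lockfile "${update_lockfile}" 600 right before the update run.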

@mark0n
Author

mark0n commented Jun 20, 2016

Ok, I think I understand the problem a bit better:

The lock file is created by

(
# code that uses the lock
) 9>"${update_lockfile}"

The first line inside this block is acquiring the lock on the open file with file descriptor 9 from the kernel. The lock is automatically released as soon as the file is closed. This should happen even in case the process dies pretty badly. But the file might be left behind!
It seems like the real problem is the test here: It doesn't test for the lock but for the file. I guess something like

# remove the whole "Update run already taking place, skipping" block
(
flock --wait 600 9 || exit 1
) 9>"${update_lockfile}"

would be more appropriate.
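
The difference between testing for the file and testing for the lock can be checked with a small experiment. This is only a sketch: the temporary file stands in for ${update_lockfile}, the timeouts are arbitrary, and it assumes flock from util-linux on Linux.

```shell
#!/bin/bash
# Sketch: a leftover lock *file* does not mean the *lock* is held.
lockfile=$(mktemp)  # stand-in for ${update_lockfile}; the file already exists

# 1) The file exists but no process holds the lock: flock succeeds at once.
leftover_status=0
(
  flock --wait 5 9 || exit 1
  echo "acquired lock although the file was already there"
) 9>"$lockfile" || leftover_status=$?

# 2) A background process holds the lock: flock waits, then gives up.
(
  flock 9
  sleep 3
) 9>"$lockfile" &
holder=$!
sleep 1  # give the background process time to take the lock

held_status=0
(
  flock --wait 1 9 || exit 1
) 9>"$lockfile" || held_status=$?

wait "$holder"
rm -f "$lockfile"
echo "leftover file: ${leftover_status}, lock held: ${held_status}"
```

The first status should be zero and the second non-zero, which suggests that asking flock directly handles both the stale-file case and the concurrent-update case.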

We are currently testing a fix. I'll open a PR as soon as I feel confident about it.

@mark0n mark0n linked a pull request Jun 20, 2016 that will close this issue