Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is CI using Ubuntu with snaps? #3223

Closed
BsAtHome opened this issue Dec 12, 2024 · 7 comments
Closed

Why is CI using Ubuntu with snaps? #3223

BsAtHome opened this issue Dec 12, 2024 · 7 comments

Comments

@BsAtHome
Copy link
Contributor

The last CI build of https://github.com/LinuxCNC/linuxcnc/actions/runs/12300199563/job/34327786462?pr=3221 should have been easy, but it stalled almost three and a half hours on installing a snap of firefox. Here is the log:
Thu, 12 Dec 2024 16:09:04 GMT Preparing to unpack .../4-firefox_1%3a1snap1-0ubuntu5_amd64.deb ...
Thu, 12 Dec 2024 16:09:04 GMT => Installing the firefox snap
Thu, 12 Dec 2024 16:09:04 GMT ==> Checking connectivity with the snap store
Thu, 12 Dec 2024 16:09:05 GMT ==> Installing the firefox snap
Thu, 12 Dec 2024 16:09:12 GMT 2024-12-12T16:09:12Z INFO Waiting for automatic snapd restart...
Thu, 12 Dec 2024 19:36:21 GMT firefox 133.0.3-1 from Mozilla✓ installed
Thu, 12 Dec 2024 19:36:21 GMT => Snap installation complete
Thu, 12 Dec 2024 19:36:21 GMT Unpacking firefox (1:1snap1-0ubuntu5) over (133.0+build2-0ubuntu0.24.04.1~mt1) ...
Thu, 12 Dec 2024 19:36:21 GMT dpkg: warning: unable to delete old directory '/etc/firefox': Directory not empty

I've seen more errors with Ubuntu and delays with snaps, which are unrelated to the actual build of LinuxCNC. Using snaps makes it just all the more prone to failure. Can't the CI builds and tests be done on plain Debian? It is supposed to be packaged on Debian. Why then build on Ubuntu?

@rene-dev
Copy link
Member

Because that is what github offers. There are builds on debian on the buildbot: http://buildbot.linuxcnc.org/buildbot/waterfall

@BsAtHome
Copy link
Contributor Author

Hm,... If is ain't in github, well, that is, well... very sad.

BTW, does the buildbot have anything newer than Buster (debian 10)? Can't see bullseye (debian 11) or bookworm (debian 12). Weezy(7) is obsolete and Jessy(8), Stretch(9) and Buster(10) are LTS EOL (https://www.debian.org/releases/) and are only supported by paid extended LTS support.

@rene-dev
Copy link
Member

@BsAtHome
Copy link
Contributor Author

http://buildbot2.highlab.com/buildbot/#/grid

Thank you.

However, on that bot it seems that all Bullseye and Bookworm builds fail continuously. It seems to think that another instance of LinuxCNC is running. It fails in runtests at the first test (feed-rate). It times out on the lockfile removal.

@rmu75
Copy link
Contributor

rmu75 commented Dec 13, 2024

I made a patch about a year ago that would allow starting multiple linuxcnc issues in parallel and therefore running tests in parallel.

@BsAtHome
Copy link
Contributor Author

I made a patch about a year ago that would allow starting multiple linuxcnc issues in parallel and therefore running tests in parallel.

Yes, PR #2722, which is work-in-progress as far as I can see. I think it is a good idea to allow multiple instances. However, I do see some problems with the patches.

You need to check the environment variable to be numeric in scripts/linuxcnc.in , something like:

if ! [[ "$LINUXCNC_INSTANCE" =~ ^[0-9]+$ ]]; then
  LINUXCNC_INSTANCE=0
  # or fail...
fi

Blindly adding an offset to the TCP port can be very wrong if the value exceeds the range of a short int. A check must be performed after the addition; not just a cast to short before the addition. There should also be some bounds to it checked before startup in the linuxcnc script.

I do not understand why you would need an N*16 offset on the TCP port and shmem key. Wouldn't an offset by N be enough?

There seem to be many other instances of rtapi_shmem_new() with different keys. Shouldn't all of these be changed for running a parallel instance?

@rmu75
Copy link
Contributor

rmu75 commented Dec 13, 2024

That should probably discussed in #2722 -- in principle you are right of course. It would perhaps suffice to check instance number and limit that to some value like 15max. I don't remember all details off the bat. Offset N would probably work also. As nobody seemed interested I didn't develop that any further.

All those more or less arbitrary segment IDs are ugly and could interfere with something else -- a clean solution would be to replace SYSV shared memory (shmget) with POSIX shared memory (shm_open), that would also take care of stale shared memory segments that can happen sometimes AFAIU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants