Canary deployments - REST, TW, Publisher #8650

Open
mapellidario opened this issue Aug 26, 2024 · 12 comments

@mapellidario
Member

mapellidario commented Aug 26, 2024

Introduction

We have always dreamed about processing a small portion of production crab tasks with a new crab version.

We do not have a general solution: each CRAB component will need to be adapted in its own specific way.

REST

This requires no change in the code, but requires some experience with k8s deployments. The idea is:

  • create a new deployment in dmwm/CMSKubernetes/helm/crabserver/templates and call it crabserver-canary.
    • change only metadata.name to crabserver-canary; do not change, for example, metadata.labels.app: crabserver or spec.selector.matchLabels.app: crabserver
  • create a new values file, dmwm/CMSKubernetes/helm/crabserver/values-canary.yaml
  • change deploy.sh to accept canary as an argument; you will need to edit the cluster_map.
    • otherwise simply use helm template crabserver . -f values.yaml -f values-${1}.yaml | kubectl -n crab apply -f -, but make sure you are connected with the proper context (username+cluster)!
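
To illustrate the "change only metadata.name" rule, here is a minimal Python sketch. It is not taken from the helm chart: the manifest below is a trimmed, hypothetical stand-in for the real crabserver deployment, and `make_canary` and `expected_canary_share` are made-up helper names. The sketch also assumes that the Service in front of crabserver selects pods through the app: crabserver label and spreads requests evenly across them, which is presumably why the labels must stay untouched.

```python
import copy


def make_canary(deployment, canary_name="crabserver-canary"):
    """Return a copy of the manifest where only metadata.name differs."""
    canary = copy.deepcopy(deployment)
    canary["metadata"]["name"] = canary_name
    return canary


def expected_canary_share(stable_replicas, canary_replicas):
    """Rough fraction of requests reaching canary pods.

    Assumption: the Service balances evenly over all pods with the shared label.
    """
    return canary_replicas / (stable_replicas + canary_replicas)


# Hypothetical, heavily trimmed manifest; the real one lives in the helm templates.
stable = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "crabserver", "labels": {"app": "crabserver"}},
    "spec": {
        "replicas": 4,
        "selector": {"matchLabels": {"app": "crabserver"}},
        "template": {"metadata": {"labels": {"app": "crabserver"}}},
    },
}

canary = make_canary(stable)
# Name changes, labels and selector stay identical to the stable deployment.
print(canary["metadata"]["name"], canary["metadata"]["labels"])
```

With these numbers, one canary replica next to four stable ones would see roughly one request in five, if the even-balancing assumption holds.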

TaskWorker

This requires some changes in the code, which are outlined in https://github.com/dmwm/CRABServer/wiki/TaskWorker-Canary-Deployment

We decided to run one TW per virtual machine, so that the hostname is enough to identify which TW process/container we are referring to.
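
As a small sketch of what "the hostname is enough" could mean in practice, the snippet below derives a TW identifier from the machine hostname. This is only an assumption about how the name might be computed; the actual design is the one described in the wiki page above. `tw_name` is a hypothetical helper and the example hostname is invented.

```python
import socket


def tw_name(hostname=None):
    """Return the short hostname, used here as the TW identifier.

    Assumption: one TW per VM, so the short hostname is unique per TW.
    """
    fqdn = hostname if hostname is not None else socket.gethostname()
    # Drop the domain part, if any: "host.example.org" -> "host"
    return fqdn.split(".")[0]


# Invented hostname, for illustration only.
print(tw_name("crab-tw-canary01.example.org"))
```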

Publisher

This is not defined yet, but we have some ideas. The simplest one is:

  • run a publisher on every VM where we run TW
  • make sure that each publisher selects only jobs that have been submitted by tasks processed by the TW running on the same VM
    • we need to add tm_twname to the filetransfersdb table, or pick it up from the task table using a join on taskname. In either case we need to change the SQL that picks the files to work on.
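
The join option could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for the real database, and the two tables are reduced to a hypothetical minimal schema: only tm_twname and the join on the task name come from the idea above, while the tm_id and tm_taskname columns, the sample rows and the `files_for_tw` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, minimal schema; the production tables have many more columns.
conn.executescript("""
CREATE TABLE tasks (tm_taskname TEXT PRIMARY KEY, tm_twname TEXT);
CREATE TABLE filetransfersdb (tm_id TEXT PRIMARY KEY, tm_taskname TEXT);
INSERT INTO tasks VALUES ('task_A', 'tw-prod'), ('task_B', 'tw-canary');
INSERT INTO filetransfersdb VALUES
    ('f1', 'task_A'), ('f2', 'task_B'), ('f3', 'task_B');
""")


def files_for_tw(conn, twname):
    """Return ids of files whose task was processed by the given TW."""
    rows = conn.execute(
        """
        SELECT f.tm_id
        FROM filetransfersdb f
        JOIN tasks t ON t.tm_taskname = f.tm_taskname
        WHERE t.tm_twname = ?
        ORDER BY f.tm_id
        """,
        (twname,),
    ).fetchall()
    return [row[0] for row in rows]


# A publisher on the canary VM would only see files from canary tasks.
print(files_for_tw(conn, "tw-canary"))
```

Storing tm_twname directly in filetransfersdb would remove the join from this query, at the cost of a schema change on that table.
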
@belforte
Member

I believe that we agreed that @novicecpp will do the K8s part together with @aspiringmind-code

I am still undecided whether it is better to call it crabserver-canary or crabserver-qa

@belforte
Member

NOTE: it will be important to be able to tell (easily) the canary pod from the others in our monitoring, e.g. quickly tell where HTTP errors come from
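
One way to tell the two apart, sketched in Python: Kubernetes names the pods of a Deployment after the Deployment, so the pod name prefix already says which one served a request. The pod names and (pod, status) records below are invented, and `deployment_of` and `errors_by_deployment` are hypothetical helpers, not part of any existing dashboard.

```python
from collections import Counter


def deployment_of(pod_name):
    """Map a pod name to its deployment, based on the name prefix."""
    # Check the canary prefix first: canary pod names also start with "crabserver-".
    if pod_name.startswith("crabserver-canary-"):
        return "crabserver-canary"
    if pod_name.startswith("crabserver-"):
        return "crabserver"
    return "other"


def errors_by_deployment(records):
    """Count HTTP error responses (status >= 400) per deployment."""
    counts = Counter()
    for pod_name, status in records:
        if status >= 400:
            counts[deployment_of(pod_name)] += 1
    return counts


# Invented sample of (pod name, HTTP status) pairs.
records = [
    ("crabserver-7d9f8c-abcde", 200),
    ("crabserver-canary-5c6b7a-xyz12", 500),
    ("crabserver-canary-5c6b7a-xyz12", 503),
    ("crabserver-7d9f8c-abcde", 404),
]
print(errors_by_deployment(records))
```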

@novicecpp
Contributor

novicecpp commented Aug 28, 2024

[Image: Screenshot from 2024-08-28 15-33-38]

Looks promising.
https://monit-grafana.cern.ch/goto/3JXSKT3Ig?orgId=11

EDIT: new image that contains the pod name.

@belforte
Member

Better to split off TW+Publisher into a different issue: #8678

@novicecpp
Contributor

novicecpp commented Sep 13, 2024

To-do:

@novicecpp
Contributor

novicecpp commented Oct 31, 2024

Sorry, I need to keep this open because it is not deployed in production yet.

test (at least test12) and preprod now use the new helm chart.

@novicecpp novicecpp removed their assignment Oct 31, 2024
@belforte
Member

@aspiringmind-code will do it :-)

@novicecpp
Contributor

Sorry, wrong issue.

I tested this one myself on my test12 and on preprod, and it works.
But you may need to modify the dashboard a bit to make it easier to compare metrics between crabserver and crabserver-canary.

@belforte
Member

I knew that it works, but AFAIK it is not deployed in production, not used, and not monitored. We have not even deployed the latest CRABServer tag there. Plenty of useful work to do!

@aspiringmind-code
Contributor

For the canary TW deployment, I envision the changes made in the commits here and here. @belforte, let me know if you agree with this approach. Thanks!

@belforte
Member

Thanks @aspiringmind-code. The changes to TW are not trivial, so let's create an ad-hoc issue.

@belforte
Member

belforte commented Dec 6, 2024

time to look at REST #8859

6 participants
@belforte @novicecpp @mapellidario @aspiringmind-code and others