feature(pipelines): collect builders logs #9786

Merged
fruch merged 1 commit into scylladb:master from add_builder_logs on Jan 14, 2025

Conversation

@fruch fruch commented Jan 12, 2025

We have multiple issues related to the builders, and we don't have any logs saved from their end.

This commit adds code that collects the journalctl logs from the builder and uploads them to S3 and Argus.

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@fruch fruch added backport/none Backport is not required test-provision-aws Run provision test on AWS labels Jan 12, 2025
@fruch fruch force-pushed the add_builder_logs branch 2 times, most recently from a53b60c to 814066b Compare January 12, 2025 20:38
We have multiple issues related to the builders, and we don't have
any logs saved from their end.

This commit adds code that collects the journalctl logs from the builder
and uploads them to S3 and Argus.
@fruch fruch requested review from a team and dimakr January 13, 2025 08:11
fruch commented Jan 13, 2025

The AWS provision test failed because of S3 access (unrelated to this PR):

23:22:43  Command: 'sudo sctool backup -c a05d9ea2-312c-4c8d-99b0-9f9b57e8cbde --keyspace scylla_bench,keyspace1  --location s3:manager-backup-tests-us-east-1 '
23:22:43  Exit code: 1
23:22:43  Stdout:
23:22:43  Stderr:
23:22:43  Error: create backup target: location is not accessible
23:22:43  10.4.2.70: agent [HTTP 400] operation put: s3 upload: 301 Moved Permanently: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. (code:PermanentRedirect) - make sure the location is correct and credentials are set, to debug SSH to 10.4.2.70 and run "scylla-manager-agent check-location -L s3:manager-backup-tests-us-east-1 --debug"
23:22:43  Trace ID: DXvyKGPjQMKUCkJpyxlKCQ (grep in scylla-manager logs)

@mikliapko this is quite weird: how did we end up doing a backup to us-east-1 when the test is running in eu-west-1?

dimakr commented Jan 13, 2025

@fruch All builder logs are collected, and some of them can be quite old. For instance, in the test run https://jenkins.scylladb.com/job/scylla-staging/job/fruch/job/artifacts-ubuntu2404-test/30/ there are logs from the run date itself (12.01.2025), but also logs from 18.12.2024. Do we want to collect all logs from the builder, or filter them (with a 'since' option or similar)?
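
To make the 'since' idea concrete, here is a minimal sketch of how the collection command could be limited to recent entries. The helper name `build_journalctl_command` is hypothetical and is not taken from this PR; it only assembles a command line from the standard `journalctl` flags `--no-pager`, `--output=short-iso` and `--since`.

```python
import shlex
from datetime import datetime
from typing import Optional


def build_journalctl_command(since: Optional[datetime] = None) -> str:
    """Build a journalctl command line (hypothetical helper, not the PR's code).

    When `since` is given, only entries at or after that timestamp are requested.
    """
    args = ["journalctl", "--no-pager", "--output=short-iso"]
    if since is not None:
        # journalctl accepts "YYYY-MM-DD HH:MM:SS" for --since and reads it
        # in the host's local timezone.
        args += ["--since", since.strftime("%Y-%m-%d %H:%M:%S")]
    # shlex.join quotes the timestamp so the space in it survives a shell.
    return shlex.join(args)


if __name__ == "__main__":
    # Example: only collect entries starting from the day of the test run.
    print(build_journalctl_command(datetime(2025, 1, 12)))
```

Passing the test start time as `since` would drop entries such as the ones from 18.12.2024, at the cost of also dropping anything logged while the AMI was being built.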

@mikliapko

@mikliapko this is quite weird: how did we end up doing a backup to us-east-1 when the test is running in eu-west-1?

Oh, I believe it's a side effect of removing backup_bucket_region that I hadn't foreseen.
Before, the test was passing because the backup_bucket_region parameter was set to us-east-1, which means the manager agent had this region in its configuration. Now the manager-agent configuration file has the region_name there (eu-west-1), which leads to a region mismatch and eventually to this failure.

For the sake of verifying this PR, I’d suggest rerunning the test in the us-east-1 region, if possible.

As a long-term solution, I suppose we should have buckets in all regions used in testing (pre-created in advance, or created at test runtime if missing). I will try to handle it shortly.
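
A rough sketch of what per-region buckets could look like on the test side, assuming the buckets follow the naming of the one in the failing command (`manager-backup-tests-<region>`). That naming rule and both helper names are assumptions for illustration only; nothing here exists in the code base.

```python
# Prefix copied from the bucket in the failing sctool command; the rule that
# every region gets "<prefix>-<region>" is an assumption.
BUCKET_PREFIX = "manager-backup-tests"


def backup_location_for_region(region: str) -> str:
    """Return an sctool --location value for the bucket in `region` (hypothetical)."""
    return f"s3:{BUCKET_PREFIX}-{region}"


def location_matches_region(location: str, region: str) -> bool:
    """Check that a backup location points at the bucket of the region under test."""
    return location == backup_location_for_region(region)


if __name__ == "__main__":
    # The failing run: test in eu-west-1, backup location in us-east-1.
    print(location_matches_region("s3:manager-backup-tests-us-east-1", "eu-west-1"))
    print(backup_location_for_region("eu-west-1"))
```

Deriving the location from the region the test runs in would keep it aligned with the region_name written to the manager-agent configuration, provided the matching bucket actually exists.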

fruch commented Jan 13, 2025

@fruch All builder logs are collected, and some of them can be quite old. For instance, in the test run https://jenkins.scylladb.com/job/scylla-staging/job/fruch/job/artifacts-ubuntu2404-test/30/ there are logs from the run date itself (12.01.2025), but also logs from 18.12.2024. Do we want to collect all logs from the builder, or filter them (with a 'since' option or similar)?

Some of them are from when the AMI was created, which is OK.

@soyacz soyacz left a comment

LGTM

@dimakr dimakr left a comment

LGTM

@fruch fruch merged commit fb56030 into scylladb:master Jan 14, 2025
5 of 7 checks passed
Labels
backport/none Backport is not required promoted-to-master test-provision-aws Run provision test on AWS
5 participants