[ARO-5368] Try and log the VM info + console log on failure #3629

Merged
jaitaiwan merged 25 commits into master from hawkowl/e2e-failure-vm-status
Jul 15, 2024

Conversation

Collaborator

@hawkowl hawkowl commented Jun 12, 2024

Fixes: issues.redhat.com/browse/ARO-5368

Dumps the VM info and console logs on failure, so that we don't need to run the Geneva Action or have the control plane still around to retrieve them.

See #3287

Collaborator Author

hawkowl commented Jun 12, 2024

/azp run ci, e2e

Azure Pipelines successfully started running 2 pipeline(s).

Collaborator Author

hawkowl commented Jun 12, 2024

#3630 splits out the log changes

"github.com/Azure/ARO-RP/pkg/util/stringutils"
)

func (m *manager) LogAzureInformation(ctx context.Context) (interface{}, error) {
Collaborator

I want some doc comments here about what this func does and what information it gets.

Collaborator Author

added

pkg/cluster/failurediagnostics/diagnostics.go (resolved)
{f: m.logNodes, isJSON: true},
{f: m.logClusterOperators, isJSON: true},
{f: m.logIngressControllers, isJSON: true},
{f: d.LogAzureInformation, isJSON: false},
Collaborator

@bitoku bitoku Jun 12, 2024

How about not adding LogAzureInformation in gatherFailureLogs but adding a new method?
It would be simpler.

Contributor

This feedback doesn't make a lot of sense to me. What's the need for adding another function?

Collaborator

What I mean is that, because we added d.LogAzureInformation to gatherFailureLogs, we need the isJSON field and some conditional branches here.

If we add a new method and call d.LogAzureInformation from there rather than from gatherFailureLogs, we can keep this method simple.

Collaborator Author

I think that the existing process of dumping out YAML/JSON is a bit unfriendly for SREs, especially when it comes to the ClusterOperators output. I think it would be better to move some of these to more formatted output, as well as have the ability to automatically analyse the logs (e.g. the serial console logs) and output formatted results there. I'm not sure that wrapping this func in another will make it simpler overall, but I do think another pass at this whole area would be a welcome change in a separate PR.
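
As an illustration of what automatically analysing serial console logs could look like, the following is a rough sketch that scans log text line by line for known failure markers and emits short formatted findings. The finding type, the analyseConsoleLog function, the pattern list, and the sample log are all hypothetical; nothing here is taken from this PR or from ARO-RP.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// finding is one formatted result from scanning a console log.
type finding struct {
	Line    int    // 1-based line number in the log
	Summary string // short human-readable description
}

// patterns lists substrings to look for and the summary to report.
// Hypothetical examples; a real list would be curated by SREs.
var patterns = []struct{ needle, summary string }{
	{"Kernel panic", "kernel panic"},
	{"Out of memory", "OOM killer invoked"},
}

// analyseConsoleLog returns one finding per matching pattern per line.
func analyseConsoleLog(log string) []finding {
	var out []finding
	sc := bufio.NewScanner(strings.NewReader(log))
	n := 0
	for sc.Scan() {
		n++
		for _, p := range patterns {
			if strings.Contains(sc.Text(), p.needle) {
				out = append(out, finding{Line: n, Summary: p.summary})
			}
		}
	}
	return out
}

func main() {
	// Made-up sample log for demonstration only.
	sample := "Booting...\nOut of memory: Killed process 1234 (etcd)\nKernel panic - not syncing\n"
	for _, f := range analyseConsoleLog(sample) {
		fmt.Printf("line %d: %s\n", f.Line, f.Summary)
	}
}
```

A slice of patterns is used rather than a map so that findings on the same line are reported in a stable order.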

pkg/util/steps/runner.go (outdated, resolved)
pkg/cluster/failurediagnostics/virtualmachines.go (outdated, resolved)
pkg/util/storage/manager.go (resolved)

Please rebase pull request.

@github-actions github-actions bot added the needs-rebase label and removed the ready-for-review label Jun 16, 2024
@hawkowl hawkowl force-pushed the hawkowl/e2e-failure-vm-status branch from 80fa07b to 00ca0d7 Compare June 18, 2024 00:11
@github-actions github-actions bot removed the needs-rebase label Jun 18, 2024
@hawkowl hawkowl force-pushed the hawkowl/e2e-failure-vm-status branch from d65328a to c2e5526 Compare July 8, 2024 00:18
Collaborator Author

hawkowl commented Jul 8, 2024

/azp run ci, e2e

Azure Pipelines successfully started running 2 pipeline(s).

@hawkowl hawkowl force-pushed the hawkowl/e2e-failure-vm-status branch from c2e5526 to a91be7c Compare July 8, 2024 00:38
Collaborator Author

hawkowl commented Jul 8, 2024

/azp run ci, e2e

Azure Pipelines successfully started running 2 pipeline(s).

Collaborator Author

hawkowl commented Jul 8, 2024

/azp run e2e

Azure Pipelines successfully started running 1 pipeline(s).

@jaitaiwan jaitaiwan merged commit 31af734 into master Jul 15, 2024
21 checks passed
@SudoBrendan SudoBrendan deleted the hawkowl/e2e-failure-vm-status branch July 24, 2024 15:51
Labels: enhancement, go, size-medium, skippy