Skip to content

Conversation

tarasmadan
Copy link
Collaborator

After this change we'll see only the reports coming from the same thread.

@tarasmadan tarasmadan marked this pull request as ready for review September 24, 2025 09:36
@tarasmadan tarasmadan force-pushed the get-only-the-thread-reports branch from d565a9a to 1e097c2 Compare September 24, 2025 11:58
Copy link
Member

@ramosian-glider ramosian-glider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code should also handle reports from the CPU context.

vm/vm.go Outdated
func getReportChain(reps []*report.Report, threadID string) []*report.Report {
var res []*report.Report
for _, rep := range reps {
if rep.ThreadID == threadID {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't pkg/report already filtered reports by context?

If yes, then why do we need this change here?
If no, then how have pkg/report parsed output with intermixed reports from different context? Are we careful enough during parsing to properly split all intermixed reports from different contexts? Or if not, then isn't our parsing broken?

Copy link
Collaborator Author

@tarasmadan tarasmadan Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't pkg/report already filtered reports by context?

No. It extracts the reports one-by-one paying attention to the "current report ThreadID".

If no, then how have pkg/report parsed output with intermixed reports from different context? Are we careful enough during parsing to properly split all intermixed reports from different contexts? Or if not, then isn't our parsing broken?

Current logic is trying to get report assuming it may be starting at any line.
Suppressed/broken reports are ignored.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It extracts the reports one-by-one paying attention to the "current report ThreadID".

Then isn't it broken for what you want to do?
What will it extract from the following log?

[C1] report0_first_line
[C1] report0_middle_line
[C1] report0_last_line
[C2] report1_first_line
[C2] report1_middle_line
[C1] report2_first_line
[C1] report2_middle_line
[C1] report2_last_line
[C2] report1_last_line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will work correctly because report1 will be parsed from report1_first_line to report1_last_line ignoring the lines from another context.
Once done, we'll start looking for the next report from the line [C2] report1_middle_line.
It is a bit ~quadratish algorithm, but the correctness should be Ok.

Let me add the synthetic test to cover this case. It should be helpful to double check your concern and prevent these problems in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer (see pkg/report/testdata/linux/parse_all/3) is:

  1. report0 concatenated with report2.
  2. report1.
  3. report2.

Point 1 is a bug, but I don't think it is critical.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main purpose of report is being included in every bug email inline. I think we generally want to include only one report to not make emails way too long. Will anything here affect emails?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will anything here affect emails?

After this PR and #6332 the plan is to disable panic_on_warn.
We'll start receiving bloated first report after that. That is the problem.
Thanks. It means point 1 is critical and should be fixed.

@a-nogikh
Copy link
Collaborator

Please extend the commit/PR description, I have only read what you provided here (I have no other context) and thus I struggle to understand the problem it's supposed to solve.

@tarasmadan tarasmadan marked this pull request as draft September 25, 2025 09:51
@tarasmadan tarasmadan force-pushed the get-only-the-thread-reports branch 7 times, most recently from d932998 to 003ba11 Compare September 25, 2025 13:17
After this change we'll see only the reports coming from the same thread.
@tarasmadan tarasmadan force-pushed the get-only-the-thread-reports branch from 003ba11 to 712a9bf Compare September 25, 2025 13:58
// TestParseAll's focus points are the ability to:
// 1. parse multiple reports
// 2. extract correct ThreadID/CpuID.
func TestParseAll(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we parse all of our existing reports with this? How large will be the diff for existing reports? I think they generally did not include multiple reports, so perhaps not much will change. That may be better than creating 2 parallel test suites and 2 different tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This testing suite may benefit from the refactoring.
Do you feel concerned only about the parse_all or also about the guilty, guilty_raw, symbolize etc.?

I think all this transformation/extraction testing may be simplified a lot if we'll agree to have 2 files per test:
The code will be: Transform(input) -> serialize what we care about -> compare to the output file.
Current -update flag will work the same way allowing us to do the large scale tuning.

In this case we don't need any parsing logic. Only the function_under_test + logging the parts we care about.
WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you feel concerned only about the parse_all or also about the guilty, guilty_raw, symbolize etc.?

guilty accepts different inputs (symbolized and cleaned output, basically reports).
I dunno what is symbolize, and why it's separated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One file per test was intentional, it makes diff reviews possible.
For example, if we split them into 2 files, if somebody changes something that changes output, PR will contain only the output files, but it will be hard to understand if these are legit changes or not, b/c you won't see the input files in the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants