-
Notifications
You must be signed in to change notification settings - Fork 27
Add fuzzer: FuzzEncodeFromJSON which signals roundtripping issues #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
t.Logf("YAML %q <%[1]x>", b2) | ||
|
||
if !bytes.Equal(b, b2) { | ||
t.Errorf("Marshal->Unmarshal->Marshal mismatch:\n- expected: %q\n- got: %q", b, b2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dolmen , what do you think about changing this to t.Logf
, so that we can merge the PR to main and always see where the problems are?
If we error, we can't really merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now it stops on the first error. Changing to Logf
lets us see all the issues and fix them one at a time (and add a regression test for each fix).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #45 to address the -0
case failure in the description of this issue.
With that fix I was able to run a 1m fuzzing test successfully:
> go test -fuzz FuzzEncodeFromJSON -fuzztime=1m
OK: 50 passed
fuzz: elapsed: 0s, gathering baseline coverage: 0/9 completed
fuzz: elapsed: 0s, gathering baseline coverage: 9/9 completed, now fuzzing with 20 workers
fuzz: elapsed: 3s, execs: 313723 (104563/sec), new interesting: 316 (total: 325)
fuzz: elapsed: 6s, execs: 526587 (70945/sec), new interesting: 411 (total: 420)
fuzz: elapsed: 9s, execs: 576090 (16504/sec), new interesting: 421 (total: 430)
fuzz: elapsed: 12s, execs: 584466 (2792/sec), new interesting: 426 (total: 435)
fuzz: elapsed: 15s, execs: 589110 (1548/sec), new interesting: 433 (total: 442)
fuzz: elapsed: 18s, execs: 592742 (1211/sec), new interesting: 438 (total: 447)
fuzz: elapsed: 21s, execs: 595062 (773/sec), new interesting: 441 (total: 450)
fuzz: elapsed: 24s, execs: 601436 (2124/sec), new interesting: 443 (total: 452)
fuzz: elapsed: 27s, execs: 604790 (1118/sec), new interesting: 449 (total: 458)
fuzz: elapsed: 30s, execs: 606250 (487/sec), new interesting: 450 (total: 459)
fuzz: elapsed: 33s, execs: 609672 (1141/sec), new interesting: 451 (total: 460)
fuzz: elapsed: 36s, execs: 612397 (908/sec), new interesting: 453 (total: 462)
fuzz: elapsed: 39s, execs: 613778 (460/sec), new interesting: 455 (total: 464)
fuzz: elapsed: 42s, execs: 616508 (910/sec), new interesting: 456 (total: 465)
fuzz: elapsed: 45s, execs: 619459 (984/sec), new interesting: 458 (total: 467)
fuzz: elapsed: 48s, execs: 624442 (1661/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 51s, execs: 628547 (1368/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 54s, execs: 632180 (1212/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 57s, execs: 636113 (1311/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 1m0s, execs: 671082 (11613/sec), new interesting: 460 (total: 469)
fuzz: elapsed: 1m1s, execs: 671082 (0/sec), new interesting: 460 (total: 469)
PASS
ok go.yaml.in/yaml/v3 65.454s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #46 to address another issue highlighted by the fuzzing test.
With both fixes applied:
~/c/go-yaml-2 (merge-neg-zero-tabs)> go test -fuzz FuzzEncodeFromJSON
OK: 50 passed
fuzz: elapsed: 0s, gathering baseline coverage: 0/1048 completed
fuzz: elapsed: 1s, gathering baseline coverage: 1048/1048 completed, now fuzzing with 20 workers
fuzz: elapsed: 3s, execs: 191976 (63988/sec), new interesting: 1 (total: 1049)
fuzz: elapsed: 6s, execs: 507713 (105131/sec), new interesting: 5 (total: 1053)
fuzz: elapsed: 9s, execs: 835477 (109372/sec), new interesting: 12 (total: 1060)
(...)
fuzz: elapsed: 1h6m12s, execs: 89709287 (22797/sec), new interesting: 327 (total: 1375)
fuzz: elapsed: 1h6m15s, execs: 89760448 (17057/sec), new interesting: 327 (total: 1375)
fuzz: elapsed: 1h6m18s, execs: 89824044 (21190/sec), new interesting: 328 (total: 1376)
^C
fuzz: elapsed: 1h6m21s, execs: 89891344 (22440/sec), new interesting: 328 (total: 1376)
fuzz: elapsed: 1h6m21s, execs: 89891344 (0/sec), new interesting: 328 (total: 1376)
PASS
ok go.yaml.in/yaml/v3 3985.504s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we error, we can't really merge this.
Fuzzing only happens when you explicitly run it with go test -fuzz <testname>
, otherwise they act like normal tests: https://go.dev/doc/security/fuzz/#running-fuzz-tests
Fuzz tests are run much like a unit test by default. Each seed corpus entry will be tested against the fuzz target, reporting any failures before exiting.
To enable fuzzing, run go test with the -fuzz flag, providing a regex matching a single fuzz test.
This looks fine to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me apart from the stray test at the bottom :)
We also should run the fuzzer (for a limited time, like 30s or less) in CI as a regression test. Can I submit this also in here or do you prefer a separate PR? |
Add fuzzing for roundtripping by using JSON documents as input.
4a5d6c3
to
fc01637
Compare
I don't think it's a good idea to run the fuzzer on PRs, since it's not a deterministic regression test. Each run generates new random data, and occasionally it will hit a case we don't handle properly. But it's not ideal to block an unrelated PR (like one updating copyright comments) just because the fuzzer stumbled on a new edge case 🙂 Also, the test cases from the corpus, as well as anything committed to |
That's exactly the point of fuzzing. The point of fuzzing in CI is to catch regressions early. I don't have the motivation to extend this PR to cover all of JSON with static tests.
But there are no such cases at the moment. The point is to catch future cases due to regressions.
If this really happens it will still be time to disable fuzzing. |
I think you may be conflating two things. Fuzzing uses new random data to find previously unknown issues. Regressions, on the other hand, are cases where something that used to work stops working due to a change. This MR helps with the former, but not the latter.
You don't have to worry, that wasn't being asked of you. I've already added tests for all the cases I fixed. Any new issue that gets discovered will be covered with a test when it's addressed.
There are such cases at the moment , you just need to let the fuzzer run long enough and it will eventually hit something we don't currently handle correctly, whether it's a known open issue or something new. These are not blockers for unrelated PRs. |
Here is an article with cases that I plan to investigate: https://john-millikin.com/json-is-not-a-yaml-subset |
I'm not even sure what "cases" that article is asserting wrt JSON not being a subset. I wrote https://yamlscript.org/blog/2025-07-29/is-json-really-a-subset-of-yaml/ last Tuesday. A couple years ago there was a ycombinator thread where people asserted ~5 things that made YAML not a superset of JSON. Our core team went through each one and found that the 1.2 spec held up in that regard. I can't find the thread but I'll keep looking... |
@perlpunk found it for me: https://news.ycombinator.com/item?id=30052128 To be clear, any (correctly implemented) YAML 1.2 loader using the YAML 1.2 core schema should load any JSON correctly. Would you agree with that statement, @perlpunk ? |
Yes, according to my knowledge that's true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍, some minor feedbacks anyway
t.Skip("not valid JSON") | ||
} | ||
|
||
t.Logf("JSON %q", s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This maybe
t.Skip("not valid JSON") | |
} | |
t.Logf("JSON %q", s) | |
t.Skipf("not valid JSON %q", s) | |
} | |
t.Logf("JSON %q", s) |
/* | ||
// Handling of number is different, so we can't have universal exact matching | ||
if !reflect.DeepEqual(v2, v) { | ||
t.Errorf("mismatch:\n- got: %#v\n- expected: %#v", v2, v) | ||
} | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm always suspicious when I see commented out code.
What is the need here?
Can't it be simply removed? Is it a left behind debug or something that was planned but that was abandoned?
f.Add(`{}`) | ||
f.Add(`[]`) | ||
f.Add(`[[]]`) | ||
f.Add(`{"a":[]}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this also
f.Add(`{"a":[]}`) | |
f.Add(`{"a":{}}`) | |
f.Add(`{"a":[]}`) |
//go:build go1.18 | ||
// +build go1.18 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please remove the go version from here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we now have "go 1.18" in go.mod this build guard is redundant.
So 👍
"encoding/json" | ||
"testing" | ||
|
||
"go.yaml.in/yaml/v3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"go.yaml.in/yaml/v3" | |
"go.yaml.in/yaml/v4" |
@carloslima let's review this soon. |
Add fuzzing for roundtripping by using JSON documents as input.
The principle: YAML is a superset of JSON, so any data structure serialisable as a JSON document should be serializable with a YAML serializer and that document should give back the original structure after deserialisation. The fuzzer uses JSON documents as input.
This fuzzer detects issues such as go-yaml/yaml#1004.
Note: this is a port of go-yaml/yaml#1024 (I'm the author of the code) which I have also submitted as kubernetes-sigs/yaml#110 and goccy/go-yaml#742. So far no project pass the test.