
@dolmen dolmen commented Jun 7, 2025

Add fuzzing for roundtripping by using JSON documents as input.

The principle: YAML is a superset of JSON, so any data structure serializable as a JSON document should also be serializable with a YAML serializer, and deserializing that YAML document should give back the original structure. The fuzzer uses JSON documents as input.

This fuzzer detects issues such as go-yaml/yaml#1004.

Note: this is a port of go-yaml/yaml#1024 (I'm the author of the code), which I have also submitted as kubernetes-sigs/yaml#110 and goccy/go-yaml#742. So far no project passes the test.

$ go test -fuzz FuzzEncodeFromJSON
OK: 50 passed
fuzz: elapsed: 0s, gathering baseline coverage: 0/9 completed
fuzz: elapsed: 0s, gathering baseline coverage: 9/9 completed, now fuzzing with 8 workers
fuzz: elapsed: 0s, execs: 4829 (25353/sec), new interesting: 34 (total: 43)
--- FAIL: FuzzEncodeFromJSON (0.20s)
    --- FAIL: FuzzEncodeFromJSON (0.00s)
        fuzz_test.go:33: JSON "-0"
        fuzz_test.go:34: Go   %!q(float64=-0) <-0x0p+00>
        fuzz_test.go:41: YAML "-0\n" <2d300a>
        fuzz_test.go:49: Go   '\x00' <0>
        fuzz_test.go:62: YAML "0\n" <300a>
        fuzz_test.go:65: Marshal->Unmarshal->Marshal mismatch:
            - expected: "-0\n"
            - got:      "0\n"
    
    Failing input written to testdata/fuzz/FuzzEncodeFromJSON/c64b69bf2c432100
    To re-run:
    go test -run=FuzzEncodeFromJSON/c64b69bf2c432100
FAIL
exit status 1
FAIL	go.yaml.in/yaml/v3	4.168s

t.Logf("YAML %q <%[1]x>", b2)

if !bytes.Equal(b, b2) {
t.Errorf("Marshal->Unmarshal->Marshal mismatch:\n- expected: %q\n- got: %q", b, b2)
Member

@dolmen, what do you think about changing this to t.Logf, so that we can merge the PR to main and always see where the problems are?

If we error, we can't really merge this.

Member

Right now it stops on the first error. Changing to Logf lets us see all the issues and fix them one at a time (and add a regression test for each fix).

Contributor

I've opened #45 to address the -0 case failure in the description of this issue.

With that fix I was able to run a 1m fuzzing test successfully:

> go test -fuzz FuzzEncodeFromJSON -fuzztime=1m

OK: 50 passed
fuzz: elapsed: 0s, gathering baseline coverage: 0/9 completed
fuzz: elapsed: 0s, gathering baseline coverage: 9/9 completed, now fuzzing with 20 workers
fuzz: elapsed: 3s, execs: 313723 (104563/sec), new interesting: 316 (total: 325)
fuzz: elapsed: 6s, execs: 526587 (70945/sec), new interesting: 411 (total: 420)
fuzz: elapsed: 9s, execs: 576090 (16504/sec), new interesting: 421 (total: 430)
fuzz: elapsed: 12s, execs: 584466 (2792/sec), new interesting: 426 (total: 435)
fuzz: elapsed: 15s, execs: 589110 (1548/sec), new interesting: 433 (total: 442)
fuzz: elapsed: 18s, execs: 592742 (1211/sec), new interesting: 438 (total: 447)
fuzz: elapsed: 21s, execs: 595062 (773/sec), new interesting: 441 (total: 450)
fuzz: elapsed: 24s, execs: 601436 (2124/sec), new interesting: 443 (total: 452)
fuzz: elapsed: 27s, execs: 604790 (1118/sec), new interesting: 449 (total: 458)
fuzz: elapsed: 30s, execs: 606250 (487/sec), new interesting: 450 (total: 459)
fuzz: elapsed: 33s, execs: 609672 (1141/sec), new interesting: 451 (total: 460)
fuzz: elapsed: 36s, execs: 612397 (908/sec), new interesting: 453 (total: 462)
fuzz: elapsed: 39s, execs: 613778 (460/sec), new interesting: 455 (total: 464)
fuzz: elapsed: 42s, execs: 616508 (910/sec), new interesting: 456 (total: 465)
fuzz: elapsed: 45s, execs: 619459 (984/sec), new interesting: 458 (total: 467)
fuzz: elapsed: 48s, execs: 624442 (1661/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 51s, execs: 628547 (1368/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 54s, execs: 632180 (1212/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 57s, execs: 636113 (1311/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 1m0s, execs: 671082 (11613/sec), new interesting: 460 (total: 469)
fuzz: elapsed: 1m1s, execs: 671082 (0/sec), new interesting: 460 (total: 469)
PASS
ok  	go.yaml.in/yaml/v3	65.454s

Contributor

@carloslima carloslima Jun 26, 2025

I've opened #46 to address another issue highlighted by the fuzzing test.

With both fixes applied:

~/c/go-yaml-2 (merge-neg-zero-tabs)> go test -fuzz FuzzEncodeFromJSON
OK: 50 passed
fuzz: elapsed: 0s, gathering baseline coverage: 0/1048 completed
fuzz: elapsed: 1s, gathering baseline coverage: 1048/1048 completed, now fuzzing with 20 workers
fuzz: elapsed: 3s, execs: 191976 (63988/sec), new interesting: 1 (total: 1049)
fuzz: elapsed: 6s, execs: 507713 (105131/sec), new interesting: 5 (total: 1053)
fuzz: elapsed: 9s, execs: 835477 (109372/sec), new interesting: 12 (total: 1060)
(...)
fuzz: elapsed: 1h6m12s, execs: 89709287 (22797/sec), new interesting: 327 (total: 1375)
fuzz: elapsed: 1h6m15s, execs: 89760448 (17057/sec), new interesting: 327 (total: 1375)
fuzz: elapsed: 1h6m18s, execs: 89824044 (21190/sec), new interesting: 328 (total: 1376)
^C
fuzz: elapsed: 1h6m21s, execs: 89891344 (22440/sec), new interesting: 328 (total: 1376)
fuzz: elapsed: 1h6m21s, execs: 89891344 (0/sec), new interesting: 328 (total: 1376)
PASS
ok  	go.yaml.in/yaml/v3	3985.504s

Contributor

> If we error, we can't really merge this.

Fuzzing only happens when you explicitly run go test -fuzz <testname>; otherwise fuzz tests act like normal tests: https://go.dev/doc/security/fuzz/#running-fuzz-tests

> Fuzz tests are run much like a unit test by default. Each seed corpus entry will be tested against the fuzz target, reporting any failures before exiting.
> To enable fuzzing, run go test with the -fuzz flag, providing a regex matching a single fuzz test.

This looks fine to merge.

Contributor

@carloslima carloslima left a comment

This looks good to me apart from the stray test at the bottom :)

@dolmen
Author

dolmen commented Jul 3, 2025

I will remove TestEncodeString, which was helpful for understanding the fuzz failure but is no longer needed thanks to #46.

I'll also add the cases from #45 and #46 to the corpus.

@dolmen
Author

dolmen commented Jul 3, 2025

We also should run the fuzzer (for a limited time, like 30s or less) in CI as a regression test. Can I submit this also in here or do you prefer a separate PR?

Add fuzzing for roundtripping by using JSON documents as input.
@dolmen dolmen force-pushed the add-FuzzEncodeFromJSON-go.yaml.in branch from 4a5d6c3 to fc01637 Compare July 3, 2025 07:11
@dolmen dolmen requested review from carloslima and ingydotnet July 3, 2025 07:13
@carloslima
Contributor

> We also should run the fuzzer (for a limited time, like 30s or less) in CI as a regression test. Can I submit this also in here or do you prefer a separate PR?

I don't think it's a good idea to run the fuzzer on PRs, since it's not a deterministic regression test. Each run generates new random data, and occasionally it will hit a case we don't handle properly. But it's not ideal to block an unrelated PR (like one updating copyright comments) just because the fuzzer stumbled on a new edge case 🙂

Also, the test cases from the corpus, as well as anything committed to testdata/fuzz/FuzzEncodeFromJSON/, are executed during every test run as part of the regular test suite. So any regressions there would still be caught without needing to run the fuzzer itself.
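For reference, the files under testdata/fuzz/FuzzEncodeFromJSON/ use Go's plain-text corpus encoding: a version header followed by one Go-style literal per fuzz argument. The sketch below renders such a file for a single string input. The helper name `corpusEntry` is hypothetical, and it assumes the fuzz target takes exactly one `string` argument, as the `f.Add` seed calls in this PR suggest.

```go
package main

import (
	"fmt"
	"strconv"
)

// corpusEntry renders the contents of a seed corpus file for a fuzz
// target that takes a single string argument (an assumption here).
func corpusEntry(s string) string {
	return "go test fuzz v1\nstring(" + strconv.Quote(s) + ")\n"
}

func main() {
	// The -0 input is the failing case from the PR description.
	fmt.Print(corpusEntry("-0"))
}
```

Saving that text under any file name in the corpus directory makes the input part of every plain `go test` run, without the `-fuzz` flag.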

@dolmen
Author

dolmen commented Aug 1, 2025

> I don't think it's a good idea to run the fuzzer on PRs, since it's not a deterministic regression test.

That's exactly the point of fuzzing. The point of fuzzing in CI is to catch regressions early. I don't have the motivation to extend this PR to cover all of JSON with static tests.

> Each run generates new random data, and occasionally it will hit a case we don't handle properly.

But there are no such cases at the moment. The point is to catch future cases due to regressions.

> But it's not ideal to block an unrelated PR (like one updating copyright comments) just because the fuzzer stumbled on a new edge case 🙂

If that really happens, there will still be time to disable fuzzing.

@carloslima
Contributor

> > I don't think it's a good idea to run the fuzzer on PRs, since it's not a deterministic regression test.
>
> That's exactly the point of fuzzing. The point of fuzzing in CI is to catch regressions early.

I think you may be conflating two things. Fuzzing uses new random data to find previously unknown issues. Regressions, on the other hand, are cases where something that used to work stops working due to a change. This MR helps with the former, but not the latter.

> I don't have the motivation to extend this PR to cover all of JSON with static tests.

You don't have to worry, that wasn't being asked of you. I've already added tests for all the cases I fixed. Any new issue that gets discovered will be covered with a test when it's addressed.

> > Each run generates new random data, and occasionally it will hit a case we don't handle properly.
>
> But there are no such cases at the moment. The point is to catch future cases due to regressions.

There are such cases at the moment: you just need to let the fuzzer run long enough and it will eventually hit something we don't currently handle correctly, whether it's a known open issue or something new. These are not blockers for unrelated PRs.

@dolmen
Author

dolmen commented Aug 1, 2025

Here is an article with cases that I plan to investigate: https://john-millikin.com/json-is-not-a-yaml-subset

@ingydotnet
Member

> Here is an article with cases that I plan to investigate: https://john-millikin.com/json-is-not-a-yaml-subset

I'm not even sure what "cases" that article is asserting wrt JSON not being a subset.

I wrote https://yamlscript.org/blog/2025-07-29/is-json-really-a-subset-of-yaml/ last Tuesday.

A couple years ago there was a ycombinator thread where people asserted ~5 things that made YAML not a superset of JSON. Our core team went through each one and found that the 1.2 spec held up in that regard. I can't find the thread but I'll keep looking...

@ingydotnet
Member

@perlpunk found it for me: https://news.ycombinator.com/item?id=30052128

To be clear, any (correctly implemented) YAML 1.2 loader using the YAML 1.2 core schema should load any JSON correctly.
That's what we mean by superset/subset.

Would you agree with that statement, @perlpunk ?

@perlpunk
Member

perlpunk commented Aug 2, 2025

> To be clear, any (correctly implemented) YAML 1.2 loader using the YAML 1.2 core schema should load any JSON correctly. That's what we mean by superset/subset.
>
> Would you agree with that statement, @perlpunk ?

Yes, according to my knowledge that's true

Contributor

@ccoVeille ccoVeille left a comment

LGTM 👍, some minor feedback anyway

Comment on lines +34 to +37
t.Skip("not valid JSON")
}

t.Logf("JSON %q", s)
Contributor

Maybe this:

Suggested change
- t.Skip("not valid JSON")
- }
- t.Logf("JSON %q", s)
+ t.Skipf("not valid JSON %q", s)
+ }
+ t.Logf("JSON %q", s)

Comment on lines +55 to +60
/*
// Handling of number is different, so we can't have universal exact matching
if !reflect.DeepEqual(v2, v) {
t.Errorf("mismatch:\n- got: %#v\n- expected: %#v", v2, v)
}
*/
Contributor

I'm always suspicious when I see commented out code.

What is the need here?

Can't it simply be removed? Is it leftover debugging code, or something that was planned and then abandoned?

f.Add(`{}`)
f.Add(`[]`)
f.Add(`[[]]`)
f.Add(`{"a":[]}`)
Contributor

Maybe this too:

Suggested change
- f.Add(`{"a":[]}`)
+ f.Add(`{"a":{}}`)
+ f.Add(`{"a":[]}`)

Comment on lines +1 to +2
//go:build go1.18
// +build go1.18
Member

Can you please remove the Go version build constraint from here?

Author

As we now have "go 1.18" in go.mod this build guard is redundant.
So 👍

"encoding/json"
"testing"

"go.yaml.in/yaml/v3"
Member

Suggested change
- "go.yaml.in/yaml/v3"
+ "go.yaml.in/yaml/v4"

@ingydotnet
Member

@carloslima let's review this soon.
