Add fuzzer: FuzzEncodeFromJSON which signals roundtripping issues #35

dolmen · 2025-06-07T04:21:12Z

Add fuzzing for roundtripping by using JSON documents as input.

The principle: YAML is a superset of JSON, so any data structure serializable as a JSON document should also be serializable with a YAML serializer, and deserializing that YAML document should give back the original structure. The fuzzer uses JSON documents as input.

This fuzzer detects issues such as go-yaml/yaml#1004.

Note: this is a port of go-yaml/yaml#1024 (I'm the author of the code) which I have also submitted as kubernetes-sigs/yaml#110 and goccy/go-yaml#742. So far, no project passes the test.

$ go test -fuzz FuzzEncodeFromJSON
OK: 50 passed
fuzz: elapsed: 0s, gathering baseline coverage: 0/9 completed
fuzz: elapsed: 0s, gathering baseline coverage: 9/9 completed, now fuzzing with 8 workers
fuzz: elapsed: 0s, execs: 4829 (25353/sec), new interesting: 34 (total: 43)
--- FAIL: FuzzEncodeFromJSON (0.20s)
    --- FAIL: FuzzEncodeFromJSON (0.00s)
        fuzz_test.go:33: JSON "-0"
        fuzz_test.go:34: Go   %!q(float64=-0) <-0x0p+00>
        fuzz_test.go:41: YAML "-0\n" <2d300a>
        fuzz_test.go:49: Go   '\x00' <0>
        fuzz_test.go:62: YAML "0\n" <300a>
        fuzz_test.go:65: Marshal->Unmarshal->Marshal mismatch:
            - expected: "-0\n"
            - got:      "0\n"
    
    Failing input written to testdata/fuzz/FuzzEncodeFromJSON/c64b69bf2c432100
    To re-run:
    go test -run=FuzzEncodeFromJSON/c64b69bf2c432100
FAIL
exit status 1
FAIL	go.yaml.in/yaml/v3	4.168s

ingydotnet · 2025-06-25T20:43:25Z

fuzz_test.go

+		t.Logf("YAML %q <%[1]x>", b2)
+
+		if !bytes.Equal(b, b2) {
+			t.Errorf("Marshal->Unmarshal->Marshal mismatch:\n- expected: %q\n- got:      %q", b, b2)


@dolmen, what do you think about changing this to t.Logf, so that we can merge the PR to main and always see where the problems are?

If we error, we can't really merge this.

Right now it stops on the first error. Changing to Logf lets us see all the issues and fix them one at a time (and add a regression test for each fix).

I've opened #45 to address the -0 case failure in the description of this issue.

With that fix I was able to run a 1m fuzzing test successfully:

> go test -fuzz FuzzEncodeFromJSON -fuzztime=1m
OK: 50 passed
fuzz: elapsed: 0s, gathering baseline coverage: 0/9 completed
fuzz: elapsed: 0s, gathering baseline coverage: 9/9 completed, now fuzzing with 20 workers
fuzz: elapsed: 3s, execs: 313723 (104563/sec), new interesting: 316 (total: 325)
fuzz: elapsed: 6s, execs: 526587 (70945/sec), new interesting: 411 (total: 420)
fuzz: elapsed: 9s, execs: 576090 (16504/sec), new interesting: 421 (total: 430)
fuzz: elapsed: 12s, execs: 584466 (2792/sec), new interesting: 426 (total: 435)
fuzz: elapsed: 15s, execs: 589110 (1548/sec), new interesting: 433 (total: 442)
fuzz: elapsed: 18s, execs: 592742 (1211/sec), new interesting: 438 (total: 447)
fuzz: elapsed: 21s, execs: 595062 (773/sec), new interesting: 441 (total: 450)
fuzz: elapsed: 24s, execs: 601436 (2124/sec), new interesting: 443 (total: 452)
fuzz: elapsed: 27s, execs: 604790 (1118/sec), new interesting: 449 (total: 458)
fuzz: elapsed: 30s, execs: 606250 (487/sec), new interesting: 450 (total: 459)
fuzz: elapsed: 33s, execs: 609672 (1141/sec), new interesting: 451 (total: 460)
fuzz: elapsed: 36s, execs: 612397 (908/sec), new interesting: 453 (total: 462)
fuzz: elapsed: 39s, execs: 613778 (460/sec), new interesting: 455 (total: 464)
fuzz: elapsed: 42s, execs: 616508 (910/sec), new interesting: 456 (total: 465)
fuzz: elapsed: 45s, execs: 619459 (984/sec), new interesting: 458 (total: 467)
fuzz: elapsed: 48s, execs: 624442 (1661/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 51s, execs: 628547 (1368/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 54s, execs: 632180 (1212/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 57s, execs: 636113 (1311/sec), new interesting: 459 (total: 468)
fuzz: elapsed: 1m0s, execs: 671082 (11613/sec), new interesting: 460 (total: 469)
fuzz: elapsed: 1m1s, execs: 671082 (0/sec), new interesting: 460 (total: 469)
PASS
ok go.yaml.in/yaml/v3 65.454s

I've opened #46 to address another issue highlighted by the fuzzing test.

With both fixes applied:

~/c/go-yaml-2 (merge-neg-zero-tabs)> go test -fuzz FuzzEncodeFromJSON
OK: 50 passed
fuzz: elapsed: 0s, gathering baseline coverage: 0/1048 completed
fuzz: elapsed: 1s, gathering baseline coverage: 1048/1048 completed, now fuzzing with 20 workers
fuzz: elapsed: 3s, execs: 191976 (63988/sec), new interesting: 1 (total: 1049)
fuzz: elapsed: 6s, execs: 507713 (105131/sec), new interesting: 5 (total: 1053)
fuzz: elapsed: 9s, execs: 835477 (109372/sec), new interesting: 12 (total: 1060)
(...)
fuzz: elapsed: 1h6m12s, execs: 89709287 (22797/sec), new interesting: 327 (total: 1375)
fuzz: elapsed: 1h6m15s, execs: 89760448 (17057/sec), new interesting: 327 (total: 1375)
fuzz: elapsed: 1h6m18s, execs: 89824044 (21190/sec), new interesting: 328 (total: 1376)
^C fuzz: elapsed: 1h6m21s, execs: 89891344 (22440/sec), new interesting: 328 (total: 1376)
fuzz: elapsed: 1h6m21s, execs: 89891344 (0/sec), new interesting: 328 (total: 1376)
PASS
ok go.yaml.in/yaml/v3 3985.504s

> If we error, we can't really merge this.

Fuzzing only happens when you explicitly run it with go test -fuzz <testname>; otherwise fuzz tests act like normal tests: https://go.dev/doc/security/fuzz/#running-fuzz-tests

> Fuzz tests are run much like a unit test by default. Each seed corpus entry will be tested against the fuzz target, reporting any failures before exiting.
> To enable fuzzing, run go test with the -fuzz flag, providing a regex matching a single fuzz test.

This looks fine to merge.

carloslima

This looks good to me apart from the stray test at the bottom :)

fuzz_test.go

dolmen · 2025-07-03T05:25:05Z

I will remove TestEncodeString, which was helpful for understanding the fuzz failure but is no longer needed thanks to #46.

I'll also add the cases from #45 and #46 to the corpus.

dolmen · 2025-07-03T05:27:51Z

We also should run the fuzzer (for a limited time, like 30s or less) in CI as a regression test. Can I submit this also in here or do you prefer a separate PR?

carloslima · 2025-07-03T10:19:09Z

> We also should run the fuzzer (for a limited time, like 30s or less) in CI as a regression test. Can I submit this also in here or do you prefer a separate PR?

I don't think it's a good idea to run the fuzzer on PRs, since it's not a deterministic regression test. Each run generates new random data, and occasionally it will hit a case we don't handle properly. But it's not ideal to block an unrelated PR (like one updating copyright comments) just because the fuzzer stumbled on a new edge case 🙂

Also, the test cases from the corpus, as well as anything committed to testdata/fuzz/FuzzEncodeFromJSON/, are executed during every test run as part of the regular test suite. So any regressions there would still be caught without needing to run the fuzzer itself.
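
For reference, a committed corpus entry is a small text file. The sketch below renders one for a fuzz target that takes a single string argument, assuming the `go test fuzz v1` encoding that the Go toolchain writes for failing inputs. `corpusEntry` is a hypothetical helper, not something from this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// corpusEntry renders one fuzz input for a target with a single string
// argument, in the text format stored under testdata/fuzz/<FuzzName>/.
func corpusEntry(input string) string {
	return "go test fuzz v1\nstring(" + strconv.Quote(input) + ")\n"
}

func main() {
	// Render the entry for the "-0" case from the PR description.
	fmt.Print(corpusEntry("-0"))
	// Prints:
	// go test fuzz v1
	// string("-0")
}
```

Saving that text as a file in testdata/fuzz/FuzzEncodeFromJSON/ should make a plain go test replay the input on every run, without the -fuzz flag.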

dolmen · 2025-08-01T09:45:30Z

> I don't think it's a good idea to run the fuzzer on PRs, since it's not a deterministic regression test.

That's exactly the point of fuzzing. The point of fuzzing in CI is to catch regressions early. I don't have the motivation to extend this PR to cover all of JSON with static tests.

> Each run generates new random data, and occasionally it will hit a case we don't handle properly.

But there are no such cases at the moment. The point is to catch future cases due to regressions.

> But it's not ideal to block an unrelated PR (like one updating copyright comments) just because the fuzzer stumbled on a new edge case 🙂

If this really happens, there will still be time to disable fuzzing then.

carloslima · 2025-08-01T12:55:00Z

> > I don't think it's a good idea to run the fuzzer on PRs, since it's not a deterministic regression test.
>
> That's exactly the point of fuzzing. The point of fuzzing in CI is to catch regressions early.

I think you may be conflating two things. Fuzzing uses new random data to find previously unknown issues. Regressions, on the other hand, are cases where something that used to work stops working due to a change. This MR helps with the former, but not the latter.

> I don't have the motivation to extend this PR to cover all of JSON with static tests.

You don't have to worry, that wasn't being asked of you. I've already added tests for all the cases I fixed. Any new issue that gets discovered will be covered with a test when it's addressed.

> > Each run generates new random data, and occasionally it will hit a case we don't handle properly.
>
> But there are no such cases at the moment. The point is to catch future cases due to regressions.

There are such cases at the moment; you just need to let the fuzzer run long enough and it will eventually hit something we don't currently handle correctly, whether it's a known open issue or something new. These are not blockers for unrelated PRs.

dolmen · 2025-08-01T22:53:28Z

Here is an article with cases that I plan to investigate: https://john-millikin.com/json-is-not-a-yaml-subset

ingydotnet · 2025-08-02T16:29:07Z

> Here is an article with cases that I plan to investigate: https://john-millikin.com/json-is-not-a-yaml-subset

I'm not even sure what "cases" that article is asserting wrt JSON not being a subset.

I wrote https://yamlscript.org/blog/2025-07-29/is-json-really-a-subset-of-yaml/ last Tuesday.

A couple years ago there was a ycombinator thread where people asserted ~5 things that made YAML not a superset of JSON. Our core team went through each one and found that the 1.2 spec held up in that regard. I can't find the thread but I'll keep looking...

ingydotnet · 2025-08-02T17:23:38Z

@perlpunk found it for me: https://news.ycombinator.com/item?id=30052128

To be clear, any (correctly implemented) YAML 1.2 loader using the YAML 1.2 core schema should load any JSON correctly.
That's what we mean by superset/subset.

Would you agree with that statement, @perlpunk ?

perlpunk · 2025-08-02T18:02:35Z

> To be clear, any (correctly implemented) YAML 1.2 loader using the YAML 1.2 core schema should load any JSON correctly. That's what we mean by superset/subset.
>
> Would you agree with that statement, @perlpunk ?

Yes, to my knowledge that's true.

ccoVeille

LGTM 👍, some minor feedback anyway

ccoVeille · 2025-08-02T19:27:18Z

fuzz_test.go

+			t.Skip("not valid JSON")
+		}
+
+		t.Logf("JSON %q", s)


This maybe

Suggested change

-			t.Skip("not valid JSON")
+			t.Skipf("not valid JSON %q", s)
 		}

 		t.Logf("JSON %q", s)

ccoVeille · 2025-08-02T19:30:10Z

fuzz_test.go

+		/*
+			// Handling of number is different, so we can't have universal exact matching
+			if !reflect.DeepEqual(v2, v) {
+				t.Errorf("mismatch:\n-      got: %#v\n- expected: %#v", v2, v)
+			}
+		*/


I'm always suspicious when I see commented out code.

What is the need here?

Can't it simply be removed? Is it leftover debug code, or something that was planned but then abandoned?

ccoVeille · 2025-08-02T19:31:29Z

fuzz_test.go

+	f.Add(`{}`)
+	f.Add(`[]`)
+	f.Add(`[[]]`)
+	f.Add(`{"a":[]}`)


Maybe this also

Suggested change

+	f.Add(`{"a":{}}`)
 	f.Add(`{"a":[]}`)

stefanprodan · 2025-08-02T19:39:39Z

fuzz_test.go

+//go:build go1.18
+// +build go1.18


Can you please remove the Go version from here?

As we now have "go 1.18" in go.mod this build guard is redundant.
So 👍

stefanprodan · 2025-08-02T19:39:48Z

fuzz_test.go

+	"encoding/json"
+	"testing"
+
+	"go.yaml.in/yaml/v3"


Suggested change

-	"go.yaml.in/yaml/v3"
+	"go.yaml.in/yaml/v4"

ingydotnet · 2025-08-22T16:25:45Z

@carloslima let's review this soon.

ingydotnet requested changes Jun 25, 2025

carloslima requested changes Jul 2, 2025

Add fuzzer: FuzzEncodeFromJSON

fc01637

Add fuzzing for roundtripping by using JSON documents as input.

dolmen force-pushed the add-FuzzEncodeFromJSON-go.yaml.in branch from 4a5d6c3 to fc01637 Compare July 3, 2025 07:11

dolmen requested review from carloslima and ingydotnet July 3, 2025 07:13

carloslima approved these changes Jul 3, 2025

ingydotnet requested review from dims, scottrigby, stefanprodan and ccoVeille August 2, 2025 16:31

ccoVeille approved these changes Aug 2, 2025

stefanprodan reviewed Aug 2, 2025

carloslima · Jun 26, 2025 · edited
