Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Submodules Fuzz Target #1919

Merged
merged 1 commit into from
May 30, 2024

Conversation

DaveLak
Copy link
Contributor

@DaveLak DaveLak commented May 30, 2024

Fuzz Introspector heuristics suggest the Submodule API code represent "optimal analysis targets" that should yield a meaningful increase in code coverage. The changes here introduce a first pass at implementing a fuzz harness that cover the primary APIs/methods related to Submodules. Of particular interest to me is the Submodule.config_writer() coverage.

Please note however, there is likely plenty of room for improvement in this harness in terms of both code coverage as well as performance; the latter of which will see significant benefit from a well curated seed corpus of .gitmodules file like inputs. The ParsingError raised by the fuzzer without a good seed corpus hinders test efficacy significantly.

I have a draft PR up with a seed corpus here: gitpython-developers/qa-assets#5

Fuzz Introspector heuristics suggest the Submodule API code represent
"optimal analysis targets" that should yield a meaningful increase in
code coverage. The changes here introduce a first pass at implementing a
fuzz harness that cover the primary APIs/methods related to Submodules.
Of particular interest to me is the `Submodule.config_writer()`
coverage.

Please note however, there is likely plenty of room for improvement in
this harness in terms of both code coverage as well as performance; the
latter of which will see significant benefit from a well curated seed
corpus of `.gitmodules` file like inputs. The `ParsingError` raised by
the fuzzer without a good seed corpus hinders test efficacy
significantly.
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for taking such great care!

Part of me thinks that the submodule implementation is so riddled with inaccuracies and and incorrectness that fuzzing it seems like a waste. The fuzzer can only try to find unexpected exceptions, and maybe that's a small win, but at what cost?

Part of that feeling also stems for the incredible sluggishness of Python in general, so any fuzzing feels wasteful. But that's besides the point I suppose, apologies for the ramblings.

@Byron Byron merged commit 2345c1a into gitpython-developers:main May 30, 2024
26 checks passed
@DaveLak DaveLak deleted the improve-fuzzer-coverage branch May 30, 2024 14:33
@DaveLak
Copy link
Contributor Author

DaveLak commented May 30, 2024

Please don't apologize, and definitely do not hesitate to reject or push back on any of my PRs! (especially considering that my last few PRs came out of the blue without prior discussion about whether they're even wanted -- sorry about that 😅)

Part of me thinks that the submodule implementation is so riddled with inaccuracies and and incorrectness that fuzzing it seems like a waste. The fuzzer can only try to find unexpected exceptions, and maybe that's a small win, but at what cost?
Part of that feeling also stems for the incredible sluggishness of Python in general, so any fuzzing feels wasteful.

I think your points are perfectly reasonable. Here is how I've been thinking of the value in fuzzing GitPython:

  • Hopelessly broken or not, GitPython is widely used, so I tend to believe (perhaps too optimistically) that even small wins which result in improved stability can have an outsized impact for some n of users over time. Doing the wrong thing right is better than doing it wrong + unexpected behavior someone will eventually need to debug. That said, I fully respect and understand if you don't think the value justifies the CPU cycles, maintenance burden, or any other reason for that matter.

  • Similar to the above, I believe continuous fuzzing is well-positioned to identify regressions that traditional unit tests may miss. Even without a CI action run on PRs, the accumulated corpus in ClusterFuzz should be able to identify unexpected exceptions reasonably quickly after they're introduced. But, of course, that is just my hypothesis, which has yet to be validated.

  • Finally, perhaps from a somewhat naive perspective, I believe that incorporating well-documented and effective fuzzing into GitPython has the potential to benefit the wider Python community. Now, I fully recognize how presumptuous that sounds but here me out. Historically, fuzzing has been used by security experts on lower-level languages with great success, but a lack of easy to use tooling, terse documentation, and few quality examples to emulate has hindered wider adoption in higher-level languages like Python. So I think integrating fuzz testing into a widely-used and complex Python project like GitPython, paired with some quality documentation, can lower the barrier for entry for folks that may be seeking to learn more or just happen to stumble across it (like I did lol.)

But that's besides the point I suppose, apologies for the ramblings.

I think everything you said is very much on-point regarding any of the fuzzing work in this repo. Moreover, I really appreciate hearing your thoughts, so thanks!

In case it isn't clear, I won't be offended if you feel the juice isn't worth the squeeze, and would rather me hold off on any non-maintenance type fuzzing work. Frankly, if you decided you'd rather it all removed ASAP, I'd help remove it. I've learned a lot about Git, Python, fuzzing, and more working on these, so I wouldn't consider it a wasted effort even if the changes never made it to PR, So thanks, @Byron, for the support along the way! 🙂


And now, it's my turn to apologize for the ramblings 😅

@Byron
Copy link
Member

Byron commented May 31, 2024

(especially considering that my last few PRs came out of the blue without prior discussion about whether they're even wanted -- sorry about that 😅)

That's perfectly alright - no need to make it more complicated, just do what you think is right, you are driving this.

  • That said, I fully respect and understand if you don't think the value justifies the CPU cycles, maintenance burden, or any other reason for that matter.

I am also clearly biased and think that everybody should use gitoxide, so it helps to see your reasoning to unbias me a bit. After all, GitPython has it's value and given its usage, maybe it's probably a good idea to invest in any measure that can make it a little better, maybe particularly due to its many flaws.

  • Finally, perhaps from a somewhat naive perspective, I believe that incorporating well-documented and effective fuzzing into GitPython has the potential to benefit the wider Python community.

Spreading fuzzing as a technique through GitPython is a great thought and I am fully behind that - if nothing else comes out of it, more Python projects might adopt it which could be a net-win. And even if not, people learn how to use a fuzzer which will help in any programming environment eventually.

In case it isn't clear, I won't be offended if you feel the juice isn't worth the squeeze, and would rather me hold off on any non-maintenance type fuzzing work. Frankly, if you decided you'd rather it all removed ASAP, I'd help remove it. I've learned a lot about Git, Python, fuzzing, and more working on these, so I wouldn't consider it a wasted effort even if the changes never made it to PR, So thanks, @Byron, for the support along the way! 🙂

I'd never do that, and don't feel that way at all. But I do admit that I'd love to see you eventually move to gitoxide - you do tremendous work here and even if it's just for spreading good fuzzing setups, gitoxide could certainly be a good vessel for that.

Eventually.

No pressure :D.

PS: There I'd definitely have more opinions on what to fuzz as well, which might make it more interesting for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants