Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue with package details endpoint #1502

Open
p-brito opened this issue Jan 29, 2025 · 5 comments
Open

Performance issue with package details endpoint #1502

p-brito opened this issue Jan 29, 2025 · 5 comments

Comments

@p-brito
Copy link

p-brito commented Jan 29, 2025

While attempting to retrieve details for this package (https://repo.packagist.org/p2/letraceursnork/wordpress-security-advisories~dev.json), I encountered significant performance issues due to the sheer volume of conflict data. The last successful request I managed to execute returned a response exceeding 5 million lines, making the request nearly impossible to complete efficiently. To improve accessibility and performance, I suggest removing conflict information from the main endpoint and introducing a dedicated endpoint specifically for fetching this data.

@stof
Copy link
Contributor

stof commented Jan 31, 2025

Those p2 endpoints are the ones used by Composer to read the metadata it needs to resolve dependencies. Removing the conflict data from them would break things, because it would effectively remove the conflicts from the package definition.

@stof
Copy link
Contributor

stof commented Jan 31, 2025

this package specifically has a huge file here, because it has tons of conflicts while also having 470 branches included in this file (and the composer-specific minification does not help a lot, because those conflict rules are changing between versions, and the branches don't even have name that sort them in a relevant order minimizing back and forth changes)

@p-brito
Copy link
Author

p-brito commented Feb 4, 2025

First of all, thanks for the quick answer @stof. What about having a filter to exclude certain properties from the response?

@stof
Copy link
Contributor

stof commented Feb 4, 2025

@p-brito those p2 files are static files served directly by the webserver. Supporting a filter would require using dynamic generation of metadata, which will be a permanent DoS attack on the server.

Packagist currently knows a bit more than 415K packages, which represents 830K metadata files (each package has one file for branches, for which the cache is invalidated at each commit but that composer does not need to load by default, and one file for tags, for which the cache gets invalidated only when a release is done). And Packagist has lots of traffic to those files.
And among those packages, letraceursnork/wordpress-security-advisories is an extremily pathological case:

  • it has 470 branches
  • each branch has lots of conflicts
  • those branches are named based on UUIDv4, meaning that their sorting is totally random. The custom minification scheme of the Composer metadata files precisely relies on the fact that successive versions of a package often have many requirements in common, and it encodes the difference between versions instead of the full version (except for the first version of the list of course). With a random sorting, this minification scheme hits its worst case, because there is lots of changes between 2 successive versions, which might them change back for the third one, etc...

Looking at the repo, it seems all those UUID-based versions are not even relevant: they correspond to PRs opened by the bot to add new vulnerabilities in the main branch. There is a few ways in which @LeTraceurSnork could improve the situation here:

  1. use a fork to submit those PRs instead of making the bot create the branch in their own repo (this way, Packagist would never see those branches)
  2. delete the branch once the PR is merged (and of course, clean all historical branches corresponding to past PRs)
  3. Use a UUIDv7 for the generated branch name (if they are kept in the main repo, otherwise it does not matter), so that the sorting of branches is less random (but not sure it will help a lot actually, as it seems all the branches have been created at the same time based on the same commit, so they don't build on top of previous merged PRs. It might still help for the future in case the existing state is from a batch import at the start of the automation)

@LeTraceurSnork
Copy link

Aw, snap, I'm sorry. Will try to fix that this week. Thank you for your feedback btw 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants