Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query caching + ByteString pinning = heap fragmentation #114

Open
ulidtko opened this issue May 17, 2023 · 2 comments
Open

Query caching + ByteString pinning = heap fragmentation #114

ulidtko opened this issue May 17, 2023 · 2 comments

Comments

@ulidtko
Copy link

ulidtko commented May 17, 2023

Hey @phadej 👋

It appears there's a significant issue with newtype Query = Query ByteString. Namely, heap fragmentation.

See haskell/bytestring#193 — TL;DR: the strict ByteString uses ForeignPtr as the underlying type (and that's unlikely to change anytime soon). And it means that all long-lived ByteStrings are pinned in memory — GC is not allowed to move them, which affects heap compaction.

In some usage patterns, the heap will expand and turn into a metaphorical colander or sieve: with pinned allocations sprinkled around with wide unused gaps in-between. In a case I've got at hand currently, a DB-heavy test scenario causes large increase in heap blocks (pages) with <10% utilization. Following @mpickering's work, I tend to agree it's a symptom of fragmentation. Using ghc-debug, I've found pinned BS constructors in lowest-utilization heap blocks; among the worst examples, 6-byte COMMIT pinning a whole 4096-byte page.

No need to reach far — Yesod's persistent-postgresql performs query caching which permanently¹ saves prepared statements in a Map Text Statement in memory. The Text key is good — unlike ByteString, it's not pinned — what's bad is this line where persistent-postgresql is forced to pre-encode the Text query to a ByteString so that Query of this package can be constructed. That Statement closure then goes into the StatementCache, and the original SQL query is "forever" retained twice:

  1. as a Text in the map key,
  2. as a pinned ByteString in Query argument in Statement closure environment.
    The latter is really bad.

What are your thoughts about changing to newtype Query = Query ShortByteString ? Docs of ShortByteString explain it's not contributing to heap fragmentation, like ByteString does.

Ideally though, newtype Query = Query Text would be best IMO — enabling sharing of the same Text form of query which StatementCache keeps already anyway. Any reasons to not go with Text I'm missing?


¹ until the cache gets explicitly cleared, which can be done with Internal APIs there.

@phadej
Copy link
Collaborator

phadej commented May 17, 2023

There shouldn't be growing amount of long lived Query objects in normal programs. What are you doing?

@ulidtko
Copy link
Author

ulidtko commented May 17, 2023

I'm testing a conduit-streaming content from ~thousands of Large Objects per request in a Yesod webapp. Repro not easy to minimize.

What's observed is RSS (OS memory) growth — while "heap live bytes" metric is flat.

I agree — there's only a small amount of queries in the StatementCache, it's a small contribution but definitely part of the problem. There're other pieces which contribute too; I just have to start somewhere.

Edit: @phadej this app has hundreds different queries. I'm certainly not generating them dynamically. My targeted repro test raises StatementCache from 11 entries to 15 and it stays constant at 15 as expected on further iterations. Still, the pinned queries do retain memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants