Replies: 7 comments 5 replies
-
I think we are missing
-
Thread on Readers Cache
-
Thread on Pandaproxy (REST & Schema Registry): General:
Schema Registry
-
Thread on vanilla coproc (w/o v8). The issue I am most concerned about is what happens when a user pushes many scripts. To cut back on memory usage there is a cache of ntp-to-context info which is shared across scripts; this would be the main culprit for any memory usage oddities within coproc. A second concern is possibly degraded performance when there are many scripts, since each script has its own run loop. I would start to worry that we are hurting performance in other areas of the system by holding up the reactor. Maybe we could use priorities here to solve this (rough sketch below).
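On the priorities idea, here is a minimal sketch of what that could look like with Seastar scheduling groups. It is illustrative only: the group name and share value are made up, it is not coproc code, and exact header locations vary a little between Seastar versions.

```cpp
// Illustrative sketch: run coproc script loops under their own scheduling
// group with modest shares, so they yield CPU to higher-priority work
// instead of holding up the reactor. The "coproc" name and 100 shares are
// placeholder values, not what Redpanda actually uses.
#include <seastar/core/future.hh>
#include <seastar/core/scheduling.hh>

seastar::future<> run_scripts_low_priority() {
    return seastar::create_scheduling_group("coproc", 100).then(
      [](seastar::scheduling_group sg) {
          return seastar::with_scheduling_group(sg, [] {
              // A per-script run loop would live here; all work scheduled
              // inside this lambda is accounted to the "coproc" group.
              return seastar::make_ready_future<>();
          });
      });
}
```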
-
Thread on archival. There are two parts to the problem: transient and long-term memory. Long-term memory used by archival is mostly manifests; we store a manifest per partition on the leader shard. Transient memory allocations mostly come from uploads, for which we create a buffered output stream. Also, the manifests are linearized when they are parsed, so if a manifest grows big it will eventually cause an OOM. To mitigate this I planned to split large manifests into parts which could be updated/parsed individually (sketched below).
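A purely hypothetical sketch of the split-manifest idea (the type and field names below are invented for illustration and are not archival code): keep the manifest as independently serializable parts keyed by base offset, so an update or a parse only ever touches one bounded part instead of linearizing the whole manifest.

```cpp
// Hypothetical layout only: partition the manifest by base offset so each
// part can be parsed, updated and uploaded on its own, bounding the memory
// needed for any single operation.
#include <cstdint>
#include <map>
#include <vector>

struct segment_meta {            // made-up stand-in for per-segment metadata
    int64_t base_offset;
    int64_t committed_offset;
    uint64_t size_bytes;
};

struct manifest_part {           // one independently (de)serialized unit
    int64_t base_offset;
    std::vector<segment_meta> segments;
};

struct partitioned_manifest {
    // Only the part covering the affected offset range needs to be loaded,
    // re-parsed, or re-uploaded when segments are added or removed.
    std::map<int64_t, manifest_part> parts;
};
```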
-
Maybe we can create replacements for vector and unordered_map that never allocate more than a configured number of contiguous bytes. What do you think? (Rough sketch below.)
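A minimal sketch of the vector half of that idea, under stated assumptions: the name chunked_vector and the 4 KB default chunk size are made up for illustration, not an existing Redpanda or Seastar type. Elements live in fixed-size chunks, so no single allocation exceeds the configured contiguous limit.

```cpp
// Illustrative sketch: a vector-like container whose largest contiguous
// allocation is bounded by max_chunk_bytes. Name and default are hypothetical.
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

template <typename T, std::size_t max_chunk_bytes = 4096>
class chunked_vector {
    static constexpr std::size_t elems_per_chunk = max_chunk_bytes / sizeof(T);
    std::vector<std::unique_ptr<T[]>> _chunks; // each chunk <= max_chunk_bytes
    std::size_t _size = 0;

public:
    void push_back(T value) {
        if (_size == _chunks.size() * elems_per_chunk) {
            _chunks.push_back(std::make_unique<T[]>(elems_per_chunk));
        }
        _chunks[_size / elems_per_chunk][_size % elems_per_chunk] = std::move(value);
        ++_size;
    }
    T& operator[](std::size_t i) {
        return _chunks[i / elems_per_chunk][i % elems_per_chunk];
    }
    std::size_t size() const { return _size; }
};
```

The vector of chunk pointers still grows contiguously, but at 8 bytes per chunk it stays orders of magnitude smaller than the payload it indexes; an unordered_map replacement could similarly spread its buckets across chunks.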
-
We are missing an
-
Hello @vectorizedio/core
Without core dumps or reproducers, the OOM events we have been seeing are pretty tough to diagnose. We're working on improving both of those, but I spent many hours today trying to get a customer workload to OOM, without any luck. These OOMs were once easy to diagnose because the culprits were egregious violators, but now we are searching in the tail of optimization.
So, it would be helpful for everyone to think about the sub-systems you are working on and what kinds of memory usage might accumulate. Total size is important, but in core we also care about cases where large contiguous regions are allocated. By large I don't mean MBs, I mean KBs (for example, there may be plenty of free memory, but not enough contiguous space for your std::vector of a few hundred integers).
Segment index
Every segment in a cluster has an associated index structure consisting of 3 vectors of integers ([]int32, []int32, []int64). A 1 GB segment will have roughly 23,000 entries, which works out to about two ~80 KB allocations and one ~150 KB allocation. These allocations fall squarely into the large category.
The ratio of segment data to index size is approximately 3000:1
One customer has a cluster with approximately 13,000 active segments. They also have around 4 TB of data on disk. This works out to roughly 250 MB per segment on average. Even scaling down to a quarter of the 1 GB case, the 80 KB/150 KB allocations become roughly 20 KB/37 KB. These are still large, and there are about 13,000 of them.
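For reference, a back-of-the-envelope sketch of that arithmetic. It is illustrative only: the exact entry count depends on the index sampling interval, so the output gives the right order of magnitude rather than an exact match for the figures above.

```cpp
// Rough arithmetic for the contiguous allocations behind a segment index:
// three parallel vectors (two of int32, one of int64), one entry per sample.
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    // ~23,000 entries for a 1 GB segment (approximate; depends on sampling).
    constexpr std::size_t entries = 23'000;
    std::printf("int32 vector: ~%zu KB contiguous\n", entries * sizeof(std::int32_t) / 1024);
    std::printf("int32 vector: ~%zu KB contiguous\n", entries * sizeof(std::int32_t) / 1024);
    std::printf("int64 vector: ~%zu KB contiguous\n", entries * sizeof(std::int64_t) / 1024);
    // A 250 MB average segment scales each of these down to about a quarter.
    return 0;
}
```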
Segment reader
This doesn't appear to hold onto any significant resources. It's an open file descriptor and some metadata.
O(connections) overhead
We've had a report of a workload with 1000 nodes * 5 producers/node, so O(5000) producer connections on a 3 node cluster with 96 cores.
Readers cache
Holds log readers open until they become invalid or they have been inactive for 30 seconds. A log reader has a lot of stuff going on, but probably its most significant allocation is the buffer in its active seastar::input_stream. AFAIK this might be 128 KB? @mmaslankaprv are there scenarios where the total number of cached log readers might become large? It doesn't seem like there are any hard bounds in place.
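As a point of reference, a minimal sketch of how the per-stream buffer memory is set when building an input stream over a file with Seastar. The 128 KB and read-ahead values are placeholders rather than the reader's actual settings, and make_reader_stream is a made-up helper.

```cpp
// Illustrative: each open file input stream holds roughly buffer_size bytes
// (times 1 + read_ahead), so N cached readers pin about N * buffer_size of
// memory. The values below are placeholders, not Redpanda's real settings.
#include <seastar/core/file.hh>
#include <seastar/core/fstream.hh>
#include <seastar/core/iostream.hh>

seastar::input_stream<char> make_reader_stream(seastar::file f) {
    seastar::file_input_stream_options opts;
    opts.buffer_size = 128 * 1024;  // contiguous buffer held per stream
    opts.read_ahead  = 1;           // extra buffers kept in flight
    return seastar::make_file_input_stream(std::move(f), opts);
}
```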
Fetch session cache
@mmaslankaprv anything to think about here?
Chunk cache
This is not hooked up to the reclaimer, but no matter how hard I try I can't get it to use much memory.
Raft
@mmaslankaprv what are the scenarios we need to be concerned about here? Where does data get batched up, and where might it get queued without back pressure being applied?
Foreign memory
Minor optimization, but we probably have a lot of cases: ownership of heap data sent across cores should be held in a foreign pointer. I noticed today the following areas where it looks like there are lots of non-foreign-owned cross-core movements (see the sketch after this list):
Schema registry
@BenPope
@jcsp
HTTP proxy
@BenPope
Coproc
@graphcareful
@VadimPlh
Transactions and idempotence
@rystsov
Archival
@Lazin
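For anyone unfamiliar with the pattern, a minimal sketch of wrapping heap data in seastar::foreign_ptr before handing it to another shard (illustrative only; send_to_shard is a made-up helper, and exact header locations vary a little between Seastar versions):

```cpp
// Illustrative: wrap the allocation in a foreign_ptr before moving it across
// shards, so its eventual deallocation is routed back to the shard that
// allocated it instead of being freed on a foreign shard.
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>
#include <seastar/core/smp.hh>
#include <memory>
#include <vector>

seastar::future<> send_to_shard(unsigned shard, std::vector<char> payload) {
    auto fp = seastar::make_foreign(
      std::make_unique<std::vector<char>>(std::move(payload)));
    return seastar::smp::submit_to(shard, [fp = std::move(fp)]() mutable {
        // Use *fp on the target shard; when fp is destroyed there, the
        // foreign_ptr routes destruction back to the owning shard.
        return fp->size();
    }).discard_result();
}
```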
Anything else?