-
Notifications
You must be signed in to change notification settings - Fork 433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-37253] Add state size in application status and deployment metrics #941
Conversation
f0ecf40
to
dd22b3d
Compare
...perator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java
Outdated
Show resolved
Hide resolved
@@ -403,6 +403,28 @@ public static Long calculateClusterMemoryUsage(Configuration conf, int taskManag | |||
return tmTotalMemory + jmTotalMemory; | |||
} | |||
|
|||
public static Long calculateClusterStateSize(Configuration conf, int taskManagerReplicas) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be called totalClusterMemorySize instead of state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I don't think this is used anywhere... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
State size or checkpoint size isn't directly related to the cluster memory size. For the heap memory backend, we would expect the state size to be lower than the overall memory. For RocksDB, it could even exceed the cluster memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok but I still don't get 3 things:
- Where is this used?
- Why do we need this bad approximation if state size metrics are available from Flink?
- This is basically just total memory, why do we call it state size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see your comment in #941 (comment), it was somehow hidden when I replied.
Sorry, this code was unused code. I have removed it.
…rics This adds state size, i.e. the size of the last completed checkpoint, to the deployment status. It also exposes the state size as a deployment metric.
Have you tested this in a (local) kubernetes env with different Flink versions? Does it work as expected? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the manually tested for correctness for the supported Flink versions and e2es pass then good to go
This adds state size, i.e. the size of the last completed checkpoint, to the
deployment status. It also exposes the state size as a deployment metric.