[FLINK-37253] Add state size in application status and deployment metrics #941

mxm · 2025-02-04T16:04:38Z

This adds state size, i.e. the size of the last completed checkpoint, to the
deployment status. It also exposes the state size as a deployment metric.
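
As a rough illustration of what "state size" means here, the value could be derived along these lines. This is a minimal sketch and not the code from this PR: the `Checkpoint` class, its fields, and `lastCompletedStateSize` are hypothetical names, and an in-memory list stands in for whatever checkpoint statistics the operator actually reads from Flink.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class StateSizeSketch {

    /** Hypothetical, simplified view of one checkpoint entry (not a Flink class). */
    static final class Checkpoint {
        final long id;
        final boolean completed;
        final long stateSizeBytes;

        Checkpoint(long id, boolean completed, long stateSizeBytes) {
            this.id = id;
            this.completed = completed;
            this.stateSizeBytes = stateSizeBytes;
        }
    }

    /**
     * Returns the state size of the most recent completed checkpoint,
     * or an empty Optional if no checkpoint has completed yet.
     */
    static Optional<Long> lastCompletedStateSize(List<Checkpoint> history) {
        return history.stream()
                // Ignore in-progress and failed checkpoints.
                .filter(c -> c.completed)
                // Assumption: a higher checkpoint id means a more recent checkpoint.
                .max(Comparator.comparingLong((Checkpoint c) -> c.id))
                .map(c -> c.stateSizeBytes);
    }

    public static void main(String[] args) {
        List<Checkpoint> history = List.of(
                new Checkpoint(1L, true, 1024L),
                new Checkpoint(2L, true, 2048L),
                new Checkpoint(3L, false, 4096L));

        // The same value would be written to the status and reported as a metric.
        System.out.println(lastCompletedStateSize(history).orElse(-1L));
    }
}
```

Returning an `Optional` keeps the "no completed checkpoint yet" case explicit, so a caller can decide whether to omit the field or report a placeholder.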

...perator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java

gyfora · 2025-02-05T14:06:02Z

...kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/FlinkUtils.java

@@ -403,6 +403,28 @@ public static Long calculateClusterMemoryUsage(Configuration conf, int taskManag
        return tmTotalMemory + jmTotalMemory;
    }

+    public static Long calculateClusterStateSize(Configuration conf, int taskManagerReplicas) {


Shouldn't this be called totalClusterMemorySize instead of state?

Also I don't think this is used anywhere... :)

State size or checkpoint size isn't directly related to the cluster memory size. For the heap memory backend, we would expect the state size to be lower than the overall memory. For RocksDB, it could even exceed the cluster memory.

Ok but I still don't get 3 things:

1. Where is this used?
2. Why do we need this bad approximation if state size metrics are available from Flink?
3. This is basically just total memory, why do we call it state size?

I didn't see your comment in #941 (comment), it was somehow hidden when I replied.

Sorry, this was unused code. I have removed it.

gyfora · 2025-02-05T15:47:23Z

Have you tested this in a (local) kubernetes env with different Flink versions? Does it work as expected?

gyfora

If this was manually tested for correctness with the supported Flink versions and the e2es pass, then good to go.

mxm · 2025-02-05T17:27:57Z

Tried it out on a local k8s cluster with various Flink versions:

mxm requested a review from gyfora February 4, 2025 16:04

mxm force-pushed the FLINK-37253 branch 5 times, most recently from f0ecf40 to dd22b3d Compare February 5, 2025 12:04

gyfora reviewed Feb 5, 2025

...perator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java (outdated)

gyfora reviewed Feb 5, 2025

mxm added 2 commits February 5, 2025 16:07

[FLINK-37253] Add state size in application status and deployment metrics

766fd08

Test: Add CheckpointStatistics to increase inner class visibility

df3ee42

mxm force-pushed the FLINK-37253 branch from dd22b3d to df3ee42 Compare February 5, 2025 15:07

Removed unused code

74a1cb1

gyfora approved these changes Feb 5, 2025

mxm merged commit b7d6f9d into apache:main Feb 7, 2025
115 of 118 checks passed

mxm deleted the FLINK-37253 branch February 7, 2025 09:49
