Non-streaming jobs, Beam and Checkpointing #127

Open
prkadalb opened this issue Nov 19, 2019 · 5 comments

Comments

@prkadalb

Hello,

I'm trying to use the flinkk8soperator with Beam (with Flink being the runner).

The operator is able to launch the Job Manager and Task manager pods and can submit the job as well. It works fine for streaming applications.

However, when I try to run a batch application, it turns out that Beam does not enable checkpointing in Flink.

The k8s operator, however, assumes that checkpointing is turned on, and reports an error when the checkpoint API returns an HTTP 404.

checkpoints, err := f.flinkClient.GetCheckpointCounts(ctx, getURLFromApp(app, hash), app.Status.JobStatus.JobID)

https://github.com/apache/flink/blob/7aafb248770070f0fc1bb2bd49d7bbffbb873699/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/checkpoints/CheckpointingStatisticsHandler.java#L94

https://github.com/apache/beam/blob/7b3a3fa6c9291692b56dbc358dfc075724b993b6/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L223

Is it possible to let the operator know somehow that checkpoints are not enabled, and that a 404 error on the checkpoint API is not fatal?

Thanks!

@anandswaminathan
Contributor

cc @tweise

@tweise
Contributor

tweise commented Nov 22, 2019

@anandswaminathan we should support jobs that don't enable checkpointing. There are also streaming use cases where it makes sense to not enable checkpointing.

@anandswaminathan
Contributor

@tweise @prkadalb

We can definitely find a way to indicate that. Also I believe there is a small bug with respect to deletion of Finished jobs as well.

What do you think is the best way for the operator to identify that a job is a batch job and that checkpointing is disabled? Also, if you have ideas, feel free to submit a PR. @mwylde and I would be happy to review.

@tweise
Contributor

tweise commented Dec 7, 2019

#138

@tweise
Contributor

tweise commented Mar 21, 2020

Is this still an issue? We are using the operator with a Beam streaming job that does not have checkpointing enabled, and we also recently added the option to skip the savepoint during an upgrade: #184

3 participants