
Added Spark Source interface #13661

Open: fernst wants to merge 1 commit into develop from create-spark-source-interface
Conversation

fernst (Member) commented on Sep 1, 2021:

This interface will be used to create Spark RDDs and RDDCollection entities from Spark programs.

This will be useful when integrating the Spark BigQuery Connector (https://github.com/GoogleCloudDataproc/spark-bigquery-connector) for BQ Pushdown execution.
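To make the shape of the interface concrete, here is a minimal, self-contained sketch of the plugin pattern described above. The class name, `PLUGIN_TYPE` value, and `create` method are taken from snippets quoted later in this review; `java.util.List` stands in for Spark's `JavaRDD`, and the `Context` interface stands in for `SparkExecutionPluginContext`, so the example compiles without Spark or CDAP on the classpath. The actual PR code may differ.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Hedged sketch of the source-plugin shape discussed in this PR.
// java.util.List stands in for JavaRDD, and Context stands in for
// SparkExecutionPluginContext, so this compiles without Spark/CDAP.
public class SparkSourceSketch {

  /** Stand-in for SparkExecutionPluginContext. */
  interface Context { }

  /** Abstract source plugin: subclasses produce the pipeline's initial records. */
  public abstract static class SparkSource<OUT> implements Serializable {
    public static final String PLUGIN_TYPE = "sparksource";

    /**
     * Responsible for generating the initial record collection
     * for the rest of the pipeline.
     *
     * @param context runtime context for this plugin
     */
    public abstract List<OUT> create(Context context) throws Exception;
  }

  /** Toy implementation returning a fixed set of records. */
  static class FixedSource extends SparkSource<String> {
    @Override
    public List<String> create(Context context) {
      return Arrays.asList("a", "b", "c");
    }
  }

  public static void main(String[] args) {
    System.out.println(SparkSource.PLUGIN_TYPE);                   // sparksource
    System.out.println(new FixedSource().create(null));            // [a, b, c]
  }
}
```

In the real plugin, `create` would return a `JavaRDD<OUT>` built from the context's `JavaSparkContext`, which is what lets a connector such as the Spark BigQuery Connector hand back a distributed dataset directly.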

@fernst fernst requested review from tivv and albertshau September 1, 2021 20:46
@google-cla google-cla bot added the cla: yes label Sep 1, 2021
@fernst fernst added the build Triggers github actions build label Sep 1, 2021
tivv (Contributor) left a comment:

You don't call it from the platform here, do you?

*
* @param context {@link SparkExecutionPluginContext} for this job
*/
public abstract JavaRDD<OUT> run(SparkExecutionPluginContext context) throws Exception;
Contributor commented:

Can we have a different name, e.g. read or create? Also, the javadoc needs to be corrected, as reading does not actually execute any jobs or persist data.

fernst (Member, Author) commented:

Oops, let me fix the javadoc and rename this method.

@fernst fernst force-pushed the create-spark-source-interface branch from ca8a496 to 6415537 on September 2, 2021 at 16:34
private static final long serialVersionUID = 4829903051232692690L;

/**
* User Spark job which will be executed and is responsible for generating a new RDD for the resf of the pipeline
Contributor commented:

typo: resf

Contributor commented:

'job' is a misleading term here, since this method won't normally trigger a physical Spark job unless somebody implements it by calling some Spark actions. I think it's better to leave out the first part and just say 'Responsible for ...'.


public static final String PLUGIN_TYPE = "sparksource";

private static final long serialVersionUID = 4829903051232692690L;
Contributor commented:

Do you need a serialVersionUID on an abstract class?
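As background for the question above (this is general Java serialization behavior, not something from the PR): a serialVersionUID declared on an abstract Serializable class does matter, because the abstract class gets its own stream class descriptor, and the UID is not inherited by subclasses, which receive a computed hash unless they declare their own. The class names here are illustrative only:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Demonstrates that serialVersionUID on an abstract Serializable class
// applies to that class's own stream descriptor and is NOT inherited:
// subclasses get a computed UID unless they declare their own.
public class UidDemo {

  abstract static class Base implements Serializable {
    private static final long serialVersionUID = 1L;
  }

  static class Child extends Base {
    // no explicit serialVersionUID: the JVM computes one for Child
  }

  public static void main(String[] args) {
    long baseUid = ObjectStreamClass.lookup(Base.class).getSerialVersionUID();
    long childUid = ObjectStreamClass.lookup(Child.class).getSerialVersionUID();
    System.out.println(baseUid);   // 1 (the declared value)
    System.out.println(childUid);  // a computed hash, independent of Base's UID
  }
}
```

So keeping the field on the abstract class pins the base class's serialized form, while each concrete plugin subclass would still want its own serialVersionUID if stream compatibility across versions matters.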
