Added Spark Source interface #13661
base: develop
Conversation
you don't call it from platform here, do you?
 *
 * @param context {@link SparkExecutionPluginContext} for this job
 */
public abstract JavaRDD<OUT> run(SparkExecutionPluginContext context) throws Exception;
Can we have a different name? E.g. read or create? Also the javadoc needs to be corrected, as reading does not really execute any jobs or persist data.
oops, let me fix the javadoc and rename this method
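With the rename the reviewer suggested applied, the shape of the class might look like the sketch below. This is a compile-only illustration with stand-in types (`JavaRDD` and `SparkExecutionPluginContext` are defined locally so it builds without the Spark and CDAP jars), and the `create` name is an assumption taken from this thread, not the merged code:

```java
// Stand-ins for org.apache.spark.api.java.JavaRDD and the CDAP plugin context,
// so this sketch compiles without Spark/CDAP on the classpath.
interface JavaRDD<T> { }
interface SparkExecutionPluginContext { }

// Hypothetical shape of the abstract plugin class discussed in this review.
abstract class SparkSource<OUT> implements java.io.Serializable {
  public static final String PLUGIN_TYPE = "sparksource";

  /** Responsible for generating a new RDD for the rest of the pipeline. */
  public abstract JavaRDD<OUT> create(SparkExecutionPluginContext context) throws Exception;
}

// A trivial implementation returning an empty stand-in RDD.
class EmptySource extends SparkSource<String> {
  @Override
  public JavaRDD<String> create(SparkExecutionPluginContext context) {
    return new JavaRDD<String>() { };
  }
}

public class SparkSourceSketch {
  public static void main(String[] args) throws Exception {
    // The plugin framework, not user code, would normally invoke create(...).
    System.out.println(new EmptySource().create(null) != null);
  }
}
```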
Force-pushed from ca8a496 to 6415537
private static final long serialVersionUID = 4829903051232692690L;
/**
 * User Spark job which will be executed and is responsible for generating a new RDD for the resf of the pipeline
typo: resf
'job' is a misleading term here since this method won't normally trigger a physical Spark job unless somebody implements it by calling some Spark actions. Think it's better to leave out the first part and just say 'Responsible for ...'
public static final String PLUGIN_TYPE = "sparksource";

private static final long serialVersionUID = 4829903051232692690L;
Do you need a serialVersionUID on an abstract class?
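For context on the question above: if the abstract class itself implements Serializable, it gets its own descriptor in the serialization stream, so pinning its serialVersionUID can still matter for subclasses. A self-contained demonstration (class names here are illustrative, not from this PR):

```java
import java.io.*;

// A Serializable abstract class with an explicit serialVersionUID: changing
// the class without a fixed UID could break deserialization of subclasses,
// since the abstract class has its own entry in the serialized stream.
abstract class AbstractSource implements Serializable {
  private static final long serialVersionUID = 1L;
}

class ConcreteSource extends AbstractSource {
  private static final long serialVersionUID = 2L;
  final String name;
  ConcreteSource(String name) { this.name = name; }
}

public class SerialUidDemo {
  public static void main(String[] args) throws Exception {
    // Round-trip a subclass instance through Java serialization.
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
      out.writeObject(new ConcreteSource("demo"));
    }
    try (ObjectInputStream in =
             new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
      ConcreteSource copy = (ConcreteSource) in.readObject();
      System.out.println(copy.name);
    }
  }
}
```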
This interface will be used to create Spark RDDs and RDDCollection entities from Spark programs.
This will be useful when integrating the Spark BigQuery Connector for BQ Pushdown execution: https://github.com/GoogleCloudDataproc/spark-bigquery-connector