SPARKC-577: Removal of Driver Duplicate Classes #1245

Closed
absurdfarce wants to merge 12 commits

Conversation

absurdfarce commented:

Description

How did the Spark Cassandra Connector Work or Not Work Before this Patch

The connector was using internal serializable representations of keyspace/table metadata because the corresponding Java driver classes weren't serializable. This changed in Java driver v4.6.0.
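
To make that concrete, here is a minimal sketch of the round trip that driver v4.6.0 enables; it assumes a reachable local Cassandra node with default contact points, and system.peers is just a convenient built-in table:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

import com.datastax.oss.driver.api.core.CqlSession
import com.datastax.oss.driver.api.core.metadata.schema.TableMetadata

// Round-trip a driver TableMetadata through standard Java serialization.
// Before driver 4.6.0 this would fail with NotSerializableException, which
// is why the connector kept its own serializable copies.
val session = CqlSession.builder().build() // assumes a local node on 9042

val table: TableMetadata = session.getMetadata
  .getKeyspace("system").get()
  .getTable("peers").get()

val bytes = new ByteArrayOutputStream()
new ObjectOutputStream(bytes).writeObject(table)

val copy = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  .readObject()
  .asInstanceOf[TableMetadata]

assert(copy.getName == table.getName)
session.close()
```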

General Design of the patch

Replaced internal references with Java driver types directly wherever possible. The internal classes within the connector were used for several different functions:

  1. Metadata retrieval/access
  2. As a definition/descriptor for future table creation

In some cases (1) could be handled by direct replacement, but the connector was also storing multiple layers of metadata within a single object (e.g. some table-level information was stored in ColumnDef). Unfortunately the 4.x driver doesn't allow traversing the metadata tree in this way. To make this information available without too much breakage to the existing API, some of the old classes are preserved in a new role: containers for all metadata types on a "branch" of this tree. Thus TableDef stores keyspace + table metadata, ColumnDef stores keyspace + table + column metadata, etc. A sketch of the idea follows.
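
As a minimal sketch of that container role, assuming the 4.x driver metadata interfaces (these are illustrative shapes, not the patch's actual class definitions):

```scala
import com.datastax.oss.driver.api.core.metadata.schema.{
  ColumnMetadata, KeyspaceMetadata, TableMetadata
}

// Each level of the tree bundles the driver metadata for itself plus
// everything above it, so callers can still reach "up" the branch.
case class TableDef(keyspace: KeyspaceMetadata, table: TableMetadata)

case class ColumnDef(
    keyspace: KeyspaceMetadata,
    table: TableMetadata,
    column: ColumnMetadata) {
  // Table-level information stays reachable from a column-level object;
  // the driver's ColumnMetadata alone carries no back-reference to its
  // enclosing TableMetadata.
  def tableName: String = table.getName.asInternal
}
```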

Fixes: SPARKC-577

How Has This Been Tested?

This is still a WIP and hasn't been tested meaningfully yet.

Checklist:

  • I have a ticket in the OSS JIRA
  • I have performed a self-review of my own code
  • Locally all tests pass (make sure tests fail without your patch)

- val maxIndex = maxCol.componentIndex.get
- val requiredColumns = tableDef.clusteringColumns.takeWhile(_.componentIndex.get <= maxIndex)
+ val maxIndex = tableDef.clusteringColumns.indexOf(maxCol)
+ val requiredColumns = tableDef.clusteringColumns.take(maxIndex + 1)

absurdfarce (Author):

I wanted to highlight this for review. I'm fairly sure the logic I have in there now mirrors what was being done before, but I wanted to make sure this was looked at more closely. A toy demonstration of the intended equivalence follows.
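
A minimal sketch of why the two forms should agree, assuming the clustering columns are kept in clustering order; ClusteringCol here is a toy stand-in for the old internal column representation, not a connector class:

```scala
// Toy stand-in for the old internal column type; componentIndex is the
// column's position within the clustering key.
case class ClusteringCol(name: String, componentIndex: Option[Int])

val clusteringColumns = Seq(
  ClusteringCol("c0", Some(0)),
  ClusteringCol("c1", Some(1)),
  ClusteringCol("c2", Some(2)))

val maxCol = clusteringColumns(1)

// Old form: keep columns while their stored componentIndex is <= maxCol's.
val oldRequired =
  clusteringColumns.takeWhile(_.componentIndex.get <= maxCol.componentIndex.get)

// New form: rely on the sequence itself being in clustering order.
val newRequired =
  clusteringColumns.take(clusteringColumns.indexOf(maxCol) + 1)

// The two agree exactly when the sequence is ordered by componentIndex.
assert(oldRequired == newRequired)
```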

* ColumnSelector modified to work with both TableDef and TableDescriptor
  * Need an integration test for the TableDef case, as it isn't really covered anymore
* DatasetFunctions.createCassandraTable() modified to take table options + a per-clustering-column ordering value (see the sketch below)
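
As a rough illustration of that last bullet, the extended signature might look something like the sketch below; the parameter names and types are assumptions for illustration, not the patch's actual code:

```scala
// Hypothetical sketch only; names and types are assumptions.
def createCassandraTable(
    keyspaceName: String,
    tableName: String,
    partitionKeyColumns: Option[Seq[String]] = None,
    // Per-clustering-column sort order rather than a bare column list:
    clusteringKeyColumns: Option[Seq[(String, String)]] = None, // (name, "ASC" | "DESC")
    // New: table options folded into the generated CREATE TABLE statement:
    tableOptions: Map[String, String] = Map.empty): Unit = ???
```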

absurdfarce (Author):

I will likely wind up closing this in favor of #1250.
