SPARKC-504: Save null UDTs #1228

Closed
wants to merge 11 commits
5 changes: 4 additions & 1 deletion CHANGES.txt
@@ -1,4 +1,4 @@
- 3.0.0-Alpha
+ 3.0-Alpha
* Include all DS Specific Features
* Includes CassandraHiveMetastore
* Structured Streaming Sink Support
@@ -12,6 +12,9 @@

*******************************************************************************

+ 2.4.3
+ * Fix issue with Hadoop3 Compatibility

2.4.2
* Support for Scala 2.12 (SPARKC-458)

10 changes: 5 additions & 5 deletions README.md
@@ -4,10 +4,10 @@

| What | Where |
| ---------- | ----- |
- | Packages | [Spark Cassandra Connector Spark Packages Website](https://spark-packages.org/package/datastax/spark-cassandra-connector) |
- | Community | Chat with us at [DataStax Academy's #spark-connector Slack channel](#slack) |
- | Scala Docs | Most Recent Release (2.4.2): [Spark-Cassandra-Connector](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.4.2/spark-cassandra-connector/), [Embedded-Cassandra](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.4.2/spark-cassandra-connector-embedded/) |
+ | Community | Chat with us at [Datastax and Cassandra Q&A](https://community.datastax.com/index.html) |
+ | Scala Docs | Most Recent Release (2.4.3): [Spark-Cassandra-Connector](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.4.2/spark-cassandra-connector/), [Embedded-Cassandra](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.4.2/spark-cassandra-connector-embedded/) |
+ | Latest Production Release | [2.4.3](https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector_2.12/2.4.3) |
+ | Latest Preview Release | [3.0-alpha](https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector_2.12/3.0-alpha) |
## Features

*Lightning-fast cluster computing with Apache Spark™ and Apache Cassandra®.*
@@ -105,7 +105,7 @@ This project has also been published to the Maven Central Repository.
For SBT to download the connector binaries, sources and javadoc, put this in your project
SBT config:

- libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.2"
+ libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.3"

* The default Scala version for Spark 2.0+ is 2.11 please choose the appropriate build. See the
[FAQ](doc/FAQ.md) for more information
8 changes: 4 additions & 4 deletions doc/0_quick_start.md
@@ -15,14 +15,14 @@ Configure a new Scala project with the Apache Spark and dependency.

The dependencies are easily retrieved via Maven Central

- libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.4.2"
+ libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.4.3"

The spark-packages libraries can also be used with spark-submit and spark shell, these
commands will place the connector and all of its dependencies on the path of the
Spark Driver and all Spark Executors.

- $SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
- $SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
+ $SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3
+ $SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3

For the list of available versions, see:
- https://spark-packages.org/package/datastax/spark-cassandra-connector
@@ -58,7 +58,7 @@ Run the `spark-shell` with the packages line for your version. To configure
the default Spark Configuration pass key value pairs with `--conf`

$SPARK_HOME/bin/spark-shell --conf spark.cassandra.connection.host=127.0.0.1 \
- --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
+ --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3

This command would set the Spark Cassandra Connector parameter
`spark.cassandra.connection.host` to `127.0.0.1`. Change this
2 changes: 1 addition & 1 deletion doc/13_spark_shell.md
@@ -18,7 +18,7 @@ Find additional versions at [Spark Packages](https://spark-packages.org/package/
```bash
cd spark/install/dir
#Include the --master if you want to run against a spark cluster and not local mode
- ./bin/spark-shell [--master sparkMasterAddress] --jars yourAssemblyJar --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2 --conf spark.cassandra.connection.host=yourCassandraClusterIp
+ ./bin/spark-shell [--master sparkMasterAddress] --jars yourAssemblyJar --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3 --conf spark.cassandra.connection.host=yourCassandraClusterIp
```

By default spark will log everything to the console and this may be a bit of an overload. To change this copy and modify the `log4j.properties` template file
4 changes: 2 additions & 2 deletions doc/15_python.md
@@ -16,7 +16,7 @@ https://spark-packages.org/package/datastax/spark-cassandra-connector

```bash
./bin/pyspark \
- --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
+ --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3
```

### Loading a DataFrame in Python
@@ -46,7 +46,7 @@ source and by specifying keyword arguments for `keyspace` and `table`.

### Saving a DataFrame in Python to Cassandra

- A DataFrame can be saved to an *existing* Cassandra table by using the `org.apache.spark.sql.cassandra` source and by specifying keyword arguments for `keyspace` and `table` and saving mode (`append`, `overwrite`, `error` or `ignore`, see [Data Sources API doc](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes)).
+ A DataFrame can be saved to an *existing* Cassandra table by using the `org.apache.spark.sql.cassandra` source and by specifying keyword arguments for `keyspace` and `table` and saving mode (`append`, `overwrite`, `error` or `ignore`, see [Data Sources API doc](https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html#save-modes)).

#### Example Saving to a Cassandra Table as a Pyspark DataFrame
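The PySpark code block that followed the heading above is collapsed in this diff view. As a rough companion only, here is a minimal Scala sketch of the same Data Source save path the paragraph describes; the keyspace `test`, table `kv`, and connection host are made-up placeholders, and the target table must already exist:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object SaveToCassandraSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical session; host and app name are placeholders.
    val spark = SparkSession.builder()
      .appName("save-to-cassandra-sketch")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("k1", 1), ("k2", 2)).toDF("key", "value")

    // Same source and keyword arguments as in the Python doc: keyspace, table, save mode.
    df.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "test", "table" -> "kv"))
      .mode(SaveMode.Append)
      .save()
  }
}
```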
@@ -208,7 +208,7 @@ object MappedToGettableDataConverter extends StrictLogging {
for (i <- columnValues.indices)
columnValues(i) = converters(i).convert(columnValues(i))
struct.newInstance(columnValues: _*)
- case None =>
+ case _ =>
Contributor:
Maybe I'm just paranoid, but should we do `None | null`? I may just be paranoid (this wouldn't be the first time...)

Author (@acustiq, Mar 30, 2020):
I would think that the `_` wildcard symbol catches both `None` and `null`, as it matches anything:

case _ => // Wild card pattern -- matches anything

Source: https://docs.scala-lang.org/tutorials/FAQ/finding-symbols.html
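
For illustration only (not part of this PR's diff), a minimal, self-contained sketch of the behaviour under discussion: the `_` wildcard matches `None` and also a raw `null` reference, whereas a lone `case None` clause leaves `null` unmatched and raises a `MatchError`. The names `WildcardNullDemo` and `describe` are made up for this sketch.

```scala
// Sketch only: shows that `case _` covers both None and a null reference,
// while `case None` alone would not cover null.
object WildcardNullDemo extends App {

  def describe(opt: Option[String]): String = opt match {
    case Some(v) => s"some($v)"
    case _       => "none-or-null" // matches None and also a null reference
  }

  println(describe(Some("x"))) // some(x)
  println(describe(None))      // none-or-null
  println(describe(null))      // none-or-null (a bare `case None` would throw MatchError here)
}
```

This null-input case appears to be what the `case None` → `case _` change, together with the new spec below, is meant to handle.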

null.asInstanceOf[struct.ValueRepr]
}
}
@@ -143,6 +143,13 @@ class MappedToGettableDataConverterSpec extends FlatSpec with Matchers {
row.isNullAt(1) shouldEqual true
}

it should "convert null UDTs to null" in {
val userTable = newTable(loginColumn, passwordColumn)
val converter = MappedToGettableDataConverter[UserWithOption](userTable, userTable.columnRefs)
val row = converter.convert(null)
row shouldEqual null
}

case class UserWithNestedOption(login: String, address: Option[Address])

it should "convert None case class to null" in {