Support partially specified writes from case classes #1139

Open
wants to merge 50 commits into
base: b2.0
Commits
5d21efd
Merge branch 'SPARKC-476-b1.6' into SPARKC-476-master
RussellSpitzer Feb 28, 2017
3d5022c
Merge tag 'v2.0.0'
RussellSpitzer Mar 7, 2017
d97c03c
Merge pull request #1092 from datastax/SPARKC-476-master
RussellSpitzer Mar 14, 2017
5fe21d1
Merge branch 'b1.6'
RussellSpitzer Mar 14, 2017
86e37ab
Merge branch 'b2.0'
RussellSpitzer Mar 21, 2017
dd6cae0
SPARKC-475: Add implicit RowWriterFactory for RDD[Row]
RussellSpitzer Mar 22, 2017
bdbd658
SPARKC-466: Add a CassandraRDDMock for end users to use in Unit Testing
RussellSpitzer Mar 24, 2017
94d1821
Merge branch 'b2.0'
RussellSpitzer Mar 30, 2017
16a6154
Update Doc References to Latest Version
RussellSpitzer Mar 30, 2017
a2b16cb
Merge pull request #1106 from datastax/SPARKC-466
RussellSpitzer Apr 3, 2017
7585fba
Merge pull request #1103 from datastax/SPARKC-475
RussellSpitzer Apr 3, 2017
85d5925
Refresh Documentation to Use Spark 2.X concepts
RussellSpitzer Mar 23, 2017
72ee19d
Merge pull request #1105 from datastax/2.0-DocRefresh
RussellSpitzer Apr 3, 2017
f952474
Merge branch 'b2.0'
RussellSpitzer Apr 3, 2017
af45b15
Merge tag 'v2.0.1'
RussellSpitzer Apr 4, 2017
4fca10c
Merge branch 'b2.0'
RussellSpitzer Apr 4, 2017
2d9c102
Merge branch 'b2.0'
RussellSpitzer Apr 4, 2017
e19883d
Add 1.6.6 Api Doc Links
RussellSpitzer Apr 4, 2017
2378bd3
Merge branch 'b2.0'
RussellSpitzer May 17, 2017
e63b771
Preparing 2.0.2 Release
RussellSpitzer May 17, 2017
2595231
Merge tag 'v2.0.2'
RussellSpitzer May 17, 2017
75eef87
Fix GenerateDocs
RussellSpitzer May 18, 2017
952a36d
Updated Api Doc Links
RussellSpitzer May 18, 2017
8e7ee2f
Add link to spark-connector Slack channel at DataStax Academy Slack
jaceklaskowski May 18, 2017
58aca07
Merge pull request #1114 from jaceklaskowski/readme-slack
RussellSpitzer May 18, 2017
3a33912
[DOCS][MINOR] Formatting
jaceklaskowski May 22, 2017
8ea8163
Merge pull request #1115 from jaceklaskowski/docs-formatting
RussellSpitzer May 22, 2017
fe9911b
SPARKC-493: Fix generate docs script
May 30, 2017
02089bb
Merge pull request #1121 from datastax/b2.0
artem-aliev Jun 1, 2017
012b0e1
Merge branch 'b1.6'
RussellSpitzer Jun 12, 2017
6ac59de
1.6.7 API Docs
RussellSpitzer Jun 12, 2017
24f392d
Merge branch 'b2.0'
RussellSpitzer Jun 13, 2017
c3ef9e1
Update 15_python.md
RussellSpitzer Jun 28, 2017
efc65c7
Add support for LocalDate year only parsing
Jun 29, 2017
5825d92
Update Java API Documentation.
Jun 30, 2017
8ee5da2
Merge branch 'b2.0'
RussellSpitzer Jul 7, 2017
ac6145d
Merge branch 'b2.0'
RussellSpitzer Jul 8, 2017
9a50162
Update Readme Doc Links
RussellSpitzer Jul 8, 2017
582c98a
doc: markdown typo - best commit in history
polomarcus Jul 18, 2017
38542cb
Merge pull request #1132 from polomarcus/doc-typo-16_partitioning
RussellSpitzer Jul 26, 2017
54b9be4
Merge pull request #1129 from beargummy/java-api-doc-update
RussellSpitzer Jul 26, 2017
1bb71d1
README updates
Jul 28, 2017
a0e5819
Merge pull request #1131 from datastax/DSP-13063
artem-aliev Aug 9, 2017
b7af81b
Merge branch 'b1.6'
RussellSpitzer Aug 14, 2017
9224ccd
Merge branch 'master' of github.com:datastax/spark-cassandra-connector
RussellSpitzer Aug 14, 2017
cf7b9d1
Merge branch 'b2.0'
RussellSpitzer Aug 14, 2017
7bddee2
Merge branch 'b2.0'
RussellSpitzer Aug 17, 2017
4ceaa20
support partially specified writes from case classes
Aug 21, 2017
96ab9c4
isTopLevel flag for nested UDFs should require no unmapped columns
Aug 24, 2017
727a7e4
fix integration test
Aug 31, 2017
75 changes: 38 additions & 37 deletions README.md
@@ -1,22 +1,20 @@
# Spark Cassandra Connector [![Build Status](https://travis-ci.org/datastax/spark-cassandra-connector.svg)](http://travis-ci.org/datastax/spark-cassandra-connector)
### [Spark Cassandra Connector Spark Packages Website](http://spark-packages.org/package/datastax/spark-cassandra-connector)
Chat with us at [DataStax Academy #spark-cassandra-connector](#datastax-academy)
# Spark Cassandra Connector [![Build Status](https://travis-ci.org/datastax/spark-cassandra-connector.svg)](https://travis-ci.org/datastax/spark-cassandra-connector)

### Most Recent Release Scala Docs
## Quick Links

### 2.0.0
* [Spark-Cassandra-Connector](http://datastax.github.io/spark-cassandra-connector/ApiDocs/2.0.0/spark-cassandra-connector/)
* [Embedded-Cassandra](http://datastax.github.io/spark-cassandra-connector/ApiDocs/2.0.0/spark-cassandra-connector-embedded/)
| What | Where |
| ---------- | ----- |
| Packages | [Spark Cassandra Connector Spark Packages Website](https://spark-packages.org/package/datastax/spark-cassandra-connector) |
| Community | Chat with us at [DataStax Academy's #spark-connector Slack channel](#slack) |
| Scala Docs | Most Recent Release (2.0.3): [Spark-Cassandra-Connector](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.0.3/spark-cassandra-connector/), [Embedded-Cassandra](https://datastax.github.io/spark-cassandra-connector/ApiDocs/2.0.3/spark-cassandra-connector-embedded/) |

[All Versions API Docs](#hosted-api-docs)
## Features

## Lightning-fast cluster computing with Apache Spark(TM) and Apache Cassandra(TM);
*Lightning-fast cluster computing with Apache Spark™ and Apache Cassandra®.*

This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and
execute arbitrary CQL queries in your Spark applications.

## Features

- Compatible with Apache Cassandra version 2.0 or higher (see table below)
- Compatible with Apache Spark 1.0 through 2.0 (see table below)
- Compatible with Scala 2.10 and 2.11
@@ -29,7 +27,7 @@ execute arbitrary CQL queries in your Spark applications.
- Partition RDDs according to Cassandra replication using `repartitionByCassandraReplica` call
- Converts data types between Cassandra and Scala
- Supports all Cassandra data types including collections
- Filters rows on the server side via the CQL `WHERE` clause
- Allows for execution of arbitrary CQL statements
- Plays nice with Cassandra Virtual Nodes
- Works with PySpark DataFrames
@@ -58,13 +56,13 @@ development for the next connector release in progress.
## Hosted API Docs
API documentation for the Scala and Java interfaces are available online:

### 2.0.0
* [Spark-Cassandra-Connector](http://datastax.github.io/spark-cassandra-connector/ApiDocs/2.0.0/spark-cassandra-connector/)
* [Embedded-Cassandra](http://datastax.github.io/spark-cassandra-connector/ApiDocs/2.0.0/spark-cassandra-connector-embedded/)
### 2.0.3
* [Spark-Cassandra-Connector](http://datastax.github.io/spark-cassandra-connector/ApiDocs/2.0.3/spark-cassandra-connector/)
* [Embedded-Cassandra](http://datastax.github.io/spark-cassandra-connector/ApiDocs/2.0.3/spark-cassandra-connector-embedded/)

### 1.6.5
* [Spark-Cassandra-Connector](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.6.5/spark-cassandra-connector/)
* [Embedded-Cassandra](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.6.5/spark-cassandra-connector-embedded/)
### 1.6.8
* [Spark-Cassandra-Connector](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.6.8/spark-cassandra-connector/)
* [Embedded-Cassandra](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.6.8/spark-cassandra-connector-embedded/)

### 1.5.2
* [Spark-Cassandra-Connector](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.5.2/spark-cassandra-connector/)
@@ -81,24 +79,27 @@ API documentation for the Scala and Java interfaces are available online:
* [Spark-Cassandra-Connector-Java](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.3.1/spark-cassandra-connector-java/)
* [Embedded-Cassandra](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.3.1/spark-cassandra-connector-embedded/)

### 1.2.0
* [Spark-Cassandra-Connector](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.2.0/spark-cassandra-connector/)
* [Spark-Cassandra-Connector-Java](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.2.0/spark-cassandra-connector-java/)
* [Embedded-Cassandra](http://datastax.github.io/spark-cassandra-connector/ApiDocs/1.2.0/spark-cassandra-connector-embedded/)

## Download
This project is available on Spark Packages; this is the easiest way to start using the connector:
http://spark-packages.org/package/datastax/spark-cassandra-connector
https://spark-packages.org/package/datastax/spark-cassandra-connector

This project has also been published to the Maven Central Repository.
For SBT to download the connector binaries, sources and javadoc, put this in your project
SBT config:

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0"

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.3"

* The default Scala version for Spark 2.0+ is 2.11; please choose the appropriate build. See the
[FAQ](doc/FAQ.md) for more information

## Building
See [Building And Artifacts](doc/12_building_and_artifacts.md)

## Documentation

- [Quick-start guide](doc/0_quick_start.md)
@@ -122,26 +123,26 @@ See [Building And Artifacts](doc/12_building_and_artifacts.md)
- [Tips for Developing the Spark Cassandra Connector](doc/developers.md)

## Online Training

### DataStax Academy

DataStax Academy provides free online training for Apache Cassandra and DataStax Enterprise. In [DS320: Analytics with Spark](https://academy.datastax.com/courses/ds320-analytics-with-apache-spark), you will learn how to effectively and efficiently solve analytical problems with Apache Spark, Apache Cassandra, and DataStax Enterprise. You will learn about Spark API, Spark-Cassandra Connector, Spark SQL, Spark Streaming, and crucial performance optimization techniques.

## Community

### Reporting Bugs

New issues may be reported using [JIRA](https://datastax-oss.atlassian.net/browse/SPARKC/). Please include
all relevant details including versions of Spark, Spark Cassandra Connector, Cassandra and/or DSE. A minimal
reproducible case with sample code is ideal.

### Mailing List
Questions and requests for help may be submitted to the [user mailing list](http://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user).

### Gitter
Datastax is consolidating our chat resources to Slack at [DataStax Academy](#datastax-academy)
Questions and requests for help may be submitted to the [user mailing list](https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user).

The Gitter room will be shut down in the near future.
[![Join the chat at https://gitter.im/datastax/spark-cassandra-connector](https://badges.gitter.im/datastax/spark-cassandra-connector.svg)](https://gitter.im/datastax/spark-cassandra-connector?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
### Slack

### IRC
\#spark-cassandra-connector on irc.freenode.net. If you are new to IRC, you can use a [web-based client](http://webchat.freenode.net/?channels=#spark-cassandra-connector).
The project uses Slack to facilitate conversation in our community. Find us in the `#spark-connector` channel at [DataStax Academy Slack](https://academy.datastax.com/slack).

## Contributing

@@ -171,25 +172,25 @@ To run unit and integration tests:
By default, integration tests start up a separate, single Cassandra instance and run Spark in local mode.
It is possible to run integration tests with your own Cassandra and/or Spark cluster.
First, prepare a jar with testing code:

./sbt/sbt test:package

Then copy the generated test jar to your Spark nodes and run:

export IT_TEST_CASSANDRA_HOST=<IP of one of the Cassandra nodes>
export IT_TEST_SPARK_MASTER=<Spark Master URL>
./sbt/sbt it:test

## Generating Documents
To generate the Reference Document use

./sbt/sbt spark-cassandra-connector-unshaded/run (outputLocation)

outputLocation defaults to doc/reference.md

## License

Copyright 2014-2016, DataStax, Inc.
Copyright 2014-2017, DataStax, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

11 changes: 6 additions & 5 deletions doc/0_quick_start.md
@@ -16,14 +16,14 @@ Configure a new Scala project with the Apache Spark and dependency.
The dependencies are easily retrieved via the spark-packages.org website. For example, if you're using `sbt`, your build.sbt should include something like this:

resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies += "datastax" % "spark-cassandra-connector" % "2.0.0-s_2.11"
libraryDependencies += "datastax" % "spark-cassandra-connector" % "2.0.1-s_2.11"

The spark-packages libraries can also be used with spark-submit and spark shell; these
commands will place the connector and all of its dependencies on the path of the
Spark Driver and all Spark Executors.

$SPARK_HOME/bin/spark-shell --packages datastax:spark-cassandra-connector:2.0.0-s_2.11
$SPARK_HOME/bin/spark-submit --packages datastax:spark-cassandra-connector:2.0.0-s_2.11
$SPARK_HOME/bin/spark-shell --packages datastax:spark-cassandra-connector:2.0.1-s_2.11
$SPARK_HOME/bin/spark-submit --packages datastax:spark-cassandra-connector:2.0.1-s_2.11

For the list of available versions, see:
- https://spark-packages.org/package/datastax/spark-cassandra-connector
@@ -59,16 +59,17 @@ Run the `spark-shell` with the packages line for your version. To configure
the default Spark Configuration pass key value pairs with `--conf`

$SPARK_HOME/bin/spark-shell --conf spark.cassandra.connection.host=127.0.0.1 \
--packages datastax:spark-cassandra-connector:2.0.0-s_2.11
--packages datastax:spark-cassandra-connector:2.0.1-s_2.11

This command would set the Spark Cassandra Connector parameter
`spark.cassandra.connection.host` to `127.0.0.1`. Change this
to the address of one of the nodes in your Cassandra cluster.

Enable Cassandra-specific functions on the `SparkContext`, `RDD`, and `DataFrame`:
Enable Cassandra-specific functions on the `SparkContext`, `SparkSession`, `RDD`, and `DataFrame`:

```scala
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._
```
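With both imports in place, a minimal read/write round trip might look like the sketch below. It assumes a running Cassandra cluster reachable at the configured `spark.cassandra.connection.host`, with an existing keyspace `test` and table `kv (key text PRIMARY KEY, value int)` — all of these names are illustrative placeholders, and the snippet cannot run without such a cluster.

```scala
import com.datastax.spark.connector._
import org.apache.spark.SparkContext

// `sc` is provided automatically inside spark-shell; shown here only to
// make the sketch self-describing. Keyspace/table names are assumptions.
val sc: SparkContext = ???

// Read the table as an RDD and aggregate one column.
val rdd = sc.cassandraTable("test", "kv")
println(rdd.map(_.getInt("value")).sum)

// Write a local collection back to the same table, naming the columns.
val data = sc.parallelize(Seq(("cat", 30), ("fox", 40)))
data.saveToCassandra("test", "kv", SomeColumns("key", "value"))
```

`cassandraTable`, `saveToCassandra`, and `SomeColumns` come from the `com.datastax.spark.connector._` import enabled above.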

### Loading and analyzing data from Cassandra
2 changes: 1 addition & 1 deletion doc/10_embedded.md
@@ -24,6 +24,6 @@ Simply add this to your SBT build, or in the appropriate format for a Maven build
"com.datastax.spark" %% "spark-cassandra-connector-embedded" % {latest.version}

## Examples
https://github.com/datastax/SparkBuildExamples
[Spark Build Examples](https://github.com/datastax/SparkBuildExamples)

[Next - Performance Monitoring](11_metrics.md)
4 changes: 2 additions & 2 deletions doc/13_spark_shell.md
@@ -2,7 +2,7 @@

## Using the Spark Cassandra Connector with the Spark Shell

These instructions were last confirmed with Cassandra 3.0.9, Spark 2.0.2 and Connector 2.0.0.
These instructions were last confirmed with Cassandra 3.0.9, Spark 2.0.2 and Connector 2.0.1.

For this guide, we assume an existing Cassandra deployment, running either locally or on a cluster, a local installation of Spark and an optional Spark cluster. For detail setup instructions see [setup spark-shell](13_1_setup_spark_shell.md)

@@ -18,7 +18,7 @@ Find additional versions at [Spark Packages](http://spark-packages.org/package/datastax/spark-cassandra-connector)
```bash
cd spark/install/dir
#Include the --master if you want to run against a spark cluster and not local mode
./bin/spark-shell [--master sparkMasterAddress] --jars yourAssemblyJar --packages datastax:spark-cassandra-connector:2.0.0-s_2.11 --conf spark.cassandra.connection.host=yourCassandraClusterIp
./bin/spark-shell [--master sparkMasterAddress] --jars yourAssemblyJar --packages datastax:spark-cassandra-connector:2.0.1-s_2.11 --conf spark.cassandra.connection.host=yourCassandraClusterIp
```

By default spark will log everything to the console and this may be a bit of an overload. To change this copy and modify the `log4j.properties` template file