SPARKC-312: Implementing FilterOptimizer #1019

ponkin · 2016-08-12T01:31:00Z

FilterOptimizer will transform the where clause into an equivalent disjunctive normal form.
For example, the where clause a = 5 and (b > 5 or b < 3) can be transformed to the equivalent (a = 5 and b > 5) or (a = 5 and b < 3), so we can create two separate table scans, one with the where clause a = 5 and b > 5 and one with a = 5 and b < 3, and union them.
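
As a rough illustration of the rewrite described above, here is a minimal sketch of distributing AND over OR. It uses a tiny hypothetical predicate tree (Pred, Leaf, And, Or) rather than Spark's Filter classes, and it ignores negation, so it is not the code in this PR.

```scala
object DnfSketch {
  // Hypothetical minimal predicate tree, used only for illustration.
  sealed trait Pred
  final case class Leaf(expr: String) extends Pred
  final case class And(left: Pred, right: Pred) extends Pred
  final case class Or(left: Pred, right: Pred) extends Pred

  // Result shape: the outer list is the disjunction, each inner list
  // is one conjunction of leaf predicates (one candidate table scan).
  def toDnf(p: Pred): List[List[Leaf]] = p match {
    case l: Leaf   => List(List(l))
    case Or(a, b)  => toDnf(a) ++ toDnf(b)
    // AND distributes over OR: combine every branch of the left side
    // with every branch of the right side.
    case And(a, b) => for (x <- toDnf(a); y <- toDnf(b)) yield x ++ y
  }

  // Render the normal form back into a readable where clause.
  def render(dnf: List[List[Leaf]]): String =
    dnf.map(_.map(_.expr).mkString(" and ")).mkString(" or ")
}
```

Calling DnfSketch.toDnf on And(Leaf("a = 5"), Or(Leaf("b > 5"), Leaf("b < 3"))) yields two conjunctions, one per scan to be unioned.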

RussellSpitzer

Just putting some more comments on here. I think we definitely also need some integration tests to check edge cases.

RussellSpitzer · 2016-10-18T21:37:32Z

...-cassandra-connector/src/it/scala/org/apache/spark/sql/CassandraPrunedFilteredScanSpec.scala

  val withPushdown = Map("pushdown" -> "true")
+  val withWhereClauseOptimizationEnabled = Map("spark.cassandra.sql.enable.where.clause.optimization" -> "true")


Replace the string here with the parameter defined in the conf file,
EnableWhereClauseOptimizationParam.name, just in case we change things later :)
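
A minimal sketch of what this suggestion could look like. ConfigParameter here is a hypothetical stand-in defined only for this example, and where EnableWhereClauseOptimizationParam actually lives in the connector is not shown in this thread.

```scala
object ParamNameSketch {
  // Hypothetical stand-in for the connector's config parameter type.
  final case class ConfigParameter[T](name: String, default: T)

  // Assumed definition: the option name comes from the diff above, and the
  // default of false matches the statement later in this thread.
  val EnableWhereClauseOptimizationParam: ConfigParameter[Boolean] =
    ConfigParameter("spark.cassandra.sql.enable.where.clause.optimization", default = false)

  // The test option map refers to the parameter instead of repeating the string,
  // so renaming the option only requires a change in one place.
  val withWhereClauseOptimizationEnabled: Map[String, String] =
    Map(EnableWhereClauseOptimizationParam.name -> "true")
}
```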

RussellSpitzer · 2016-10-18T21:45:27Z

...sandra-connector/src/main/scala/org/apache/spark/sql/cassandra/CassandraSourceRelation.scala

@@ -79,7 +78,15 @@ private[cassandra] class CassandraSourceRelation(
  def buildScan(): RDD[Row] = baseRdd.asInstanceOf[RDD[Row]]

  override def unhandledFilters(filters: Array[Filter]): Array[Filter] = filterPushdown match {
-    case true => predicatePushDown(filters).handledBySpark.toArray
+    case true => 
+      val optimizedFilters = FiltersOptimizer(filters).build()


I think it may be better if we moved the FilterOptimizer into the predicatePushDown function; then we could skip having it written in a bunch of places.

RussellSpitzer · 2016-10-18T21:51:02Z

spark-cassandra-connector/src/main/scala/org/apache/spark/sql/cassandra/FiltersOptimizer.scala

+  * val Array(f1, f2, ... fn) = ... // such that `where f1 AND f2 AND ... AND fn`
+  *
+  */
+class FiltersOptimizer(filters: Array[Filter]) {


I'm a little confused about why there is a separate class here. Do we ever use this without calling .build() immediately after?

RussellSpitzer · 2016-10-18T21:56:41Z

...sandra-connector/src/main/scala/org/apache/spark/sql/cassandra/CassandraSourceRelation.scala

-        val filteredRdd = maybePushdownFilters(prunedRdd, pushdownFilters)
+        val optimizedFilters = new FiltersOptimizer(filters).build()
+        val optimizationCanBeApplied = isOptimizationAvailable(optimizedFilters)
+        val filteredRdd = if(optimizationCanBeApplied) {


I think this optimization can be rather dangerous, so I think we should default to off. For example:

Take a table queried with where x < 3 or x > 5, where x ranges from 1 to 10000. Doing two scans here is probably much more expensive than a single scan.

ponkin · Nov 19, 2016 (edited)

I thought that both scans would be done in parallel. By default this option is set to false.
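
To put rough numbers on the x < 3 or x > 5 example, here is a small sketch that counts how many rows each branch would match. It assumes one row per integer value of x from 1 to 10000, which the thread does not state, and ScanCostSketch and its members are illustrative names only.

```scala
object ScanCostSketch {
  // Assumption: exactly one row for every integer x in 1..10000.
  val xs: Range = 1 to 10000

  // Rows matched by each branch of `x < 3 or x > 5`.
  val firstBranch: Int = xs.count(_ < 3)
  val secondBranch: Int = xs.count(_ > 5)

  // Share of the table returned by the two scans combined.
  val matchedFraction: Double = (firstBranch + secondBranch).toDouble / xs.size
}
```

Under that assumption the two branches together match 9997 of the 10000 rows, so splitting the query filters out almost nothing while adding a second scan.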

ponkin and others added 4 commits August 12, 2016 04:23

SPARKC-312: Implementing FilterOptimizer

cea8574

SPARKC-312: Fixing incorrect optimization checking

3441215

SPARKC-312: Fixing bug in unhandledFilters method when optimization is available

5d593f2

SPARKC-312: Introducing option spark.cassandra.sql.enable.where.clause.optimization + some more tests

1ec147d

RussellSpitzer reviewed Oct 18, 2016

Alexey Ponkin added 3 commits November 20, 2016 01:16

SPARKC-312: Fixing review comments

2bbb34f

SPARKC-312: Resolving merge conflicts

1849515

Fixing test: SparkSession instead of sqlContext

3eafd87

ponkin commented Nov 19, 2016 (edited)