Output mode (OutputMode) describes what data is written to a streaming sink when new data is available in the streaming data sources (in a trigger / streaming batch).
The output mode of a streaming query is specified using the outputMode method of DataStreamWriter.
val inputStream = spark.
readStream.
format("rate").
load
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._
val consoleOutput = inputStream.
writeStream.
format("console").
option("truncate", false).
trigger(Trigger.ProcessingTime(10.seconds)).
queryName("rate-console").
option("checkpointLocation", "checkpoint").
outputMode(OutputMode.Update). // <-- update output mode
start
OutputMode | Name | Behaviour
---|---|---
Append | append | Default output mode that writes "new" rows only. Required for datasets with … Used for the flatMapGroupsWithState operator.
Complete | complete | Writes all rows (every time there are updates) and therefore corresponds to a traditional batch query.
Update | update | Writes the rows that were updated (every time there are updates). If the query does not contain aggregations, it is equivalent to Append mode. Used for the mapGroupsWithState and flatMapGroupsWithState operators.
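
To make the difference between Complete and Update concrete, here is a minimal sketch in plain Scala (no Spark involved) that models a running count per key and shows which rows a sink would receive after each streaming batch. The names OutputModeSketch, processBatch and run are hypothetical and exist only for this illustration; they are not part of the Spark API. Append is left out on purpose, since for a streaming aggregation it additionally requires an event-time watermark.

```scala
// Illustrative model only: NOT Spark code. All names here are hypothetical.
object OutputModeSketch {
  sealed trait Mode
  case object Complete extends Mode
  case object Update extends Mode

  // The "result table" of a streaming aggregation: a running count per key.
  type ResultTable = Map[String, Long]

  /** Folds one batch of keys into the result table and returns
    * (the new result table, the rows a sink would receive in the given mode). */
  def processBatch(
      table: ResultTable,
      batch: Seq[String],
      mode: Mode): (ResultTable, ResultTable) = {
    val updated = batch.foldLeft(table) { (acc, key) =>
      acc.updated(key, acc.getOrElse(key, 0L) + 1L)
    }
    val emitted = mode match {
      // Complete: the whole result table is written every time.
      case Complete => updated
      // Update: only the rows whose count changed in this batch are written.
      case Update => updated.filter { case (key, _) => batch.contains(key) }
    }
    (updated, emitted)
  }

  /** Runs all batches in order and collects what the sink sees per batch. */
  def run(batches: Seq[Seq[String]], mode: Mode): Seq[ResultTable] = {
    val start = (Map.empty[String, Long], Map.empty[String, Long])
    batches
      .scanLeft(start) { case ((table, _), batch) => processBatch(table, batch, mode) }
      .tail          // drop the initial empty state
      .map(_._2)     // keep only the emitted rows
  }
}
```

With the two batches Seq("a", "b") and Seq("a"), the Complete run emits both keys after every batch, while the Update run emits only the row for "a" after the second batch, because the count for "b" did not change.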