Releases: RumbleDB/rumble
Rumble 1.2 Chestnut Oak
This release mostly optimizes group-by aggregations and implements a few more functions.
Rumble 1.1 Arbutus Oak
This is the second beta release of Rumble, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.
New:
- Bugfixes.
- More functions.
- FLWOR expressions are now internally mapped to DataFrames and Spark SQL, which brings a 2x performance improvement for grouping and sorting queries.
The jar file was built with ANTLR 4.7 and is compatible with all tested distributions of Spark 2.3+. It is meant to be used with the spark-submit script either as an interactive shell, or to execute a single query from a JSONiq file (local or HDFS) and output the result either on stdout or back to the disk (local or HDFS). This works both locally and with a deployed cluster.
The jar file was compiled with Java 8 and is forward compatible with later Java versions (e.g., Java 11).
The jar file for older versions of Spark (2.0+) with ANTLR 4.5.3 is available on request (if you receive a warning on the command line).
Documentation: http://rumble.readthedocs.io/en/latest/
Rumble 1.0.0 "Linden Oak"
This is the first beta release of Rumble, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.
New:
- Bugfixes.
- The jar now displays CLI examples when invoked with no parameters, including when it is run directly with java.
- distinct-values() is pushed down to Spark
- Fixes NullPointerException in some cases when exceptions are raised in closures
The jar file was built with ANTLR 4.7 and is compatible with all tested distributions of Spark 2.3+. It is meant to be used with the spark-submit script either as an interactive shell, or to execute a single query from a JSONiq file (local or HDFS) and output the result either on stdout or back to the disk (local or HDFS). This works both locally and with a deployed cluster.
The jar file for older versions of Spark (2.0+) with ANTLR 4.5.3 is available on request (if you receive a warning on the command line).
Documentation: http://rumble.readthedocs.io/en/latest/
Sparksoniq 0.9.7 Mahogany
New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.
New:
- Bugfixes.
- It is now possible to read a query locally (--query-path), and output the results on stdout rather than to the local filesystem.
- Fixed an error about a non-existent JSONObject keySet() method, caused by a backward incompatibility of org.json in some environments.
The jar file was built with ANTLR 4.7 and is compatible with all tested distributions of Spark 2.3+. It is meant to be used with the spark-submit script either as an interactive shell, or to execute a single query from a JSONiq file (local or HDFS) and output the result either on stdout or back to the disk (local or HDFS). This works both locally and with a deployed cluster.
The jar file for older versions of Spark (2.0+) with ANTLR 4.5.3 is available on request (if you receive a warning on the command line).
Documentation: http://sparksoniq.readthedocs.io/en/latest/
Sparksoniq 0.9.6 "Olive Tree"
New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.
New:
- New functions text-file#1, text-file#2, tokenize#1, tokenize#2 to open text files as input. Now billions of lines can be manipulated as sequences of strings with FLWORs, in the same way billions of objects could until now.
- Fixing serialization bugs (escaping)
- Fixing bug in string literal escaping in the shell
- Fix bug with local count clause execution
- Fix bug in the shell leading to a crash when a parallelized FLWOR execution was outputting the empty sequence
- Fix bug leading to a crash when the where clause expression was not returning a boolean in local execution. Now the effective boolean value is taken.
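The effective boolean value mentioned in the last item can be illustrated with a minimal Python sketch. It assumes the rules as commonly described for JSONiq (empty sequence is false, a sequence starting with an object or array is true, a single atomic item is tested by type, anything else is an error). The function name and the modelling of a sequence as a Python list are assumptions made for illustration; this is not Sparksoniq's actual code.

```python
import math


def effective_boolean_value(sequence):
    """Return the effective boolean value of a JSONiq-like sequence.

    The sequence is modelled as a Python list of items, where dict is an
    object, list is an array, None is null, and bool/int/float/str are
    atomic values. This modelling is an assumption for illustration only.
    """
    if not sequence:
        return False  # the empty sequence is false
    first = sequence[0]
    if isinstance(first, (dict, list)):
        return True  # a sequence starting with an object or array is true
    if len(sequence) > 1:
        raise TypeError("no effective boolean value for several atomic items")
    if first is None:
        return False  # null is false
    if isinstance(first, bool):  # test bool before int: bool subclasses int
        return first
    if isinstance(first, (int, float)):
        is_nan = isinstance(first, float) and math.isnan(first)
        return not (first == 0 or is_nan)  # zero and NaN are false
    if isinstance(first, str):
        return len(first) > 0  # the empty string is false
    raise TypeError("unsupported item type")


# A where clause can then keep a tuple whenever this function returns True,
# instead of crashing on a non-boolean result.
print(effective_boolean_value(["non-empty"]))
print(effective_boolean_value([]))
```

With this approach a condition such as a string or a number no longer has to be a boolean to be usable in a where clause.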
The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.
Documentation: http://sparksoniq.readthedocs.io/en/latest/
Sparksoniq 0.9.5 "Larch"
New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.
New:
- Many bugfixes
- All FLWOR clauses are now supported locally, that is, when parallelize() or json-file() is not used. Locally means: without invoking Spark transformations. Local FLWOR expressions can execute on the client but also within a transformation triggered by a non-local FLWOR.
- Local FLWOR expressions can fully nest. All queries of the tutorial now work and you can use and abuse let clauses.
- Pushdowns: json-file("file.json").foo[].bar[[2]].foobar works on Spark
- Significant improvements in memory footprint: some queries are no longer materialized in memory (e.g., filtering query with a where clause or count).
- Significant improvements in performance: a file of 16,000,000 objects was successfully tested for count, filtering, grouping and ordering with a local Spark execution on a single laptop. Performance also improved on bigger datasets on clusters.
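To make the navigation chain in the pushdown item above concrete, here is a small Python sketch of what .foo[].bar[[2]].foobar computes on a sequence of objects. It assumes the usual JSONiq semantics: object lookup skips items that are not objects or lack the key, [] unboxes arrays into their members, and [[n]] picks the n-th member with 1-based positions. All function names and the sample data are hypothetical, and the sketch runs on plain Python lists rather than on Spark.

```python
def lookup(items, key):
    """Object lookup (.key): yield the value for each object that has the key."""
    for item in items:
        if isinstance(item, dict) and key in item:
            yield item[key]


def unbox(items):
    """Array unboxing ([]): yield the members of each array, skip other items."""
    for item in items:
        if isinstance(item, list):
            yield from item


def array_lookup(items, position):
    """Array lookup ([[n]]): yield the n-th member (1-based) of each array."""
    for item in items:
        if isinstance(item, list) and 1 <= position <= len(item):
            yield item[position - 1]


def navigate(records):
    """Sketch of json-file(...).foo[].bar[[2]].foobar over in-memory records."""
    step = lookup(records, "foo")
    step = unbox(step)
    step = lookup(step, "bar")
    step = array_lookup(step, 2)
    return list(lookup(step, "foobar"))


# Hypothetical sample data standing in for the lines of file.json.
records = [
    {"foo": [{"bar": [10, {"foobar": "a"}]}, {"bar": [1]}]},
    {"foo": 3},
    {"other": 1},
]
print(navigate(records))
```

Each step maps one sequence to another without needing the whole input at once, which is what makes such a chain a natural fit for per-item transformations on a cluster.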
The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.
Documentation: http://sparksoniq.readthedocs.io/en/latest/
Sparksoniq 0.9.4 "Birch"
New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.
New (including various bugfixes):
- count clauses are supported and pushed down to Spark
- simple keys no longer need to be quoted when constructing objects (in particular, a null pointer exception is fixed)
- the error message shown when a function name and arity are not found is more helpful
- it is no longer necessary to supply the --master option twice on the CLI: passing it once to spark-submit is enough.
The jar files no longer contain the Spark libraries, as they are provided by the local environment or the cluster.
The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.
Documentation: http://sparksoniq.readthedocs.io/en/latest/
Sparksoniq 0.9.3 "Cedar"
Third alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.
New (including various bugfixes):
- Ctrl+D now exits nicely from the shell
- count() calls are pushed down to Spark if the nested expression uses underlying RDDs.
- various exceptions are now caught and displayed with nice error messages.
- Strings can be concatenated with atomic types (they get serialized to a string)
- Lookup can be done on a sequence of objects
The jar files no longer contain the Spark libraries, as they are provided by the local environment or the cluster.
The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.
Documentation: http://sparksoniq.readthedocs.io/en/latest/
Sparksoniq Alpha 0.9.2 "Cypress"
Second release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.
New: various bugfixes (e.g., empty sequence handling), richer function library, general comparison operators.
The jar files no longer contain the Spark libraries, as they are provided by the local environment or the cluster.
The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.
Documentation: http://sparksoniq.readthedocs.io/en/latest/
Sparksoniq Alpha 0.9.1 "Spruce"
First release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.
Documentation: http://sparksoniq.readthedocs.io/en/latest/