Releases: RumbleDB/rumble

Rumble 1.2 Chestnut Oak

17 Oct 09:09
06d23b5
Pre-release

Mostly optimizations to group-by aggregations, plus a few more implemented functions.

Rumble 1.1 Arbutus Oak

08 Aug 11:08
30e9103
Pre-release

This is the second beta release of Rumble, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New:

  • Bugfixes.
  • More functions.
  • FLWOR expressions are now internally mapped to DataFrames and Spark SQL, which brings a 2x performance improvement for grouping and sorting queries.

The jar file was built with ANTLR 4.7 and is compatible with all tested distributions of Spark 2.3+. It is meant to be used with the spark-submit script, either as an interactive shell or to execute a single query from a JSONiq file (local or HDFS) and output the result either on stdout or back to disk (local or HDFS). This works both locally and with a deployed cluster.

The jar file was compiled with Java 8 and is forward compatible with later Java versions (e.g., Java 11).

The jar file for older versions of Spark (2.0+) with ANTLR 4.5.3 is available on request (if you receive a warning on the command line).
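As a rough illustration of the two invocation modes described above, the following Python sketch assembles spark-submit command lines. The jar file name and the paths are hypothetical, and the `--shell yes` and `--output-path` flags are assumptions that should be checked against the documentation linked below for a given release.

```python
import shlex

# Hypothetical jar file name; use the name of the jar you actually downloaded.
JAR = "spark-rumble-1.1.jar"


def rumble_command(jar, master=None, query_path=None, output_path=None):
    """Build a spark-submit argument list for the Rumble jar.

    With no query_path the command starts the interactive shell;
    otherwise it runs a single query file. Flag names other than
    --master are assumptions, not taken from these release notes.
    """
    command = ["spark-submit"]
    if master is not None:
        # --master is a spark-submit option, so it goes before the jar.
        command += ["--master", master]
    command.append(jar)
    if query_path is None:
        command += ["--shell", "yes"]  # assumed flag for the shell mode
    else:
        command += ["--query-path", query_path]
        if output_path is not None:
            # Assumed flag; without it the result is expected on stdout.
            command += ["--output-path", output_path]
    return command


def render(command):
    """Quote each argument so the line can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(argument) for argument in command)


print(render(rumble_command(JAR)))
print(render(rumble_command(JAR, master="local[*]", query_path="query.jq")))
print(render(rumble_command(JAR, query_path="hdfs:///user/me/query.jq",
                            output_path="hdfs:///user/me/output")))
```

The sketch only builds the command lines; running them requires a working Spark installation with spark-submit on the PATH.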

Documentation: http://rumble.readthedocs.io/en/latest/

Rumble 1.0.0 "Linden Oak"

31 May 13:58
02fed3c
Pre-release

This is the first beta release of Rumble, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New:

  • Bugfixes.
  • The jar now displays CLI examples when invoked with no parameters, including when it is launched directly with java.
  • distinct-values() is pushed down to Spark
  • Fixes NullPointerException in some cases when exceptions are raised in closures

The jar file was built with ANTLR 4.7 and is compatible with all tested distributions of Spark 2.3+. It is meant to be used with the spark-submit script, either as an interactive shell or to execute a single query from a JSONiq file (local or HDFS) and output the result either on stdout or back to disk (local or HDFS). This works both locally and with a deployed cluster.

The jar file for older versions of Spark (2.0+) with ANTLR 4.5.3 is available on request (if you receive a warning on the command line).

Documentation: http://rumble.readthedocs.io/en/latest/

Sparksoniq 0.9.7 Mahogany

20 May 13:00
b0a0a69
Pre-release

New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New:

  • Bugfixes.
  • It is now possible to read a query from the local filesystem (--query-path) and to print the results to stdout rather than writing them to the local filesystem.
  • Fix error on non-existing JSONObject keySet() method due to a backward incompatibility of org.json in some environments.

The jar file was built with ANTLR 4.7 and is compatible with all tested distributions of Spark 2.3+. It is meant to be used with the spark-submit script, either as an interactive shell or to execute a single query from a JSONiq file (local or HDFS) and output the result either on stdout or back to disk (local or HDFS). This works both locally and with a deployed cluster.

The jar file for older versions of Spark (2.0+) with ANTLR 4.5.3 is available on request (if you receive a warning on the command line).

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Sparksoniq 0.9.6 "Olive Tree"

23 Apr 13:56
f93097d
Pre-release

New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New:

  • New functions text-file#1, text-file#2, tokenize#1, tokenize#2 to open text files as input. Now billions of lines can be manipulated as sequences of strings with FLWORs, in the same way billions of objects could until now.
  • Fixing serialization bugs (escaping)
  • Fixing bug in string literal escaping in the shell
  • Fix bug with local count clause execution
  • Fix bug in the shell leading to a crash when a parallelized FLWOR execution was outputting the empty sequence
  • Fix bug leading to a crash when the where clause expression was not returning a boolean in local execution. Now the effective boolean value is taken.
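To make the last fix more concrete, here is a small Python sketch of how an effective boolean value can be computed for a where clause. It follows the rules as we understand them from the JSONiq specification and is not Sparksoniq's actual implementation; the mapping of JSON items to Python values (dict for objects, list for arrays, None for null) is an assumption made for illustration.

```python
import math


def effective_boolean_value(sequence):
    """Return the effective boolean value of a sequence of items.

    The sequence is a Python list; objects are dicts, arrays are lists,
    null is None, and atomics are bool, str, int or float.
    """
    if len(sequence) == 0:
        return False  # the empty sequence is false
    first = sequence[0]
    if isinstance(first, (dict, list)):
        return True  # a sequence starting with an object or array is true
    if len(sequence) > 1:
        raise TypeError("no effective boolean value for this sequence")
    if first is None:
        return False  # null is false
    if isinstance(first, bool):  # checked before int: bool subclasses int
        return first
    if isinstance(first, str):
        return len(first) > 0  # only the empty string is false
    if isinstance(first, (int, float)):
        return not (first == 0 or math.isnan(first))  # 0 and NaN are false
    raise TypeError("unsupported item type")


def where(items, condition):
    """Keep the items whose condition result has a true effective boolean value."""
    return [item for item in items if effective_boolean_value(condition(item))]


# The condition returns the "name" field as a sequence, not a boolean.
people = [{"name": "Ada"}, {"name": ""}, {"age": 3}]
print(where(people, lambda p: [p["name"]] if "name" in p else []))
```

Only the first object survives the filter: the second has an empty name and the third yields the empty sequence.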

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Sparksoniq 0.9.5 "Larch"

04 Mar 10:08
1e45945
Pre-release

New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New:

  • Many bugfixes
  • All FLWOR clauses are now supported locally, that is, when parallelize() or json-file() is not used. "Locally" means without invoking Spark transformations. Local FLWOR expressions can execute on the client, but also within a transformation triggered by a non-local FLWOR.
  • Local FLWOR expressions can fully nest. All queries of the tutorial now work and you can use and abuse let clauses.
  • Pushdowns: json-file("file.json").foo[].bar[[2]].foobar works on Spark
  • Significant improvements in memory footprint: some queries are no longer materialized in memory (e.g., filtering query with a where clause or count).
  • Significant improvements in performance: a file of 16,000,000 objects was successfully tested for count, filtering, grouping and ordering with a local Spark execution on a single laptop. Performance also improved on bigger datasets on clusters.
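For readers unfamiliar with the navigation syntax in the pushdown example above, the following Python sketch mimics what .foo[].bar[[2]].foobar computes on a sequence of objects: object lookup, array unboxing, 1-based array lookup, and a final object lookup. It uses generators so that items are processed one at a time rather than materialized, which is the general idea behind the memory improvements. The sample data is made up, and this illustrates the semantics only, not how Sparksoniq maps the steps onto Spark.

```python
def lookup(items, key):
    """Object lookup (.key): yield the value of key for each object that has it."""
    for item in items:
        if isinstance(item, dict) and key in item:
            yield item[key]


def unbox(items):
    """Array unboxing ([]): yield the members of each array, skipping non-arrays."""
    for item in items:
        if isinstance(item, list):
            yield from item


def array_lookup(items, position):
    """Array lookup ([[n]]): yield the n-th member (1-based) of each array."""
    for item in items:
        if isinstance(item, list) and 1 <= position <= len(item):
            yield item[position - 1]


# Made-up stand-in for the objects that json-file("file.json") would return.
objects = [
    {"foo": [{"bar": [10, {"foobar": "a"}]}, {"bar": [20]}]},
    {"foo": [{"bar": [30, {"foobar": "b"}, 40]}]},
    {"other": 1},
]

# Equivalent of: json-file("file.json").foo[].bar[[2]].foobar
pipeline = lookup(
    array_lookup(lookup(unbox(lookup(objects, "foo")), "bar"), 2),
    "foobar",
)
result = list(pipeline)
print(result)
```

Items that do not match a step (a non-object, a missing key, an array that is too short) simply contribute nothing to the output.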

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Sparksoniq 0.9.4 "Birch"

22 Nov 10:32
Pre-release

New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New, in addition to various bugfixes:

  • count clauses are supported and pushed down to Spark
  • simple keys must no longer be quoted when constructing objects (in particular: null pointer exception is fixed)
  • the error message shown when no function matches a given name and arity is more helpful
  • it is no longer necessary to supply the --master option twice on the CLI: passing it once to spark-submit is enough.
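As a sketch of what the count clause does, consider a query of the form for $x in ... where ... count $c return { position: $c, item: $x }, which also uses unquoted keys. The Python below imitates its semantics on an in-memory list: the counter numbers the tuples that reach the count clause, so positions are consecutive among the items that passed the preceding where clause. The function name and sample data are made up for illustration, and nothing here reflects how the clause is pushed down to Spark.

```python
def flwor_with_count(items, predicate):
    """Imitate: for $x in items where predicate($x) count $c return {...}."""
    # The where clause filters the tuple stream first.
    survivors = (item for item in items if predicate(item))
    # The count clause then numbers the remaining tuples, starting at 1.
    for position, item in enumerate(survivors, start=1):
        yield {"position": position, "item": item}


numbered = list(flwor_with_count([5, 12, 7, 20], lambda x: x > 6))
print(numbered)
```

Here the value 12 gets position 1 even though it is the second input item, because 5 was filtered out before counting.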

The jar files no longer contain the Spark libraries, as they are provided by the local environment or the cluster.

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Sparksoniq 0.9.3 "Cedar"

15 Nov 14:14
0ebbdbb
Pre-release

Third alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New, in addition to various bugfixes:

  • Ctrl+D now exits nicely from the shell
  • count() calls are pushed down to Spark if the nested expression uses underlying RDDs.
  • various exceptions are now caught and displayed with helpful error messages.
  • Strings can be concatenated with atomic types (they get serialized to a string)
  • Lookup can be done on a sequence of objects

The jar files no longer contain the Spark libraries, as they are provided by the local environment or the cluster.

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Sparksoniq Alpha 0.9.2 "Cypress"

11 Oct 14:29
f13133c
Pre-release

Second release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New: various bugfixes (e.g., empty sequence handling), richer function library, general comparison operators.

The jar files no longer contain the Spark libraries, as they are provided by the local environment or the cluster.

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Sparksoniq Alpha 0.9.1 "Spruce"

18 Jan 16:57
0f03cce
Pre-release

First release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

Documentation: http://sparksoniq.readthedocs.io/en/latest/