(We need to add JBang-specific information to function.md, where we describe how we support Java UDFs, and also mention it in the preprocessor section of deepdive.md.)
Summary
DataSQRL now supports writing Flink User-Defined Functions (UDFs) as single-file Java scripts using JBang. This feature needs user-facing documentation explaining how to use it, with practical examples.
Background
JBang UDFs allow users to write custom functions as simple .java files without needing a full Maven/Gradle project. The DataSQRL CLI automatically detects, compiles, and packages these into the pipeline.
Key implementation details:
- JBang files are detected by the shebang line: ///usr/bin/env jbang "$0" "$@" ; exit $?
- Flink dependencies are provided automatically; users must NOT declare them in //DEPS
- Multiple JBang files are batched into a single jbang-udfs.jar for efficiency
- Regular .java files without the shebang are ignored by the preprocessor
What to Document
1. Getting Started
- JBang must be installed on the user's system (curl -Ls https://sh.jbang.dev | bash -s - app setup)
- UDF files go in the usrlib/ directory of the SQRL project
2. Writing a Scalar Function
///usr/bin/env jbang "$0" "$@" ; exit $?
import org.apache.flink.table.functions.ScalarFunction;

public class MyScalarFunction extends ScalarFunction {
    public Long eval(Long a, Long b) {
        return a + b;
    }
}
3. Writing an Async Scalar Function
///usr/bin/env jbang "$0" "$@" ; exit $?
import org.apache.flink.table.functions.AsyncScalarFunction;
import org.apache.flink.table.functions.FunctionContext;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MyAsyncScalarFunction extends AsyncScalarFunction {
    private transient ExecutorService executor;

    // Create the worker pool once per function instance.
    @Override
    public void open(FunctionContext context) throws Exception {
        this.executor = Executors.newFixedThreadPool(10);
    }

    // Shut the pool down gracefully; force termination after 60 seconds.
    @Override
    public void close() throws Exception {
        if (executor != null && !executor.isShutdown()) {
            executor.shutdown();
            if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        }
    }

    // Completes the future on a pool thread; the sleep stands in for a slow external call.
    public void eval(CompletableFuture<String> result, String param1, int param2) {
        executor.submit(() -> {
            try {
                Thread.sleep(1000);
                String response = "Processed " + param1 + " with " + param2;
                result.complete(response);
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        });
    }
}
4. Using UDFs in SQRL Scripts
IMPORT usrlib.MyScalarFunction;
IMPORT usrlib.MyAsyncScalarFunction;
-- Aliases are also supported:
IMPORT usrlib.MyScalarFunction AS AddFunction;

MyTable := SELECT val, MyScalarFunction(val, val) AS sum
           FROM (VALUES ((1)), ((2)), ((3))) AS t(val);

MyAsyncTable := SELECT val, MyAsyncScalarFunction(val, ival) AS result
                FROM (VALUES (('a'), (1)), (('b'), (2))) AS t(val, ival);
5. Supported UDF Types
ScalarFunction
AsyncScalarFunction
TableFunction
AsyncTableFunction
AggregateFunction
TableAggregateFunction
6. Important Rules
- The shebang line ///usr/bin/env jbang "$0" "$@" ; exit $? must be the first line
- Do NOT add Flink //DEPS; they cause build errors since Flink is already on the classpath
- External (non-Flink) //DEPS are allowed (e.g., //DEPS com.google.code.gson:gson:2.10)
- Each file must contain exactly one public class that extends a Flink UDF class
- Package declarations are optional
- Function names are case-insensitive in SQRL scripts
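The //DEPS rules above could be illustrated in the docs with a small pre-flight check. The following is only a sketch: the class FlinkDepsCheck and its method are hypothetical and not part of the DataSQRL CLI, and it assumes that "Flink //DEPS" means any coordinate whose group ID is org.apache.flink (or a subgroup). The version numbers in the demo are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check; NOT part of the DataSQRL CLI.
class FlinkDepsCheck {

    // Returns every //DEPS coordinate whose group ID is org.apache.flink or a subgroup of it.
    // Assumption: this is what "Flink //DEPS" means in the rules above.
    static List<String> findFlinkDeps(List<String> sourceLines) {
        List<String> offending = new ArrayList<>();
        for (String line : sourceLines) {
            String[] tokens = line.trim().split("\\s+");
            if (!tokens[0].equals("//DEPS")) {
                continue; // not a dependency directive
            }
            // One //DEPS line may list several whitespace-separated coordinates.
            for (int i = 1; i < tokens.length; i++) {
                String group = tokens[i].split(":", 2)[0];
                if (group.equals("org.apache.flink") || group.startsWith("org.apache.flink.")) {
                    offending.add(tokens[i]);
                }
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "///usr/bin/env jbang \"$0\" \"$@\" ; exit $?",
            "//DEPS com.google.code.gson:gson:2.10",
            "//DEPS org.apache.flink:flink-table-common:1.19.0",
            "public class MyScalarFunction {}");
        // Only the Flink coordinate is reported; the Gson dependency is allowed.
        System.out.println(findFlinkDeps(source));
    }
}
```

A check like this would let the documentation show concretely which //DEPS lines are acceptable (Gson) and which are not (anything from Flink), without relying on the build error to explain the rule.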
7. How It Works Under the Hood
- The preprocessor scans .java files for the JBang shebang
- All detected JBang UDF files are batched into a single jbang export fatjar invocation
- The resulting jbang-udfs.jar is placed in the lib directory
- A .function.json manifest is generated for each UDF, enabling SQRL script imports
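For the deepdive.md section, the detection step (the first bullet above) might be illustrated roughly as follows. This is a sketch under assumptions, not the actual preprocessor code: the class JBangShebangScanner is hypothetical, and it assumes the first line must match the shebang exactly apart from trailing whitespace. The real implementation may be more lenient.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; NOT the actual DataSQRL preprocessor class.
class JBangShebangScanner {

    // The shebang line quoted in this issue.
    static final String SHEBANG = "///usr/bin/env jbang \"$0\" \"$@\" ; exit $?";

    // Assumption: exact match, ignoring trailing whitespace.
    static boolean isShebangLine(String firstLine) {
        return firstLine != null && firstLine.stripTrailing().equals(SHEBANG);
    }

    // Reads only the first line of the file and checks it against the shebang.
    static boolean isJBangFile(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return isShebangLine(reader.readLine());
        }
    }

    // Walks the directory and keeps the .java files that start with the shebang.
    // Plain .java files without it are skipped, matching the behaviour described above.
    static List<Path> findJBangFiles(Path root) throws IOException {
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(root)) {
            candidates = paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".java"))
                .sorted()
                .collect(Collectors.toList());
        }
        List<Path> result = new ArrayList<>();
        for (Path candidate : candidates) {
            if (isJBangFile(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the usrlib/ directory mentioned in Getting Started.
        Path root = Path.of(args.length > 0 ? args[0] : "usrlib");
        if (!Files.isDirectory(root)) {
            System.out.println("Not a directory: " + root);
            return;
        }
        findJBangFiles(root).forEach(System.out::println);
    }
}
```

The list returned by findJBangFiles corresponds to the set of files that would then be handed to the single batched jbang export fatjar step; that step and the manifest generation are intentionally left out of the sketch.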
Reference Implementation
See the integration test at sqrl-testing/sqrl-testing-container/src/test/resources/jbang/ for a working example with both sync and async UDFs.