Document JBang UDF support with examples #1854

@velo

(We need to add JBang-specific information to function.md, where we describe how Java UDFs are supported, and also mention it in the preprocessor section of deepdive.md.)

Summary

DataSQRL now supports writing Flink User-Defined Functions (UDFs) as single-file Java scripts using JBang. This feature needs user-facing documentation explaining how to use it, with practical examples.

Background

JBang UDFs allow users to write custom functions as simple .java files without needing a full Maven/Gradle project. The DataSQRL CLI automatically detects, compiles, and packages these into the pipeline.

Key implementation details:

  • JBang files are detected by the shebang line: ///usr/bin/env jbang "$0" "$@" ; exit $?
  • Flink dependencies are provided automatically — users must NOT declare them in //DEPS
  • Multiple JBang files are batched into a single jbang-udfs.jar for efficiency
  • Regular .java files without the shebang are ignored by the preprocessor
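
The detection rule above can be sketched in a few lines of Java. This is only an illustration, not the actual DataSQRL preprocessor: the class name `JBangShebang` and the method `isJBangSource` are hypothetical, and tolerating trailing whitespace on the first line is an assumption.

```java
/** Hypothetical helper for illustration; not part of the DataSQRL code base. */
final class JBangShebang {

    // The shebang line quoted in this issue.
    static final String SHEBANG = "///usr/bin/env jbang \"$0\" \"$@\" ; exit $?";

    private JBangShebang() {}

    /** Returns true when the first line of the source text is the JBang shebang. */
    static boolean isJBangSource(String source) {
        if (source == null) {
            return false;
        }
        // Only the first line matters; everything after the first line break is ignored.
        int end = source.indexOf('\n');
        String firstLine = end < 0 ? source : source.substring(0, end);
        // Assumption: trailing whitespace (including '\r') is tolerated.
        return firstLine.strip().equals(SHEBANG);
    }
}
```

A regular `.java` file such as `public class Plain {}` returns `false` here and would therefore be skipped.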

What to Document

1. Getting Started

  • JBang must be installed on the user's system (curl -Ls https://sh.jbang.dev | bash)
  • UDF files go in the usrlib/ directory of the SQRL project

2. Writing a Scalar Function

///usr/bin/env jbang "$0" "$@" ; exit $?
import org.apache.flink.table.functions.ScalarFunction;

public class MyScalarFunction extends ScalarFunction {

  public Long eval(Long a, Long b) {
    return a + b;
  }
}

3. Writing an Async Scalar Function

///usr/bin/env jbang "$0" "$@" ; exit $?
import org.apache.flink.table.functions.AsyncScalarFunction;
import org.apache.flink.table.functions.FunctionContext;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MyAsyncScalarFunction extends AsyncScalarFunction {

    private transient ExecutorService executor;

    @Override
    public void open(FunctionContext context) throws Exception {
        this.executor = Executors.newFixedThreadPool(10);
    }

    @Override
    public void close() throws Exception {
        if (executor != null && !executor.isShutdown()) {
            executor.shutdown();
            if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        }
    }

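    // The CompletableFuture is the first parameter; the result is delivered by completing it.
    public void eval(CompletableFuture<String> result, String param1, int param2) {
        executor.submit(() -> {
            try {
                // Simulates a slow external call.
                Thread.sleep(1000);
                String response = "Processed " + param1 + " with " + param2;
                result.complete(response);
            } catch (Exception e) {
                // Propagate the failure to Flink instead of swallowing it.
                result.completeExceptionally(e);
            }
        });
    }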
}

4. Using UDFs in SQRL Scripts

IMPORT usrlib.MyScalarFunction;
IMPORT usrlib.MyAsyncScalarFunction;
-- Aliases are also supported:
IMPORT usrlib.MyScalarFunction AS AddFunction;

MyTable := SELECT val, MyScalarFunction(val, val) AS total
           FROM (VALUES ((1)), ((2)), ((3))) AS t(val);

MyAsyncTable := SELECT val, MyAsyncScalarFunction(val, ival) AS result
                FROM (VALUES (('a'), (1)), (('b'), (2))) AS t(val, ival);

5. Supported UDF Types

  • ScalarFunction
  • AsyncScalarFunction
  • TableFunction
  • AsyncTableFunction
  • AggregateFunction
  • TableAggregateFunction

6. Important Rules

  • The shebang line ///usr/bin/env jbang "$0" "$@" ; exit $? must be the first line
  • Do NOT add Flink //DEPS — they cause build errors since Flink is on the classpath already
  • External (non-Flink) //DEPS are allowed (e.g., //DEPS com.google.code.gson:gson:2.10)
  • Each file must contain exactly one public class that extends a Flink UDF class
  • Package declarations are optional
  • Function names are case-insensitive in SQRL scripts
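
The rule about Flink `//DEPS` could be checked before a build with something like the sketch below. It is an illustration under stated assumptions: the class `DepsCheck` and the method `flinkDeps` are hypothetical names, and a dependency is treated as a Flink dependency when its group ID is `org.apache.flink`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-build check for illustration; not part of the DataSQRL code base. */
final class DepsCheck {

    private DepsCheck() {}

    /** Returns every //DEPS coordinate in the source whose group ID is org.apache.flink. */
    static List<String> flinkDeps(String source) {
        List<String> offending = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String trimmed = line.strip();
            if (!trimmed.startsWith("//DEPS")) {
                continue;
            }
            // A //DEPS line may carry several whitespace-separated coordinates.
            String[] coords = trimmed.substring("//DEPS".length()).trim().split("\\s+");
            for (String coord : coords) {
                // Assumption: the group ID prefix is enough to identify a Flink artifact.
                if (coord.startsWith("org.apache.flink:")) {
                    offending.add(coord);
                }
            }
        }
        return offending;
    }
}
```

An empty result means the file declares only external dependencies, such as the gson example above.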

7. How It Works Under the Hood

  • The preprocessor scans .java files for the JBang shebang
  • All detected JBang UDF files are batched into a single jbang export fatjar invocation
  • The resulting jbang-udfs.jar is placed in the lib directory
  • A .function.json manifest is generated for each UDF, enabling SQRL script imports
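
The scanning step can be pictured roughly as follows. This sketch is not the preprocessor's real code: `JBangScan` and `findJBangUdfs` are hypothetical names, and it only collects the files that would form one batch, without invoking JBang or writing any manifest.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical directory scan for illustration; not part of the DataSQRL code base. */
final class JBangScan {

    // The shebang line quoted in this issue.
    static final String SHEBANG = "///usr/bin/env jbang \"$0\" \"$@\" ; exit $?";

    private JBangScan() {}

    /** Returns all .java files under root whose first line is the JBang shebang, sorted by path. */
    static List<Path> findJBangUdfs(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".java"))
                .filter(JBangScan::startsWithShebang)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    private static boolean startsWithShebang(Path file) {
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String first = reader.readLine();
            return first != null && first.strip().equals(SHEBANG);
        } catch (IOException e) {
            // Assumption: unreadable files are skipped rather than failing the scan.
            return false;
        }
    }
}
```

The returned list corresponds to the set of files that would be handed to a single export step.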

Reference Implementation

See the integration test at sqrl-testing/sqrl-testing-container/src/test/resources/jbang/ for a working example with both sync and async UDFs.

Metadata

Labels: documentation (Improvements or additions to documentation)

Status: Backlog