Skip to content

User Defined Function Usage

Tengjun Jin edited this page Dec 22, 2023 · 5 revisions

Introduction

Examples

Machine Learning Model As a UDF:

This query will execute objects_inference over metadata and return the inference results.

User defined function:

async def async_objects_inference(blob_id):
  for service in aidb_engine._config.inference_bindings:
    if service.service.name == 'objects00':
      inference_service = service
      # replaced by your own ML inference
      outputs = await inference_service.infer(blob_id)
    return outputs[0]

def objects_inference(blob_id):
  df = pd.DataFrame({'blob_id': [blob_id]})
  return asyncio_run(async_objects_inference(df))

Query:

SELECT objects_inference(frame)
FROM blobs_00

Using User Defined Function as a join condition

This query will return matched pairs of text and image.

User Defined Function

# Pseudo code
def is_match(img, text, threshold):
  match_pro = CLIP_model(img, text)
  if match_pro > threshold:
    return True
  else:
    return False

Query:

SELECT img_id, text_id
FROM img_table JOIN text_table ON is_match(img_table.img, text_table.text, 0.5) = TRUE

UDF to reduce data processing effort

By utilizing SQL queries with UDFs for data filtering, you can bypass the need for coding logical operations and frequently modifying filtering conditions in the code after obtaining the function results.

User Defined Function

# Pseudo code
def function1(col):
  return 

def function2(col1, col2):
  return 

Query:

SELECT function1(col1)
FROM table1
WHERE function1(col2) > 100 AND function2(col3, 2) < 100

Do filter on UDF outputs

If the User Defined Function produces multiple output columns, direct filtering on specific UDF outputs is still feasible by assigning aliases to these outputs.

User Defined Function

# Pseudo code
def function1(col):
  return x, y, z

Query:

SELECT function1(col1) AS (output1, output2, output3)
FROM table1
WHERE output1 > 100 AND output2 < 100

User Defined Function Output Format:

  1. No output: we will drop that input row.

  2. Single value: like integer or float value.

  3. Multiple rows outputs: using a list of elements (e.g. [1, 2, 3]) or dataframe (e.g. pd.Dataframe({'x': [1, 2, 3]})).

  4. Multiple columns outputs: using a list of tuple (e.g. [(1, 2, 3)] or dataframe (e.g. pd.Dataframe({'x': [1], 'y':[2], 'z':[3]}))