Skip to content

User Defined Function Usage

Tengjun Jin edited this page Dec 22, 2023 · 5 revisions

Introduction

Examples

Machine Learning Model As a UDF:

This query will execute objects model over metadata and return the inference results.

User defined function:

async def async_objects_inference(blob_id):
  for service in aidb_engine._config.inference_bindings:
    if service.service.name == 'objects00':
      inference_service = service
      # replaced by your own ML inference
      outputs = await inference_service.infer(blob_id)
    return outputs[0]

def objects_inference(blob_id):
  df = pd.DataFrame({'blob_id': [blob_id]})
  return asyncio_run(async_objects_inference(df))

Query:

SELECT objects_inference(frame)
FROM blobs_00

Using machine learning model as a join condition

This query will return matched text, img pair.

User Defined Function

# Pseudo code
def is_match(img, text, threshold):
  match_pro = CLIP_model(img, text)
  if match_pro > threshold:
    return True
  else:
    return False

Query:

SELECT img_id, text_id
FROM img_table JOIN text_table ON is_match(img_table.img, text_table.text) = TRUE

UDF to reduce data processing effort

You can directly filter data by sql query, which can avoid writing logical operations and changing filtering conditions in code frequently after getting function results.

User Defined Function

# Pseudo code
def function1(col):
  return 

def function2(col1, col2):
  return 

Query:

SELECT function1(col1)
FROM table1
WHERE function1(col2) > 100 AND function2(col3, 2) < 100

Do filter on UDF outputs

If the UDF has multiple output columns. You can still do filter directly on specific UDF output by assigning aliases for UDF outputs. # User Defined Function

# Pseudo code
def function1(col):
  return x, y, z

Query:

SELECT function1(col1) AS (output1, output2, output3)
FROM table1
WHERE output1 > 100 AND output2 < 100

User Defined Function Output Format:

  1. No output: we will drop that input row.

  2. Single value, like integer or float number

  3. Multiple rows outputs, using a list of elements (e.g. [1, 2, 3]) or dataframe (e.g. pd.Dataframe({'x': [1, 2, 3]})).

  4. Multiple columns outputs, using a list of tuple (e.g. [(1, 2, 3)] or dataframe (e.g. pd.Dataframe({'x': [1], 'y':[2], 'z':[3]}))