
feat: Add function to check if all required ontology elements are present #856

@dhorkel


Our team currently uses an agent to ingest annotations into our internal database. As part of data validation, we always want to check whether all required annotations are present in the LabelRowV2 instance.

I thought this might be a common enough use case to be worth contributing back for possible integration into Encord.

Here is my current implementation. Note: my implementation does not check whether nested options are required, as we have not used them.
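The implementation only looks at top-level `required` flags, not at required attributes nested under options. If nested options are ever needed, one possible approach is a recursive walk that only descends into options the annotator actually selected. This is a minimal sketch, not Encord's API: `AttrNode`, `OptionNode` and `find_missing_nested` are hypothetical stand-ins, and it assumes a nested attribute only counts as required when its parent option is selected.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import AbstractSet, List


@dataclass
class OptionNode:
    """Hypothetical stand-in for an option that can carry nested attributes."""
    feature_node_hash: str
    nested_attributes: List[AttrNode] = field(default_factory=list)


@dataclass
class AttrNode:
    """Hypothetical stand-in for an ontology attribute."""
    feature_node_hash: str
    name: str
    required: bool = False
    options: List[OptionNode] = field(default_factory=list)


def find_missing_nested(
    attributes: List[AttrNode],
    selected_options: AbstractSet[str],
    answered: AbstractSet[str],
) -> List[str]:
    """Return names of required attributes that are reachable but unanswered."""
    missing: List[str] = []
    for attr in attributes:
        if attr.required and attr.feature_node_hash not in answered:
            missing.append(attr.name)
        for option in attr.options:
            # Only descend into branches the annotator actually chose.
            if option.feature_node_hash in selected_options:
                missing.extend(
                    find_missing_nested(option.nested_attributes, selected_options, answered)
                )
    return missing


quality = AttrNode(
    "a1", "Quality", required=True,
    options=[
        OptionNode("o1", nested_attributes=[AttrNode("a2", "Defect type", required=True)]),
        OptionNode("o2"),
    ],
)
print(find_missing_nested([quality], {"o1"}, {"a1"}))  # ['Defect type']
```

Selecting `o2` instead would return an empty list, because the required "Defect type" attribute only applies under `o1`.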

```python
import logging

from encord.objects.ontology_labels_impl import LabelRowV2

logger = logging.getLogger(__name__)


def check_all_required_present(lr: LabelRowV2) -> bool:
    """Check that all required objects and classifications are present in the label row.

    Required attributes nested under options are not checked.

    Parameters
    ----------
    lr : LabelRowV2
        The label row to check.

    Returns
    -------
    bool
        True if all required objects and classifications are present, False otherwise.
    """
    # Only download labels if needed: calling initialise_labels() again would
    # overwrite any local, unsaved changes with the server state.
    if not lr.is_labelling_initialised:
        lr.initialise_labels()

    logger.info("Getting required classifications and objects")
    # Ontology items are unhashable, so key them by feature_node_hash.
    # A classification's "required" flag lives on its top-level attribute.
    required_classifications = {
        cla.feature_node_hash: cla.title
        for cla in lr.ontology_structure.classifications
        if cla.attributes[0].required
    }
    required_objects = {
        obj.feature_node_hash: obj.name
        for obj in lr.ontology_structure.objects
        if obj.required
    }

    logger.info("Checking for missing required classifications and objects")
    found_classification_hashes = {
        inst.ontology_item.feature_node_hash
        for inst in lr.get_classification_instances()
    }
    found_object_hashes = {
        inst.ontology_item.feature_node_hash for inst in lr.get_object_instances()
    }
    # dict key views support set difference directly.
    missing_classification_hashes = required_classifications.keys() - found_classification_hashes
    missing_object_hashes = required_objects.keys() - found_object_hashes

    if missing_classification_hashes or missing_object_hashes:
        logger.error("Missing required classifications or objects")
        for feature_hash in missing_classification_hashes:
            logger.error("Missing classification: %s", required_classifications[feature_hash])
        for feature_hash in missing_object_hashes:
            logger.error("Missing object: %s", required_objects[feature_hash])
        return False
    return True
```
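At its core the check is a set difference: the feature hashes of required ontology items minus the feature hashes that actually appear on instances. The sketch below isolates that logic so it can be unit-tested without the Encord SDK or a live label row. `FeatureItem` and `find_missing_required` are hypothetical names, with `FeatureItem` standing in for an ontology object or classification.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FeatureItem:
    """Hypothetical stand-in for an ontology object or classification."""
    feature_node_hash: str
    name: str
    required: bool = False


def find_missing_required(
    ontology_items: Iterable[FeatureItem],
    instance_hashes: Iterable[str],
) -> List[str]:
    """Return the names of required ontology items that have no instance."""
    required = {item.feature_node_hash: item.name for item in ontology_items if item.required}
    # Required hashes minus the hashes that were actually labelled.
    missing = required.keys() - set(instance_hashes)
    # Sort so the output is deterministic (set iteration order is arbitrary).
    return sorted(required[h] for h in missing)


ontology = [
    FeatureItem("h1", "Car", required=True),
    FeatureItem("h2", "Pedestrian", required=True),
    FeatureItem("h3", "Sign"),
]
print(find_missing_required(ontology, ["h1", "h3", "h3"]))  # ['Pedestrian']
```

Returning the missing names rather than a `bool` is one possible design choice: the caller can log, raise, or attach them to a validation report, and `not find_missing_required(...)` still recovers the original pass/fail result.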
