Skip to content

Conversation

@JohannesBin
Copy link

Summary

Extends ExtractedEntity with an optional attributes field, populated during entity extraction.

Type of Change

  • Bug fix
  • New feature
  • Performance improvement
  • Documentation/Tests

Objective

Entity attribute extraction currently requires defining entity_types with Pydantic models. This triggers O(n) additional LLM calls via _extract_entity_attributes for n entities. Without predefined schemas, no attributes are extracted.

The entity extraction pass already processes full episode context. Attribute identification is a natural byproduct of entity recognition. This change extends the extraction prompt to request attributes inline, achieving attribute discovery with zero marginal LLM calls and no schema requirement.

{"name": "Acme Corp", "entity_type_id": 0, "attributes": {"employee_count": 150}}

New attributes are also merged into existing nodes during deduplication.

Default empty dict ensures backward compatibility.

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • All existing tests pass

Breaking Changes

  • This PR contains breaking changes

Checklist

  • Code follows project style guidelines (make lint passes)
  • Self-review completed
  • Documentation updated where necessary
  • No secrets or sensitive information committed

Related Issues

Closes #

Currently, extracting entity attributes requires defining entity_types with
Pydantic models, triggering O(n) additional LLM calls. Without schemas, no
attributes are extracted.

The entity extraction pass already processes full episode context. This change
extends the extraction prompt to request attributes inline, achieving attribute
discovery with zero marginal LLM calls and no schema requirement.

Changes:
- Add optional attributes field to ExtractedEntity (defaults to {})
- Update extraction prompts to request attributes inline
- Pass extracted attributes to EntityNode on creation
- Merge new attributes into existing nodes during deduplication

Fully backwards compatible.
@danielchalef
Copy link
Member

danielchalef commented Jan 1, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@JohannesBin
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

danielchalef added a commit that referenced this pull request Jan 1, 2026
OpenAI's structured outputs require all properties to be in the
required array. Changed attributes from optional with default_factory
to required field. LLM returns {} when no attributes found.
OpenAI's structured outputs don't allow additionalProperties in objects.
Changed attributes from dict[str, str] to list[EntityAttribute] with
explicit key/value fields. Convert to dict when creating EntityNode.
@JohannesBin
Copy link
Author

Update: Encountered schema validation errors with OpenAI's default structured outputs configuration. The issue: dict[str, str] generates additionalProperties: { type: "string" }, but OpenAI strict mode requires additionalProperties: false for all objects.

Fix: Introduced EntityAttribute wrapper model with explicit key/value fields. The list is converted to a dict when constructing EntityNode.

Reference: OpenAI Structured Outputs docs

When extracting monetary values, include currency if stated (e.g., '50M USD').
If currency not explicitly mentioned, preserve original format without assuming.
@JohannesBin
Copy link
Author

JohannesBin commented Jan 2, 2026

Known limitation: Attribute conflict handling

During testing, i've observed that conflicting attributes across episodes accumulate rather than resolve:

Episode 1: "Person A is skeptical"
Episode 2: "Person A is actually supportive"

Result: {attitude: "skeptical", supportive: "true"} // both present

Last-write-wins on key collision, but different phrasing creates different keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants