feat(extraction): inline attribute extraction during entity extraction #1131

JohannesBin · 2026-01-01T23:01:50Z

Summary

Extends ExtractedEntity with an optional attributes field, populated during entity extraction.

Type of Change

Bug fix
New feature
Performance improvement
Documentation/Tests

Objective

Entity attribute extraction currently requires defining entity_types with Pydantic models. This triggers O(n) additional LLM calls via _extract_entity_attributes for n entities. Without predefined schemas, no attributes are extracted.

The entity extraction pass already processes full episode context. Attribute identification is a natural byproduct of entity recognition. This change extends the extraction prompt to request attributes inline, achieving attribute discovery with zero marginal LLM calls and no schema requirement.

{"name": "Acme Corp", "entity_type_id": 0, "attributes": {"employee_count": 150}}

New attributes are also merged into existing nodes during deduplication.

The field defaults to an empty dict, so existing callers that never set attributes remain backward compatible.
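As a rough illustration of the backward-compatibility point, the sketch below parses the payload shown above both with and without an attributes key. `ExtractedEntitySketch` and `parse_entity` are hypothetical names for this example, and a plain dataclass stands in for the real Pydantic model, so this is not the project's actual code.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExtractedEntitySketch:
    """Hypothetical stand-in for ExtractedEntity; field names follow the JSON above."""

    name: str
    entity_type_id: int
    # default_factory gives every instance its own empty dict
    attributes: dict[str, Any] = field(default_factory=dict)


def parse_entity(raw: str) -> ExtractedEntitySketch:
    """Build an entity from one JSON object; a missing 'attributes' key is allowed."""
    data = json.loads(raw)
    return ExtractedEntitySketch(
        name=data["name"],
        entity_type_id=data["entity_type_id"],
        attributes=data.get("attributes", {}),
    )


with_attrs = parse_entity(
    '{"name": "Acme Corp", "entity_type_id": 0, "attributes": {"employee_count": 150}}'
)
without_attrs = parse_entity('{"name": "Acme Corp", "entity_type_id": 0}')
```

Here `without_attrs.attributes` is an empty dict, so code that only reads `name` and `entity_type_id` keeps working unchanged.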

Testing

Unit tests added/updated
Integration tests added/updated
All existing tests pass

Breaking Changes

This PR contains breaking changes

Checklist

Code follows project style guidelines (make lint passes)
Self-review completed
Documentation updated where necessary
No secrets or sensitive information committed

Related Issues

Closes #

Currently, extracting entity attributes requires defining entity_types with Pydantic models, triggering O(n) additional LLM calls. Without schemas, no attributes are extracted. The entity extraction pass already processes full episode context. This change extends the extraction prompt to request attributes inline, achieving attribute discovery with zero marginal LLM calls and no schema requirement.

Changes:
- Add optional attributes field to ExtractedEntity (defaults to {})
- Update extraction prompts to request attributes inline
- Pass extracted attributes to EntityNode on creation
- Merge new attributes into existing nodes during deduplication

Fully backwards compatible.

danielchalef · 2026-01-01T23:02:03Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

JohannesBin · 2026-01-01T23:03:17Z

I have read the CLA Document and I hereby sign the CLA

JohannesBin · 2026-01-02T21:09:08Z

Update: Encountered schema validation errors with OpenAI's default structured outputs configuration. The issue: dict[str, str] generates additionalProperties: { type: "string" }, but OpenAI strict mode requires additionalProperties: false for all objects.
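To make the two strict-mode constraints mentioned in this PR concrete (every object sets additionalProperties to false, and every property is listed in required), here is a small checker sketch. `strict_mode_violations` is a hypothetical helper written for this example; it only walks plain object and array schemas and ignores $ref, $defs, and anyOf, so it is not a full validator.

```python
from typing import Any


def strict_mode_violations(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Collect violations of the two strict-mode rules cited in this PR."""
    problems: list[str] = []
    if schema.get("type") == "object":
        # Rule 1: the object must explicitly forbid extra keys.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        # Rule 2: every declared property must appear in "required".
        props = schema.get("properties", {})
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {missing}")
        for name, sub in props.items():
            problems.extend(strict_mode_violations(sub, f"{path}.{name}"))
    elif schema.get("type") == "array":
        items = schema.get("items")
        if isinstance(items, dict):
            problems.extend(strict_mode_violations(items, f"{path}[]"))
    return problems


# Shape generated for a dict[str, str] field, as quoted in the comment above.
dict_field = {"type": "object", "additionalProperties": {"type": "string"}}

# Shape of a list of explicit key/value objects.
list_field = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"key": {"type": "string"}, "value": {"type": "string"}},
        "required": ["key", "value"],
        "additionalProperties": False,
    },
}
```

Under these assumptions the dict-shaped field is flagged once, while the list of key/value objects passes.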

Fix: Introduced EntityAttribute wrapper model with explicit key/value fields. The list is converted to a dict when constructing EntityNode.
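A minimal sketch of the list-to-dict conversion described here, assuming the wrapper has exactly the two string fields key and value. `EntityAttributeSketch` and `attributes_to_dict` are hypothetical names, a dataclass stands in for the Pydantic model, and the "revenue" key is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class EntityAttributeSketch:
    """Hypothetical stand-in for EntityAttribute: one explicit key/value pair."""

    key: str
    value: str


def attributes_to_dict(attrs: list[EntityAttributeSketch]) -> dict[str, str]:
    """Flatten the list into a dict; a repeated key keeps its last value."""
    return {attr.key: attr.value for attr in attrs}


extracted = [
    EntityAttributeSketch(key="employee_count", value="150"),
    EntityAttributeSketch(key="revenue", value="50M USD"),
]
node_attributes = attributes_to_dict(extracted)
```

One consequence of this shape is that values arrive as strings, so a count such as 150 would need parsing downstream if a numeric type is wanted. How the PR itself handles repeated keys in the list is not stated; the last-value behavior here is simply what a dict comprehension does.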

Reference: OpenAI Structured Outputs docs

JohannesBin · 2026-01-02T21:59:58Z

Known limitation: Attribute conflict handling

During testing, I've observed that conflicting attributes across episodes accumulate rather than resolve:

Episode 1: "Person A is skeptical"
Episode 2: "Person A is actually supportive"

Result: {attitude: "skeptical", supportive: "true"} // both present

Key collisions resolve as last-write-wins, but differently phrased statements produce different keys, so both values persist.
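The behavior can be reproduced with a plain dict merge. This is a sketch that assumes the merge amounts to a dict update; `merge_attributes` is a hypothetical helper, not the function used in the PR.

```python
def merge_attributes(existing: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Return a merged copy; on a key collision the newer value wins."""
    merged = dict(existing)
    merged.update(new)
    return merged


episode_1 = {"attitude": "skeptical"}
episode_2 = {"supportive": "true"}

# Different keys: both survive, which is the accumulation described above.
accumulated = merge_attributes(episode_1, episode_2)

# Same key: the later episode overwrites the earlier one.
same_key = merge_attributes(episode_1, {"attitude": "supportive"})
```

The conflict would only resolve if the model reused the `attitude` key in the second episode, which the extraction prompt does not guarantee.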

danielchalef added a commit that referenced this pull request Jan 1, 2026

@JohannesBin has signed the CLA in #1131

165676e

JohannesBin added 3 commits January 2, 2026 22:02

fix: make attributes required for OpenAI strict mode compatibility

83a0f34

OpenAI's structured outputs require all properties to be in the required array. Changed attributes from optional with default_factory to required field. LLM returns {} when no attributes found.

fix: use list[EntityAttribute] for OpenAI strict mode

c846746

OpenAI's structured outputs don't allow additionalProperties in objects. Changed attributes from dict[str, str] to list[EntityAttribute] with explicit key/value fields. Convert to dict when creating EntityNode.

chore: remove PR description file

61ecae4

feat: add currency preservation hint to attribute extraction

5ae9b5a

When extracting monetary values, include currency if stated (e.g., '50M USD'). If currency not explicitly mentioned, preserve original format without assuming.
