
F023: Memory Admission Control — A-MAC-inspired fact scoring gate#126

Open
tfatykhov wants to merge 1 commit into main from feature/F023-memory-admission-control

Conversation

@tfatykhov
Owner

Summary

Adds spec for F023 — Memory Admission Control, an A-MAC-inspired 5-factor scoring gate that evaluates every candidate fact before it enters long-term storage.

Problem

Currently learn_fact() stores everything the LLM says to store. There is zero defense against:

  • Hallucinated facts entering long-term memory
  • Near-redundant facts that pass the 0.95 dedup threshold
  • Low-utility transient observations diluting recall quality

5-Factor Scoring

  • Grounding — ROUGE-L against recent conversation turns (anti-hallucination)
  • Novelty — Cosine similarity, continuous range (not binary dedup)
  • Type Prior — Category-based weights (preferences=0.95, tools=0.60)
  • Recency — Exponential decay, 24h half-life
  • Utility — LLM assessment (Phase 2 only)

4 Phases

  1. Rule-based scoring (~250 LOC)
  2. LLM utility factor (~60 LOC)
  3. Calibration feedback loop via F017 (~150 LOC)
  4. Contradiction-aware admission via F022 (~80 LOC)

Shadow-mode rollout: scores without rejecting for the first week.

A-MAC-inspired 5-factor admission scoring for learn_fact():
- Phase 1: Grounding (anti-hallucination), Novelty, Type Prior, Recency
- Phase 2: LLM Utility assessment
- Phase 3: Calibration feedback loop via F017 usage tracking
- Phase 4: Contradiction-aware admission (integrates with F022)

Addresses memory poisoning gap — currently zero defense against
hallucinated facts entering long-term storage.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40f3b8d302


- `admission_enabled` is False (global toggle)
- `input.confidence == 1.0` (explicit override by LLM)
- `input.source` is `"system"`, `"migration"`, or `"initiation"` (non-conversational)
- `input.category == "rule"` (rules are always explicit declarations)


P1: Remove rule-category bypass from admission scoring

Bypassing admission whenever input.category == "rule" creates a reliable path around the new quality gate, because category assignment is LLM-driven and therefore fallible and manipulable. Any hallucinated or low-novelty fact mislabeled as a rule would skip grounding/novelty checks and be persisted, which directly undermines the feature's stated goal of preventing bad memories at write time.


@tfatykhov
Owner Author

Emerson — Initial review

Strong spec. The A-MAC adaptation is thoughtful — you took the architecture and adjusted it for single-user context instead of blindly replicating a benchmark-optimized system. A few observations:

What works well

  • Grounding check is the killer feature. This is the single biggest gap in Heart right now. Zero hallucination defense is genuinely dangerous for long-running agents — one fabricated fact in long-term storage can cascade through recall forever.
  • Shadow mode rollout is exactly right. Admission control is one of those features where false rejections are worse than false admissions (Tim can't easily re-teach facts). Building baseline data before flipping the switch is smart.
  • Continuous novelty (0.70-0.95 range) vs binary dedup (>0.95) is a clean extension. The existing dedup is too coarse — a fact at 0.92 similarity gets through even though it adds almost nothing.
  • Type priors using existing category field — this is Papert's Principle in action. The categories already exist and are unused for admission. You're building a detour around existing infrastructure, not replacing it.
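The continuous-novelty point above can be sketched as a linear ramp between a novelty floor and the existing dedup threshold. The 0.70 and 0.95 endpoints come from the spec's range; the linear shape of the ramp is an illustrative assumption.

```python
def novelty_factor(max_cos_sim: float, dup_threshold: float = 0.95,
                   novel_floor: float = 0.70) -> float:
    """Map nearest-neighbor cosine similarity to a continuous novelty score.

    Below novel_floor the fact is clearly new (score 1.0); at or above
    dup_threshold it is a near-duplicate (score 0.0); in between the score
    falls off linearly. The linear ramp itself is an assumption.
    """
    if max_cos_sim <= novel_floor:
        return 1.0
    if max_cos_sim >= dup_threshold:
        return 0.0
    return (dup_threshold - max_cos_sim) / (dup_threshold - novel_floor)
```

Under this mapping the 0.92-similarity fact mentioned above scores only 0.12 on novelty instead of sailing through binary dedup.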

Concerns to address

P1: Grounding check + inference-type facts
ROUGE-L against conversation turns will score low on facts that are valid inferences rather than textual echoes. Example: Tim says "I'm flying to Tokyo next week for a conference." The LLM might correctly infer and store "Tim will be traveling internationally." That's grounded in meaning but won't have high ROUGE-L overlap. The confidence=1.0 override handles this... but only if the LLM consistently uses it. I'd suggest tracking override rates in shadow mode — if >30% of admitted facts need overrides, the grounding check is too strict.

P1: Tool output in grounding window (your Open Question #3)
This isn't optional — it's critical for Phase 1. Web research, run_python, any tool that produces content the LLM might store as a fact — all of that content needs to be in the grounding window. Without it, every tool-derived fact either gets rejected or needs an override. I'd move this from "open question" to "Phase 1 requirement."

P2: Override abuse potential (your Open Question #2)
The confidence=1.0 override is necessary but risky. Claude models tend to be... assertive about their conclusions. I'd bet the override rate will be higher than you expect. Consider:

  • Rate limiting: max N overrides per episode/session
  • Logging override justifications (not just the fact, but why the LLM chose to override)
  • Phase 3 calibration should specifically track override precision (what % of overridden facts actually get used?)

P2: Recency factor source_timestamp
Where does source_timestamp come from for a newly learned fact? The fact is being created now, so it's always "current." I think you mean the timestamp of the conversation turn that triggered the fact? That needs to be clearer — and the implementation needs to extract the source turn's timestamp, not just use datetime.now().

P3: F011 connection
You mentioned F011 (skill discovery) depends on recall quality, which depends on admission quality. This is true but indirect — F023 cleans inputs, F011 uses outputs. Worth noting in the spec's feature interaction table but not a direct dependency.

On the F022 → F023 pipeline

The pipeline vision makes sense: F023 gates admission → F022 links via graph → F011 activates skills from recall. Each layer builds on cleaner data from the layer below. Phase 4's contradiction-aware admission is the right integration point with F022 — don't try to do it earlier.

Verdict

Approve with revisions. The two P1s (tool output in grounding window as Phase 1 requirement, and inference-type fact tracking) should be addressed before merge. The P2s can be open questions that get resolved during implementation.

Solid work. The shadow mode approach means even if the weights are wrong initially, we'll have data to correct them.


@tfatykhov tfatykhov left a comment


🔍 CodeReviewer — Spec Review: F023 Memory Admission Control

Summary

Well-researched spec with strong theoretical backing (A-MAC paper). The 5-factor scoring architecture is sound, the phase rollout plan is excellent, and the complementary relationship with F017 (admission vs retrieval filtering) is a good architectural choice.

However, there are 2 P1 critical issues that would make the system non-functional as specified, 4 P2 issues that need resolution before implementation, and 2 P3 suggestions.


P1 — Critical Issues

P1-1: confidence=1.0 bypass makes admission a no-op by default

The spec states (Bypass conditions section):

input.confidence == 1.0 (explicit override by LLM)

But FactInput.confidence defaults to 1.0 in nous/heart/schemas.py:

confidence: float = Field(default=1.0, ge=0.0, le=1.0)

And the learn_fact tool also defaults to confidence=1.0:

async def learn_fact(content, ..., confidence: float = 1.0, ...):

This means every fact learned via the standard tool call bypasses admission scoring by default. The LLM almost never explicitly sets confidence — it just calls learn_fact(content=..., category=...). The entire admission system would be a no-op.

Fix options:

  1. Change the bypass to require an explicit override=True flag (not tied to confidence)
  2. Change the default confidence to something lower (e.g., 0.8) — but this has ripple effects
  3. Only bypass when confidence is explicitly passed AND equals 1.0 (requires tracking whether it was explicitly set vs defaulted)

Option 1 is cleanest — add an admission_override: bool = False parameter to learn_fact and FactInput.
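A sketch of option 1, using a plain dataclass as a stand-in for the real Pydantic FactInput (field names mirror the review; this is not the actual nous/heart/schemas.py model):

```python
from dataclasses import dataclass

@dataclass
class FactInput:
    """Illustrative stand-in for the real Pydantic model."""
    content: str
    category: str = "fact"
    confidence: float = 1.0           # keeps its existing default; no longer a bypass
    admission_override: bool = False  # bypass only when explicitly requested

def should_bypass(inp: FactInput, admission_enabled: bool = True) -> bool:
    """Bypass scoring only when globally disabled or explicitly overridden."""
    return not admission_enabled or inp.admission_override
```

Because admission_override defaults to False, the standard learn_fact(content=..., category=...) call path always goes through scoring, closing the no-op hole.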

P1-2: FactManager has no access to conversation messages for grounding check

FactManager.__init__ takes (db, embeddings, agent_id). It has no reference to WorkingMemoryManager, ConversationState, or any source of recent conversation turns.

The grounding check needs the last N conversation messages, but the spec says only "Uses the last N messages from working memory / episode transcript" without specifying how FactManager gets them.

The _learn() call chain: learn_fact() (tools.py) → heart.learn() → FactManager.learn() → FactManager._learn(). None of these currently pass conversation context.

Fix — the spec must specify the plumbing:

  • Option A: Add recent_messages: list[str] | None = None parameter to learn() / _learn(), and have tools.py fetch them from ConversationState before calling
  • Option B: Give FactManager a reference to WorkingMemoryManager at construction time
  • Option C: Have _compute_admission_score() query ConversationState from the database directly using self.db

Option C is simplest (no API changes to callers) but couples FactManager to conversation storage.


P2 — Should Fix

P2-1: ROUGE-L grounding blind spot for tool-sourced facts

The spec acknowledges this in Open Questions #3 and #6 but offers no solution. This is not an edge case — tool output is a primary source of facts in Nous. web_fetch, web_search, run_python results all feed into learn_fact.

If the grounding window only checks user/assistant messages (not tool call results), a large fraction of legitimate facts will get low grounding scores. The spec needs to specify:

  • Which message types are included (user, assistant, tool results?)
  • How tool output is accessed (it is in the conversation message array as tool_result messages)
  • Whether this changes the grounding window size considerations

This should be resolved in the spec, not deferred to Open Questions.

P2-2: FactDetail schema mismatch

The spec says _rejected_detail() returns a FactDetail with admitted=False, but FactDetail has no admitted field. Current fields: id, agent_id, content, category, subject, confidence, source, learned_at, last_confirmed, confirmation_count, superseded_by, contradiction_of, active, tags, created_at, contradiction_warning.

The spec claims "No schema changes for Phase 1" but this requires adding at minimum admitted: bool and possibly admission_score: float and rejection_reason: str to FactDetail. These are Pydantic schema changes (not DB schema), but they need to be specified.

P2-3: category="rule" bypass is risky

The spec bypasses admission for category="rule" because "rules are always explicit declarations." But the LLM assigns categories — a hallucinated fact could be categorized as "rule" and bypass all checks. This creates a second bypass loophole alongside the confidence=1.0 issue.

The Type Prior table already gives rules 0.90, which is generous. Let rules go through scoring with their high prior rather than bypassing entirely. If a rule is hallucinated, grounding will catch it.
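The suggestion reduces to an ordinary prior lookup: rules keep their generous 0.90 prior but still pass through grounding and novelty. The preference/tool values come from the spec's table and the rule value from this review; the 0.75 fallback for unlisted categories is an assumption.

```python
# Type priors: preferences and tools from the spec's table, rule from this
# review. No category bypasses scoring entirely.
TYPE_PRIORS = {"preference": 0.95, "rule": 0.90, "tool": 0.60}

def type_prior(category: str) -> float:
    """Score a fact's category; 0.75 fallback is an illustrative assumption."""
    return TYPE_PRIORS.get(category, 0.75)
```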

P2-4: Recency factor adds minimal signal in Phase 1

With a 24h half-life, most facts learned during active conversation (age ~0) score ~1.0. Even 6-hour-old context scores 0.84. The factor barely differentiates in practice because the vast majority of learn_fact calls happen during live conversation.

The only meaningful case is F016 pre-prune extraction, where facts come from tool output that could be minutes to hours old. Consider:

  • Whether 20% weight for a factor that is almost always ~1.0 is dead weight
  • Whether source_timestamp is even reliably available (the spec assumes it but FactInput has no timestamp field — it has source_episode_id which could be used to derive age, but that is not addressed)

P3 — Nice to Have

P3-1: Type Prior table has dead entry for "rule"

Since rules bypass admission entirely (per the bypass conditions), the TYPE_PRIORS["rule"] = 0.90 entry is dead code. Either remove the bypass (recommended, see P2-3) or remove the table entry. Having both is confusing.

P3-2: INDEX.md ordering

The PR inserts F023 between F022 and F021, resulting in the order: F022, F023, F021. Numerical order would be F021, F022, F023. This is a pre-existing issue (F021 was already out of order) but the PR perpetuates it.


What Works Well

  1. Problem statement — Clearly articulated with concrete examples. The "prevent vs filter" principle is sound.
  2. Research backing — A-MAC mapping to Nous gaps is thorough and honest.
  3. Phase rollout — Shadow → soft → full mode is excellent risk management.
  4. Continuous novelty — Extending binary dedup to continuous scoring is a meaningful improvement.
  5. Rejection UX — Feeding rejection info back to the LLM is well-designed.
  6. Scope control — "What We are NOT Building" section shows disciplined scoping.
  7. Metrics — Comprehensive observability plan.
  8. F017 complementarity — Using F017 usage data for Phase 3 calibration is an elegant feedback loop.

Verdict

DO NOT implement until P1 issues are resolved. The confidence=1.0 bypass (P1-1) makes the entire system a no-op, and the missing conversation access plumbing (P1-2) means the highest-weighted factor (grounding) cannot be implemented as specified.

After P1 fixes, this is a strong spec that should produce a valuable feature. The P2 issues are design refinements that can be addressed in the spec update.

Issue count: 2 P1 | 4 P2 | 2 P3

