Cognitive Evaluation OS designed for reproducible, evidence-based AI evaluation. Extracted from 50+ real evaluator logs. Fully compatible with ModelRefiner v4.x.


Meta Framework v3.0 — Cognitive Evaluation OS

A procedural, reproducible, and model-agnostic Evaluation Operating System (Eval-OS)
designed by Hideyuki Okabe (2025)

Source document: META FRAMEWORK v3.0
(Internal reference only)


🎯 Overview

Meta Framework v3.0 is a Cognitive Evaluation Operating System built from
dozens of real-world multimodal evaluation sessions.

It addresses the core challenges of LLM/GenAI evaluation:

  • Ban on Speculation (zero inferred facts)
  • Clear Evidence Chains (Observation → Evidence → Verdict)
  • Structured Judgments using YES/NO and TRUE/FALSE/NA
  • Full reproducibility (same memo → same verdict)
  • Safe cultural & ethical processing
  • Drift Detection to reduce evaluator inconsistency
  • Full compatibility with ModelRefiner v4.x
  • Enterprise-grade reliability

🧠 Architecture (Evaluation OS, Layers 0–8)

Layer 0 — Philosophy & Forbidden Operations

Foundational rules: No speculation, no cultural absolutism, no impressionistic reasoning.

Layer 1 — Structural Kernel

Base classes for Observation Memo → Evidence → Verdict
including Evidence-chain enforcement.
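The kernel classes themselves are not published in this repository, but the evidence-chain enforcement idea can be sketched as follows. All class and field names below are illustrative assumptions, not the framework's actual API:

```python
from dataclasses import dataclass, field
from enum import Enum


class Judgment(Enum):
    """Structured judgment values used by the framework."""
    TRUE = "TRUE"
    FALSE = "FALSE"
    NA = "NA"


@dataclass
class Evidence:
    """One piece of evidence, tied back to the observation memo."""
    memo_line: str       # verbatim quote from the observation memo
    interpretation: str  # what the quote shows (no speculation)


@dataclass
class Verdict:
    """A verdict is only valid when backed by at least one Evidence item."""
    judgment: Judgment
    evidence: list[Evidence] = field(default_factory=list)

    def __post_init__(self):
        # Evidence-chain enforcement: no evidence, no verdict.
        if not self.evidence:
            raise ValueError("Verdict requires at least one Evidence item")


v = Verdict(
    judgment=Judgment.TRUE,
    evidence=[Evidence(memo_line="Output cites source X",
                       interpretation="claim is grounded")],
)
print(v.judgment.value)  # TRUE
```

The point of the sketch is the constructor check: an evaluator cannot produce a `Verdict` object without at least one memo-backed `Evidence` item, which is one way to make the Observation → Evidence → Verdict chain mechanical rather than advisory.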

Layer 2 — Evaluation Engine

Rubric normalization, YES/NO logic, TRUE/FALSE/NA cultural judgment.
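Layer 2's actual rules are not included here, but rubric normalization can be illustrated with a minimal sketch: mapping free-form rubric answers onto the framework's YES/NO vocabulary, falling back to NA rather than guessing. The token sets are assumptions for illustration:

```python
# Hypothetical normalization tables (not the framework's real rule set).
YES_TOKENS = {"yes", "y", "true", "1", "pass"}
NO_TOKENS = {"no", "n", "false", "0", "fail"}


def normalize_answer(raw: str) -> str:
    """Map a raw rubric answer onto YES / NO, or NA when unrecognized."""
    token = raw.strip().lower()
    if token in YES_TOKENS:
        return "YES"
    if token in NO_TOKENS:
        return "NO"
    # Unrecognized answers stay NA instead of being inferred,
    # consistent with the ban on speculation.
    return "NA"


print(normalize_answer("Pass"))   # YES
print(normalize_answer("0"))      # NO
print(normalize_answer("maybe"))  # NA
```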

Layer 3 — Justification Engine

3-line justification format, inference suppression, traceability guarantees.
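The exact 3-line format is not specified in this README; one plausible reading, sketched here as an assumption, is one line each for observation, evidence, and verdict so every judgment stays traceable to its memo:

```python
def three_line_justification(observation: str, evidence: str, verdict: str) -> str:
    """Render a judgment as the assumed 3-line justification format."""
    return "\n".join([
        f"Observation: {observation}",
        f"Evidence: {evidence}",
        f"Verdict: {verdict}",
    ])


print(three_line_justification(
    "Response omits the second sub-question.",
    "Memo line 4: 'no answer given for part (b)'.",
    "NO (instruction-following rubric)",
))
```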

Layer 4 — Auto-Structure Booster

Natural-language memo → structured OM
(entity extraction, anomaly detection, negative pattern detection).
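As a rough sketch of what memo boosting could look like, the following splits a free-text memo into sentence-level observations and flags negative findings with a pattern list. The pattern list and output shape are illustrative assumptions, far cruder than a real Layer 4:

```python
import re

# Hypothetical negative-pattern list (illustrative only).
NEGATIVE_PATTERNS = [r"\bmissing\b", r"\bincorrect\b", r"\bcontradicts\b"]


def boost_memo(memo: str) -> dict:
    """Split a memo into sentences and flag negative findings."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", memo) if s.strip()]
    flagged = [
        s for s in sentences
        if any(re.search(p, s, re.IGNORECASE) for p in NEGATIVE_PATTERNS)
    ]
    return {"observations": sentences, "negative_findings": flagged}


om = boost_memo("The caption is accurate. The alt text is missing.")
print(om["negative_findings"])  # ['The alt text is missing.']
```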

Layer 5 — Meta-Cognitive Regulation

Evaluator drift detection, consistency analysis, forbidden inference monitoring.
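One simple drift signal, sketched here as an assumption about how such a check might work, is comparing an evaluator's recent YES-rate against their historical baseline and flagging large shifts:

```python
from statistics import mean


def detect_drift(history: list[int], recent: list[int],
                 threshold: float = 0.2) -> bool:
    """Flag drift when the recent YES-rate deviates from baseline by > threshold.

    Judgments are encoded as 1 (YES) / 0 (NO); the threshold is illustrative.
    """
    return abs(mean(recent) - mean(history)) > threshold


baseline = [1, 1, 0, 1, 0, 1, 1, 0]   # 62.5% YES historically
recent = [0, 0, 1, 0, 0, 0]           # 16.7% YES recently
print(detect_drift(baseline, recent))  # True
```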

Layer 6 — Advanced Evaluation Layer

Multi-label rubrics, ambiguous prompt handling, conflict resolution.
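Conflict resolution in a multi-label rubric can be made deterministic by a fixed severity order instead of evaluator judgment. The labels and ordering below are illustrative assumptions, not the framework's actual taxonomy:

```python
# Hypothetical severity order, highest first.
SEVERITY = ["blocking", "major", "minor", "ok"]


def resolve(labels: set[str]) -> str:
    """Pick the most severe label present; conflicts resolve deterministically."""
    for level in SEVERITY:
        if level in labels:
            return level
    return "ok"


print(resolve({"minor", "major"}))  # major
```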

Layer 7 — Interface Modes

  • Mini-v3
  • Full-OS evaluation mode
  • Embedding Mode (integration in external apps)

Layer 8 — Evolution Layer

Roadmap: v3.1 → v4.0
Rubric DAG, cultural dimension upgrade, safety expansion.


📌 Why v3.0?

  • Optimized for large-scale, rubric-driven AI evaluation
  • Extracted from 50+ real evaluation logs
  • Condensed from v2 (45k+ characters) into a workable core
  • Stronger cultural/ethical safety rules
  • Built-in drift detection & meta-review
  • Fully interoperable with ModelRefiner v4.x
  • Generalized for safe public use (no project-specific content)

🚀 Use Cases

  • Large-scale rubric-based / evidence-based evaluation
  • Safety / RLHF assessments
  • Cultural & ethical compliance QA
  • Image / video multimodal evaluation
  • Translation QA (successor to v2)
  • Creative refinement with ModelRefiner v4.x
  • Integration with custom LLMs & workflow agents

📡 Integration Example (Python)

```python
from meta_framework_v3 import EvaluatorOS

evaluator = EvaluatorOS()

# Evaluate a prompt/memo pair against the full rubric set.
result = evaluator.evaluate(
    prompt=user_prompt,
    observation_memo=user_memo,
    rubric_set="full_rubric",
)

print(result.summary)
```
📈 Status
Version: 3.0

Completion: 96%

Ready for: Production-grade evaluation

Next: v3.1 (Rubric DAG + conflict-case library)

📜 License
MIT License (planned)

👤 Author
Hideyuki Okabe
Freelance AI Evaluator / Translation QA Specialist
Developer of ModelRefiner v4.x & Meta Framework v3.0
