Meta Framework v3.0 (Eval-OS)
A procedural, reproducible, and model-agnostic Evaluation Operating System, designed by Hideyuki Okabe (2025).
Source document: META FRAMEWORK v3.0 (internal reference only)
🧭 Overview
Meta Framework v3.0 is a Cognitive Evaluation Operating System built from dozens of real-world multimodal evaluation sessions. It addresses the core challenges of LLM/GenAI evaluation:
- Ban on speculation (zero inferred facts)
- Clear evidence chains (Observation → Evidence → Verdict); a schema sketch follows this list
- Structured judgments using YES/NO and TRUE/FALSE/NA scales
- Full reproducibility (same memo → same verdict)
- Safe cultural & ethical processing
- Drift detection to reduce evaluator inconsistency
- Full compatibility with ModelRefiner v4.x
- Enterprise-grade reliability
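The source does not publish the framework's internal schema, so the following is only a minimal, hypothetical sketch (all class and field names are assumptions) of how the Observation → Evidence → Verdict chain and the two judgment scales could be typed:

```python
from dataclasses import dataclass, field
from enum import Enum

class Binary(Enum):
    """YES/NO rubric judgment."""
    YES = "YES"
    NO = "NO"

class Cultural(Enum):
    """TRUE/FALSE/NA cultural judgment."""
    TRUE = "TRUE"
    FALSE = "FALSE"
    NA = "NA"

@dataclass
class Observation:
    memo_id: str
    text: str  # verbatim evaluator note; no inferred facts allowed

@dataclass
class Evidence:
    observation: Observation
    quote: str  # exact span of the memo that backs the judgment

@dataclass
class Verdict:
    rubric_item: str
    judgment: Binary
    cultural: Cultural
    evidence: list[Evidence] = field(default_factory=list)

    def is_traceable(self) -> bool:
        # Evidence-chain enforcement: no verdict without at least one evidence link.
        return bool(self.evidence)
```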
🧩 Core Components
- Foundational rules: no speculation, no cultural absolutism, no impressionistic reasoning
- Base classes for the Observation Memo → Evidence → Verdict chain, including evidence-chain enforcement
- Rubric normalization, YES/NO logic, and TRUE/FALSE/NA cultural judgment
- 3-line justification format, inference suppression, and traceability guarantees
- Natural-language memo → structured Observation Memo (OM): entity extraction, anomaly detection, negative-pattern detection
- Evaluator drift detection, consistency analysis, and forbidden-inference monitoring (a sketch follows this list)
- Multi-label rubrics, ambiguous-prompt handling, and conflict resolution
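The framework does not specify its drift-detection algorithm. One conventional approach, sketched below under that assumption (the function, window size, and threshold are all hypothetical), is to track an evaluator's rolling agreement with consensus verdicts and flag windows where it drops:

```python
from collections import deque

def detect_drift(verdicts, consensus, window=20, threshold=0.8):
    """Yield (index, rolling_agreement) wherever an evaluator's agreement
    with the consensus verdicts falls below the threshold.

    verdicts, consensus: equal-length sequences of YES/NO labels.
    """
    recent = deque(maxlen=window)
    for i, (v, c) in enumerate(zip(verdicts, consensus)):
        recent.append(v == c)
        if len(recent) == window:
            agreement = sum(recent) / window
            if agreement < threshold:
                yield i, agreement

# Example: an evaluator who starts disagreeing halfway through a batch.
evals = ["YES"] * 30 + ["NO"] * 30
gold = ["YES"] * 60
for idx, agreement in detect_drift(evals, gold):
    print(f"drift at memo {idx}: rolling agreement {agreement:.2f}")
```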
⚙️ Operating Modes
- Mini-v3
- Full-OS evaluation mode
- Embedding Mode (integration into external apps; see the sketch below)
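The source names the modes but not how one is selected; the sketch below is purely illustrative (the `score_record` wrapper, record fields, and the mode keyword in the comment are assumptions) of how Embedding Mode might be wired into an external application:

```python
from meta_framework_v3 import EvaluatorOS

def score_record(record: dict) -> dict:
    """Wrap the evaluator so an external pipeline can call it per record."""
    evaluator = EvaluatorOS()  # hypothetically: EvaluatorOS(mode="embedding")
    result = evaluator.evaluate(
        prompt=record["prompt"],
        observation_memo=record["memo"],
        rubric_set="full_rubric",
    )
    return {"id": record.get("id"), "summary": result.summary}
```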
🗺️ Roadmap (v3.1 → v4.0)
Rubric DAG, cultural-dimension upgrade, and safety expansion.
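The Rubric DAG is not yet specified. As an illustration only (the criterion names and dependencies below are invented), the general idea is to evaluate rubric criteria in dependency order, which Python's standard `graphlib` handles directly:

```python
from graphlib import TopologicalSorter

# Hypothetical rubric dependency graph: each criterion maps to the criteria
# that must be judged before it (e.g., factuality gates the style check).
rubric_dag = {
    "factual_accuracy": [],
    "evidence_chain": ["factual_accuracy"],
    "cultural_safety": [],
    "style": ["factual_accuracy", "cultural_safety"],
}

# static_order() yields a valid evaluation order and raises CycleError
# if the rubric graph contains a cycle.
for criterion in TopologicalSorter(rubric_dag).static_order():
    print("evaluate:", criterion)
```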
✨ Highlights
- Optimized for large-scale, rubric-driven AI evaluation
- Extracted from 40–50+ real evaluation logs
- Condensed from v2 (45k+ characters) into a workable core
- Stronger cultural/ethical safety rules
- Built-in drift detection & meta-review
- Fully interoperable with ModelRefiner v4.x
- Generalized for safe public use (no project-specific content)
🎯 Use Cases
- Large-scale rubric-based / evidence-based evaluation
- Safety / RLHF assessments
- Cultural & ethical compliance QA
- Image / video multimodal evaluation
- Translation QA (successor to v2)
- Creative refinement with ModelRefiner v4.x
- Integration with custom LLMs & workflow agents
🚀 Quick Start
```python
from meta_framework_v3 import EvaluatorOS

# Example inputs; in practice these come from your evaluation session.
user_prompt = "Describe the product shown in the image."
user_memo = "Observation: the caption names a brand that is not visible in the image."

evaluator = EvaluatorOS()
result = evaluator.evaluate(
    prompt=user_prompt,
    observation_memo=user_memo,
    rubric_set="full_rubric",
)
print(result.summary)
```
📈 Status
Version: 3.0
Completion: 96%
Ready for: Production-grade evaluation
Next: v3.1 (Rubric DAG + conflict-case library)
📜 License
MIT License (planned)
👤 Author
Hideyuki Okabe
Freelance AI Evaluator / Translation QA Specialist
Developer of ModelRefiner v4.x & Meta Framework v2–3.0