Clyra-AI
diff --git a/docs/blog/ai-engineering-control-problem/index.html b/docs/blog/ai-engineering-control-problem/index.html
Lines changed: 9 additions & 9 deletions
diff --git a/docs/blog/ai-engineering-maturity-model/index.html b/docs/blog/ai-engineering-maturity-model/index.html
Lines changed: 8 additions & 8 deletions
diff --git a/docs/blog/control-benchmarks/agent-action-risk-scenarios-minimum-test-set/index.html b/docs/blog/control-benchmarks/agent-action-risk-scenarios-minimum-test-set/index.html
Lines changed: 4 additions & 4 deletions
diff --git a/docs/blog/control-benchmarks/buyers-cannot-evaluate-agentic-control-clearly/index.html b/docs/blog/control-benchmarks/buyers-cannot-evaluate-agentic-control-clearly/index.html
Lines changed: 7 additions & 7 deletions
diff --git a/docs/blog/control-benchmarks/index.html b/docs/blog/control-benchmarks/index.html
Lines changed: 2 additions & 2 deletions
diff --git a/docs/blog/control-benchmarks/measure-control-efficacy-for-ai-agents/index.html b/docs/blog/control-benchmarks/measure-control-efficacy-for-ai-agents/index.html
Lines changed: 4 additions & 4 deletions
diff --git a/docs/blog/control-benchmarks/pilot-evaluation-framework-for-agentic-tools/index.html b/docs/blog/control-benchmarks/pilot-evaluation-framework-for-agentic-tools/index.html
Lines changed: 13 additions & 13 deletions
@@ -169,14 +169,14 @@ <h1>AI Engineering Is a Control Problem, Not a Prompt Problem</h1>
       <section class="section page-nav" aria-labelledby="page-nav-heading">
         <h2 id="page-nav-heading">In this piece</h2>
         <div class="page-nav-list">
-          <a href="#the-operational-reality">The operational reality</a>
-          <a href="#the-anti-pattern">The anti-pattern</a>
-          <a href="#the-better-system-pattern">The better system pattern</a>
+          <a href="#the-operational-reality">Where the pressure shows up</a>
+          <a href="#the-anti-pattern">The failure mode</a>
+          <a href="#the-better-system-pattern">The better pattern</a>
           <a href="#what-leaders-should-optimize-for-instead">What leaders should optimize for instead</a>
           <a href="#why-security-cares">Why security cares</a>
           <a href="#why-platform-and-engineering-care">Why platform and engineering care</a>
           <a href="#concrete-example-manual-steering-vs-governed-issue-to-pr-flow">Concrete example: manual steering vs governed issue-to-PR flow</a>
-          <a href="#practical-next-step">Practical next step</a>
+          <a href="#practical-next-step">What to do next</a>
         </div>
         <div class="link-row">
           <a href="/blog/operating-notes/">Series home</a>
@@ -186,7 +186,7 @@ <h2 id="page-nav-heading">In this piece</h2>
       </section>
 
       <section class="section" aria-labelledby="quick-read">
-        <h2 id="quick-read">Quick read</h2>
+        <h2 id="quick-read">The short version</h2>
         <div class="summary-grid">
           <article class="card">
             <p class="post-stage">The rule</p>
@@ -216,7 +216,7 @@ <h3>Review your current agent workflow as a control system</h3>
       </section>
 
       <section class="section article-section">
-        <h2 id="the-operational-reality">The operational reality</h2>
+        <h2 id="the-operational-reality">Where the pressure shows up</h2>
         <p>
           A prompt is cheap to improve. A containment event is not. That is why
           the "better prompting" conversation weakens the moment an agent leaves
@@ -242,7 +242,7 @@ <h2 id="the-operational-reality">The operational reality</h2>
       </section>
 
       <section class="section article-section">
-        <h2 id="the-anti-pattern">The anti-pattern</h2>
+        <h2 id="the-anti-pattern">The failure mode</h2>
         <p>
           The anti-pattern is prompt-centrism: treating the quality of the
           instructions as if it were the same thing as control. It is not. A
@@ -269,7 +269,7 @@ <h2 id="the-anti-pattern">The anti-pattern</h2>
       </section>
 
       <section class="section article-section">
-        <h2 id="the-better-system-pattern">The better system pattern</h2>
+        <h2 id="the-better-system-pattern">The better pattern</h2>
         <p>
           The better pattern is to treat AI engineering as a governed software
           delivery system. The model still matters, but it sits inside a larger
@@ -415,7 +415,7 @@ <h3>3. Controlled promotion</h3>
       </section>
 
       <section class="section article-section">
-        <h2 id="practical-next-step">Practical next step</h2>
+        <h2 id="practical-next-step">What to do next</h2>
         <p>
           Pick one agent workflow your team already uses. Ignore the prompt for
           a moment and map the control surface instead.
 
@@ -167,14 +167,14 @@ <h1>The AI Engineering Maturity Model</h1>
       <section class="section page-nav" aria-labelledby="page-nav-heading">
         <h2 id="page-nav-heading">In this piece</h2>
         <div class="page-nav-list">
-          <a href="#the-operational-reality">The operational reality</a>
-          <a href="#the-anti-pattern">The anti-pattern</a>
-          <a href="#the-better-system-pattern">The better system pattern</a>
+          <a href="#the-operational-reality">Where the pressure shows up</a>
+          <a href="#the-anti-pattern">The failure mode</a>
+          <a href="#the-better-system-pattern">The better pattern</a>
           <a href="#how-to-use-the-model-without-theater">How to use the model without theater</a>
           <a href="#why-security-cares">Why security cares</a>
           <a href="#why-platform-and-engineering-care">Why platform and engineering care</a>
           <a href="#concrete-example-a-realistic-90-day-progression">Concrete example: a realistic 90-day progression</a>
-          <a href="#practical-next-step">Practical next step</a>
+          <a href="#practical-next-step">What to do next</a>
         </div>
         <div class="link-row">
           <a href="/blog/operating-notes/">Series home</a>
@@ -184,7 +184,7 @@ <h2 id="page-nav-heading">In this piece</h2>
       </section>
 
       <section class="section article-section">
-        <h2 id="the-operational-reality">The operational reality</h2>
+        <h2 id="the-operational-reality">Where the pressure shows up</h2>
         <p>
           We see the same pattern across teams. They start with interactive
           prompting and a few strong engineers. Then they add repository
@@ -207,7 +207,7 @@ <h2 id="the-operational-reality">The operational reality</h2>
       </section>
 
       <section class="section article-section">
-        <h2 id="the-anti-pattern">The anti-pattern</h2>
+        <h2 id="the-anti-pattern">The failure mode</h2>
         <p>
           The anti-pattern is autonomy inflation: assuming that once an agent is
           useful interactively, the organization is ready for background
@@ -228,7 +228,7 @@ <h2 id="the-anti-pattern">The anti-pattern</h2>
       </section>
 
       <section class="section article-section">
-        <h2 id="the-better-system-pattern">The better system pattern</h2>
+        <h2 id="the-better-system-pattern">The better pattern</h2>
         <p>
           The better pattern is staged capability growth. We find it useful to
           think in five levels.
@@ -352,7 +352,7 @@ <h3>Days 61-90</h3>
       </section>
 
       <section class="section article-section">
-        <h2 id="practical-next-step">Practical next step</h2>
+        <h2 id="practical-next-step">What to do next</h2>
         <p>
           Assess one team or one repo against the five levels and be strict
           about what evidence counts.
 
@@ -173,12 +173,12 @@ <h2 id="page-nav-heading">In this piece</h2>
         <div class="page-nav-list">
           <a href="#research-grounding">Research grounding</a>
           <a href="#why-scenario-design-matters">Why scenario design matters</a>
-          <a href="#the-anti-pattern">The anti-pattern</a>
+          <a href="#the-anti-pattern">The failure mode</a>
           <a href="#the-minimum-test-set">The minimum test set</a>
           <a href="#why-security-leaders-care">Why security leaders care</a>
           <a href="#why-platform-and-engineering-care">Why platform and engineering care</a>
           <a href="#concrete-artifact-a-scenario-matrix">Concrete artifact: a scenario matrix</a>
-          <a href="#practical-next-step">Practical next step</a>
+          <a href="#practical-next-step">What to do next</a>
         </div>
         <div class="link-row">
           <a href="/blog/control-benchmarks/">Series home</a>
@@ -231,7 +231,7 @@ <h2 id="why-scenario-design-matters">Why scenario design matters</h2>
       </section>
 
       <section class="section article-section">
-        <h2 id="the-anti-pattern">The anti-pattern</h2>
+        <h2 id="the-anti-pattern">The failure mode</h2>
         <p>
           The anti-pattern is to let the vendor choose only the safest or most
           flattering workflow. A tidy refactor, a documentation update, or a
@@ -375,7 +375,7 @@ <h3>Threshold to widen</h3>
       </section>
 
       <section class="section article-section">
-        <h2 id="practical-next-step">Practical next step</h2>
+        <h2 id="practical-next-step">What to do next</h2>
         <p>
           Before the next pilot, ask the tool owner to write down the five
           scenario families above and fill in the matrix before the first demo.
 
@@ -170,13 +170,13 @@ <h1>Why Buyers Still Cannot Evaluate Agentic Control Clearly</h1>
         <h2 id="page-nav-heading">In this piece</h2>
         <div class="page-nav-list">
           <a href="#research-grounding">Research grounding</a>
-          <a href="#the-operational-reality">The operational reality</a>
-          <a href="#the-anti-pattern">The anti-pattern</a>
+          <a href="#the-operational-reality">Where the pressure shows up</a>
+          <a href="#the-anti-pattern">The failure mode</a>
           <a href="#the-benchmark-language-buyers-actually-need">The benchmark language buyers actually need</a>
           <a href="#why-security-leaders-care">Why security leaders care</a>
           <a href="#why-platform-and-engineering-care">Why platform and engineering care</a>
           <a href="#concrete-artifact-a-first-pass-evaluation-matrix">Concrete artifact: a first-pass evaluation matrix</a>
-          <a href="#practical-next-step">Practical next step</a>
+          <a href="#practical-next-step">What to do next</a>
         </div>
         <div class="link-row">
           <a href="/blog/control-benchmarks/">Series home</a>
@@ -186,7 +186,7 @@ <h2 id="page-nav-heading">In this piece</h2>
       </section>
 
       <section class="section" aria-labelledby="quick-read">
-        <h2 id="quick-read">Quick read</h2>
+        <h2 id="quick-read">The short version</h2>
         <div class="summary-grid">
           <article class="card">
             <p class="post-stage">The problem</p>
@@ -233,7 +233,7 @@ <h2 id="research-grounding">Research grounding</h2>
       </section>
 
       <section class="section article-section">
-        <h2 id="the-operational-reality">The operational reality</h2>
+        <h2 id="the-operational-reality">Where the pressure shows up</h2>
         <p>
           A Head of AppSec or CISO now gets asked a version of the same
           question every quarter: which agentic tools are mature enough to let
@@ -263,7 +263,7 @@ <h2 id="the-operational-reality">The operational reality</h2>
       </section>
 
       <section class="section article-section">
-        <h2 id="the-anti-pattern">The anti-pattern</h2>
+        <h2 id="the-anti-pattern">The failure mode</h2>
         <p>
           The anti-pattern is to compare agentic products as if the hard part
           were still interface quality and developer delight. Buyers end up
@@ -425,7 +425,7 @@ <h3>Pilot discipline</h3>
       </section>
 
       <section class="section article-section">
-        <h2 id="practical-next-step">Practical next step</h2>
+        <h2 id="practical-next-step">What to do next</h2>
         <p>
           Pick the next agentic tool your organization is likely to pilot and
           rewrite the evaluation brief before the demo happens. If the brief is
 
@@ -208,7 +208,7 @@ <h3>
             <p class="post-stage">Evidence</p>
             <h3>
               <a href="/blog/control-benchmarks/proof-completeness-for-ai-agent-changes/"
-                >Proof Completeness: What Evidence Must Exist Before an AI Agent Change Is Trustworthy</a
+                >Proof Completeness for AI Agent Changes</a
               >
             </h3>
             <p>
@@ -222,7 +222,7 @@ <h3>
             <p class="post-stage">Pilot design</p>
             <h3>
               <a href="/blog/control-benchmarks/pilot-evaluation-framework-for-agentic-tools/"
-                >A Practical Pilot Evaluation Framework for Agentic Tools</a
+                >How to Run a Buyer-Grade Agent Pilot</a
               >
             </h3>
             <p>
 
@@ -170,12 +170,12 @@ <h2 id="page-nav-heading">In this piece</h2>
         <div class="page-nav-list">
           <a href="#research-grounding">Research grounding</a>
           <a href="#what-control-efficacy-actually-means">What control efficacy actually means</a>
-          <a href="#the-anti-pattern">The anti-pattern</a>
+          <a href="#the-anti-pattern">The failure mode</a>
           <a href="#the-five-metrics-that-matter">The five metrics that matter</a>
           <a href="#why-security-leaders-care">Why security leaders care</a>
           <a href="#why-platform-and-engineering-care">Why platform and engineering care</a>
           <a href="#concrete-artifact-a-control-efficacy-scorecard">Concrete artifact: a control efficacy scorecard</a>
-          <a href="#practical-next-step">Practical next step</a>
+          <a href="#practical-next-step">What to do next</a>
         </div>
         <div class="link-row">
           <a href="/blog/control-benchmarks/">Series home</a>
@@ -223,7 +223,7 @@ <h2 id="what-control-efficacy-actually-means">What control efficacy actually mea
       </section>
 
       <section class="section article-section">
-        <h2 id="the-anti-pattern">The anti-pattern</h2>
+        <h2 id="the-anti-pattern">The failure mode</h2>
         <p>
           The anti-pattern is to accept proxies for control. Prompt guidance,
           reviewer expectations, and post-hoc logs can all be helpful. None of
@@ -367,7 +367,7 @@ <h3>Operational cost</h3>
       </section>
 
       <section class="section article-section">
-        <h2 id="practical-next-step">Practical next step</h2>
+        <h2 id="practical-next-step">What to do next</h2>
         <p>
           Take one current agent pilot and rewrite the success criteria in
           control-efficacy terms. The goal is to make the next steering
 
@@ -3,7 +3,7 @@
   <head>
     <meta charset="utf-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1" />
-    <title>A Practical Pilot Evaluation Framework for Agentic Tools | CAISI Blog</title>
+    <title>How to Run a Buyer-Grade Agent Pilot | CAISI Blog</title>
     <meta
       name="description"
       content="A practical framework for buyer-grade pilots that measure control quality, proof quality, and operational fit instead of demo theater."
@@ -13,15 +13,15 @@
     <meta name="robots" content="index,follow,max-image-preview:large,max-snippet:-1,max-video-preview:-1" />
     <meta property="og:site_name" content="CAISI" />
     <meta property="og:type" content="article" />
-    <meta property="og:title" content="A Practical Pilot Evaluation Framework for Agentic Tools | CAISI Blog" />
+    <meta property="og:title" content="How to Run a Buyer-Grade Agent Pilot | CAISI Blog" />
     <meta property="og:description" content="A practical framework for buyer-grade pilots that measure control quality, proof quality, and operational fit instead of demo theater." />
     <meta property="og:url" content="https://caisi.dev/blog/control-benchmarks/pilot-evaluation-framework-for-agentic-tools/" />
     <meta property="og:image" content="https://caisi.dev/assets/caisi-social.png" />
-    <meta property="og:image:alt" content="A Practical Pilot Evaluation Framework for Agentic Tools | CAISI Blog" />
+    <meta property="og:image:alt" content="How to Run a Buyer-Grade Agent Pilot | CAISI Blog" />
     <meta property="og:image:width" content="1600" />
     <meta property="og:image:height" content="900" />
     <meta name="twitter:card" content="summary_large_image" />
-    <meta name="twitter:title" content="A Practical Pilot Evaluation Framework for Agentic Tools | CAISI Blog" />
+    <meta name="twitter:title" content="How to Run a Buyer-Grade Agent Pilot | CAISI Blog" />
     <meta name="twitter:description" content="A practical framework for buyer-grade pilots that measure control quality, proof quality, and operational fit instead of demo theater." />
     <meta name="twitter:image" content="https://caisi.dev/assets/caisi-social.png" />
     <meta name="author" content="David Ahmann" />
@@ -39,7 +39,7 @@
           "@type": "WebPage",
           "@id": "https://caisi.dev/blog/control-benchmarks/pilot-evaluation-framework-for-agentic-tools/#webpage",
           "url": "https://caisi.dev/blog/control-benchmarks/pilot-evaluation-framework-for-agentic-tools/",
-          "name": "A Practical Pilot Evaluation Framework for Agentic Tools | CAISI Blog",
+          "name": "How to Run a Buyer-Grade Agent Pilot | CAISI Blog",
           "description": "A practical framework for buyer-grade pilots that measure control quality, proof quality, and operational fit instead of demo theater.",
           "inLanguage": "en",
           "isPartOf": {
@@ -71,13 +71,13 @@
             {
               "@type": "ListItem",
               "position": 4,
-              "name": "A Practical Pilot Evaluation Framework for Agentic Tools"
+              "name": "How to Run a Buyer-Grade Agent Pilot"
             }
           ]
         },
         {
           "@type": "BlogPosting",
-          "headline": "A Practical Pilot Evaluation Framework for Agentic Tools",
+          "headline": "How to Run a Buyer-Grade Agent Pilot",
           "description": "A practical framework for buyer-grade pilots that measure control quality, proof quality, and operational fit instead of demo theater.",
           "mainEntityOfPage": "https://caisi.dev/blog/control-benchmarks/pilot-evaluation-framework-for-agentic-tools/",
           "url": "https://caisi.dev/blog/control-benchmarks/pilot-evaluation-framework-for-agentic-tools/",
@@ -145,7 +145,7 @@
           <span class="divider">/</span>
           <a href="/blog/control-benchmarks/">Benchmark Series</a>
           <span class="divider">/</span>
-          <span>A Practical Pilot Evaluation Framework for Agentic Tools</span>
+          <span>How to Run a Buyer-Grade Agent Pilot</span>
         </p>
         <p class="eyebrow">Benchmark Series / Post 5 of 5 / Pilot Design</p>
         <div class="post-author">
@@ -154,7 +154,7 @@
             By <a href="https://www.linkedin.com/in/dahmann/">David Ahmann (LinkedIn)</a>
           </p>
         </div>
-        <h1>A Practical Pilot Evaluation Framework for Agentic Tools</h1>
+        <h1>How to Run a Buyer-Grade Agent Pilot</h1>
         <p class="lead">
           The pilot ends, the team is impressed, and the real decision still is
           not clear. Everyone learned that a strong operator could get useful
@@ -171,12 +171,12 @@ <h2 id="page-nav-heading">In this piece</h2>
         <div class="page-nav-list">
           <a href="#research-grounding">Research grounding</a>
           <a href="#what-most-agent-pilots-actually-test">What most pilots actually prove</a>
-          <a href="#the-anti-pattern">The anti-pattern</a>
+          <a href="#the-anti-pattern">The failure mode</a>
           <a href="#a-practical-pilot-framework">A practical pilot framework</a>
           <a href="#what-good-pilot-outputs-look-like">What a serious pilot should leave behind</a>
           <a href="#why-security-and-platform-should-co-own-it">Why security and platform should co-own it</a>
           <a href="#concrete-artifact-a-pilot-scorecard">Concrete artifact: a pilot scorecard</a>
-          <a href="#practical-next-step">Practical next step</a>
+          <a href="#practical-next-step">What to do next</a>
         </div>
         <div class="link-row">
           <a href="/blog/control-benchmarks/">Series home</a>
@@ -229,7 +229,7 @@ <h2 id="what-most-agent-pilots-actually-test">What most pilots actually prove</h
       </section>
 
       <section class="section article-section">
-        <h2 id="the-anti-pattern">The anti-pattern</h2>
+        <h2 id="the-anti-pattern">The failure mode</h2>
         <p>
           The anti-pattern is productivity theater. The tool completes a few
           convenient tasks, the stakeholders see enough upside to stay excited,
@@ -372,7 +372,7 @@ <h3>Next control investment</h3>
       </section>
 
       <section class="section article-section">
-        <h2 id="practical-next-step">Practical next step</h2>
+        <h2 id="practical-next-step">What to do next</h2>
         <p>
           For the next pilot, write the exit memo before the work starts. If
           that feels premature, it usually means the team has not decided what