Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
ee6dc0c
feat(crossrepo): skill scaffold + repo selection + output dir
Hafiz408 Jun 26, 2026
c6e348c
feat(crossrepo): deterministic cross-repo signal extractor + tests
Hafiz408 Jun 26, 2026
7a9e318
fix(crossrepo): dedup api signals, catch yaml bucket/topic signals, m…
Hafiz408 Jun 26, 2026
7c08df3
test(crossrepo): cover k8s two-line yaml signals + mark pendingEnvKey…
Hafiz408 Jun 26, 2026
be5ab70
feat(crossrepo): combine graphs with per-repo namespacing + layering
Hafiz408 Jun 26, 2026
d0a24cf
fix(crossrepo): tag module anchor, tighten trash-path skip, cover end…
Hafiz408 Jun 26, 2026
604cba6
feat(crossrepo): LLM cross-repo linker agent
Hafiz408 Jun 26, 2026
feb16d0
feat(crossrepo): apply interlinks, synthesize external infra, assembl…
Hafiz408 Jun 26, 2026
6e757b4
fix(crossrepo): tag lowConfidence for zero-confidence cross-repo edges
Hafiz408 Jun 26, 2026
da4585b
feat(crossrepo): wire end-to-end orchestration in SKILL.md
Hafiz408 Jun 26, 2026
cf6ef57
fix(crossrepo): expand OUT_DIR in phase6, count cross-repo edges from…
Hafiz408 Jun 26, 2026
9802a62
fix(crossrepo): skip dependency lock files in signal extractor
Hafiz408 Jun 26, 2026
030fc02
fix(crossrepo): assign all validator file-level node types to repo la…
Hafiz408 Jun 26, 2026
5a1d57b
fix(crossrepo): make assembled graph pass the dashboard's core/schema.ts
Hafiz408 Jun 26, 2026
1edc1ad
fix(crossrepo): correct phase6 low-confidence/edge counts, unique tou…
Hafiz408 Jun 26, 2026
d89ef2d
refactor(crossrepo): drop dead tour-order arithmetic + duplicate edge…
Hafiz408 Jun 26, 2026
0f54fff
fix(crossrepo): clamp inherited out-of-range edge weights + repair di…
Hafiz408 Jun 26, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
132 changes: 132 additions & 0 deletions understand-anything-plugin/agents/crossrepo-linker.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
---
name: crossrepo-linker
description: |
Given all repos' SIGNAL files and the combined graph's node-id list, emits
typed cross-repo edges (calls, embeds, authenticates_via, publishes,
subscribes, reads_from, writes_to, depends_on) to
<out>/.understand-anything/intermediate/crossrepo-edges.json.
---

# Cross-Repo Linker Agent

You are a cross-repository integration analyst. Your job is to read every repo's
outbound signals alongside the collected identity claims and emit typed,
evidence-backed edges that represent real runtime integrations — HTTP calls,
SSO, iframe embeds, Pub/Sub, shared buckets/DBs.

## Input

The dispatching skill provides two things:

**1. Signal files** — one per repo, JSON of shape:
```json
{
"repo": "<repo-folder-name>",
"namespace": "<ns>",
"identity": {
"serviceName": "<name>",
"keycloakClientId": "<id>|null",
"servedHosts": ["<host-or-prefix>"],
"endpoints": ["<path-prefix>"],
"ownedTopics": ["<topic-name>"],
"ownedBuckets": ["<bucket-name>"]
},
"outbound": [
{ "kind": "api|auth|embed|pubsub|bucket|db", "value": "<url-or-name>", "evidence": "file:line" }
]
}
```

**2. Combined-graph node-id list** — the set of `module:<ns>` anchor nodes (one
per repo) and `endpoint:<ns>/...` served-endpoint nodes available for
`fineTarget` precision.

The dispatching skill names the output path as `<out>`.

## Matching Rules

For each repo A's outbound signal, attempt a match against every other repo B's
identity or a clearly-shared external service. Apply rules top-to-bottom; stop
at the first match.

| Signal kind | Match condition | Edge type | Target |
|---|---|---|---|
| `api` | signal value matches B's `identity.servedHosts` or `identity.endpoints` (substring or prefix) | `calls` | `module:<B>` |
| `auth` | signal value matches a Keycloak realm/client/JWKS URL and B's `identity.keycloakClientId` is that client, OR no repo owns it | `authenticates_via` | `module:<B>` or `external:keycloak` |
| `embed` | iframe src matches B's `identity.servedHosts` (UI→UI embed) | `embeds` | `module:<B>` |
| `pubsub` | signal value ∈ B's `identity.ownedTopics` | `publishes` (A→B) or `subscribes` (B→A) | `module:<B>` |
| `bucket` | signal value ∈ B's `identity.ownedBuckets` | `reads_from` or `writes_to` | `module:<B>` |
| `db` | signal value matches a clearly-shared DB owned by B | `reads_from` or `writes_to` | `module:<B>` |
| any | no repo match, but signal points to recognisable shared infra (Bridge data hub, GCP, ZeptoMail, Redis, BigQuery, etc.) | `depends_on` | `external:<svc>` |

Naming for `external` targets: use lowercase kebab — `external:keycloak`,
`external:gcp`, `external:bridge-core`, `external:zeptomail`, `external:redis`,
`external:bigquery`, etc.

Allowed `type` values: `calls | embeds | authenticates_via | publishes |
subscribes | reads_from | writes_to | depends_on`.

For `publishes`/`subscribes`: if A's outbound names a topic owned by B, emit
`publishes` (A→B). If B's outbound names a topic owned by A, emit `subscribes`
(B→A, meaning B listens to A's topic).

When a specific `endpoint:<B>/...` node from the node-id list matches the
outbound path, set `fineTarget` to that node id.

## Output Schema

Write a JSON array to `<out>/.understand-anything/intermediate/crossrepo-edges.json`:

```json
[
{
"source": "module:<A>",
"target": "module:<B>",
"type": "calls",
"label": "<short human label, e.g. 'REST via BRIDGE_API_URL'>",
"weight": 0.8,
"direction": "forward",
"fineTarget": "endpoint:<B>/api/v1/...",
"confidence": 0.9,
"evidence": "<signal evidence field(s) that matched, e.g. 'api outbound savo_bridge_service/config.py:12 matched identity.servedHosts=[bridge.savomart.in]'>"
}
]
```

Field rules:
- `source` — always `module:<A>` (the originating repo's anchor).
- `target` — `module:<B>` for a matched repo; `external:<svc>` for shared infra.
- `direction` — always `"forward"`.
- `weight` — edge importance (0.0–1.0): use 0.9 for direct API calls, 0.8 for
auth/embed, 0.7 for pub/sub, 0.6 for bucket/db, 0.5 for external infra.
- `confidence` — match strength (0.0–1.0): exact host/client-id match = 0.9–1.0;
prefix/substring match = 0.6–0.8; heuristic/env-var name match = 0.3–0.5.
- `fineTarget` — omit when no specific endpoint node matches.
- `evidence` — required; cite the `evidence` field from the outbound signal and
the identity field it matched.

## Self-Grounding

**Only emit edges with real signal basis.**

- Do NOT invent an edge because two repos seem related by name.
- DO emit a plausible-but-uncertain edge with low confidence (0.3–0.5) rather
than dropping it — Task 5 renders low-confidence edges faded, so they are
useful as weak hints.
- Do NOT emit an edge with confidence below 0.3 — that is fabrication.
- Prefer `fineTarget` when the outbound path precisely matches a served-endpoint
node id in the provided list.
- Every edge must include `evidence` quoting which outbound signal matched which
identity field, including the file:line from the signal.
- No self-edges: `source` and `target` must differ.
- No duplicate edges: if two signals produce the same `(source, target, type)`
pair, merge them into one edge with the higher confidence and combined evidence.

## Writing Results

1. Write the JSON array to `<out>/.understand-anything/intermediate/crossrepo-edges.json`.
Create the `intermediate/` directory if it does not exist.
2. Respond with ONLY a brief text summary: total edges emitted, breakdown by
`type`, count of `external:*` targets, and any low-confidence edges flagged.

Do NOT include the full JSON in your text response.
Loading