
Commit 6de1039

Merge pull request #77 from quotient-ai/antaripa/docs-tools
Added tools -> reports and detection
2 parents 2afeb70 + c155096

File tree: 9 files changed, +420 −8 lines

docs.json

Lines changed: 19 additions & 2 deletions

```diff
@@ -52,8 +52,25 @@
     {
       "group": "Tools",
       "pages": [
-        "tools/detections",
-        "tools/reports"
+        {
+          "group": "Detections",
+          "icon": "ghost",
+          "pages": [
+            "tools/detections/overview",
+            "tools/detections/hallucination-detection",
+            "tools/detections/document-relevance",
+            "tools/detections/polling-and-results-api"
+          ]
+        },
+        {
+          "group": "Reports",
+          "icon": "clipboard",
+          "pages": [
+            "tools/reports/overview",
+            "tools/reports/integration",
+            "tools/reports/when-to-use"
+          ]
+        }
       ]
     },
     {
```
tools/detections/document-relevance.mdx

Lines changed: 44 additions & 0 deletions

---
title: "Document Relevance"
description: "Measure whether retrieved documents actually support user queries."
icon: "bug"
---

# What is document relevance?

**Document relevance** measures how well your retrieval or search system finds context that is genuinely useful for answering the user's query. A document is considered **relevant** if it contains information that addresses at least one part of the query. Otherwise, it is marked **irrelevant**.

The document relevance score is calculated as the fraction of retrieved documents that are relevant to the query.

# How Quotient scores relevance

1. Compare each document (or chunk) against the full `user_query`.
2. Determine whether the document contains information relevant to any part of the query:
   - If it does, mark it `relevant`.
   - If it does not, mark it `irrelevant`.
3. Compute `relevant_documents / total_documents` to derive the overall score.
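
To make the arithmetic concrete, here is a minimal sketch of step 3 in Python. The `labels` list is a hypothetical stand-in for the per-document judgments from step 2, not actual SDK output.

```python
# Hypothetical per-document judgments from step 2 (not SDK output).
labels = ["relevant", "irrelevant", "relevant", "relevant"]

# Step 3: overall score = relevant_documents / total_documents.
relevant_documents = sum(1 for label in labels if label == "relevant")
score = relevant_documents / len(labels)

print(f"Document relevance score: {score:.2f}")  # 3 / 4 -> 0.75
```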

## What influences the score

- **Chunk granularity:** smaller chunks make it easier to mark only the useful passages as relevant.
- **Query clarity:** ambiguous prompts can lower relevance; capture clarifying follow-ups in `message_history`.
- **Retriever filters:** tag each log with its retriever configuration so you can compare performance across setups.

## Why track document relevance?

Document relevance is a core metric for evaluating retrieval-augmented systems. Even when generation is strong, weak retrieval can degrade the final answer. Monitoring this metric helps teams:

- Assess whether retrieval surfaces useful context.
- Debug cases where generation fails despite solid prompting.
- Improve the recall and precision of retrievers.
- Watch for drift after retriever or data changes.

<Tip>
A sudden dip in relevance is often the earliest warning that embeddings, indexing, or filters changed. Alert on sustained drops before they cascade into hallucinations.
</Tip>

<Tip>
High-performing systems typically show \> 75% document relevance. Lower scores may signal ambiguous user queries, incorrect retrieval, or noisy source data.
</Tip>

Next: [Polling & Results API](./polling-and-results-api).
tools/detections/hallucination-detection.mdx

Lines changed: 63 additions & 0 deletions

---
title: "Hallucination Detection"
description: "Understand how Quotient identifies hallucinations and why the metric matters."
icon: "flag"
---

# What counts as a hallucination?

The **hallucination rate** measures how often a model generates information that cannot be found in its provided inputs, such as retrieved documents, user messages, or system prompts.

Quotient reports an **extrinsic hallucination rate**: we determine whether the model's output is unsupported by the context it was given.

<Accordion title="What is an Extrinsic Hallucination?">
  <Tip>
    Extrinsic hallucinations occur when a model generates content that is not backed by any input. This is distinct from **intrinsic hallucinations**, where the model generates text that is self-contradictory or logically incoherent regardless of the input.

    We focus on **extrinsic** hallucination detection because this is what matters most in retrieval-augmented systems: **does the model stick to the facts it was given?**

    Refer to [How to Detect Hallucinations in Retrieval Augmented Systems: A Primer](https://blog.quotientai.co/how-to-detect-hallucinations-in-retrieval-augmented-systems-a-primer/) for an in-depth overview of hallucinations in augmented AI systems.
  </Tip>
</Accordion>

# How Quotient detects hallucinations

1. **Segment the output** into atomic claims or sentences.
2. **Cross-check every claim** against all available context:
   - `user_query` (what the user asked)
   - `documents` (retrieved evidence)
   - `message_history` (prior conversation turns)
   - `instructions` (system or developer guidance)
3. **Flag unsupported claims** when no context backs them up.

If a sentence cannot be traced back to any provided evidence, it is marked as a hallucination.
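
The loop below is a conceptual sketch of those three steps, not Quotient's detector: real detection relies on model-based semantic checks, so a naive sentence splitter and a crude lexical-overlap test stand in here for claim segmentation and support checking.

```python
import re

def claims_of(output: str) -> list[str]:
    # Step 1 stand-in: naive sentence split instead of true claim segmentation.
    return [s.strip() for s in output.split(".") if s.strip()]

def words_of(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def is_supported(claim: str, context_blocks: list[str]) -> bool:
    # Step 2 stand-in: lexical overlap instead of semantic entailment.
    claim_words = words_of(claim)
    return any(
        len(claim_words & words_of(block)) / len(claim_words) > 0.6
        for block in context_blocks
    )

context = [
    "France is a country in Western Europe.",
    "Paris is the capital of France.",
]
output = "The capital of France is Paris. It has 40 million residents."

# Step 3: flag claims that no context block supports.
flags = [(claim, is_supported(claim, context)) for claim in claims_of(output)]
has_hallucination = any(not supported for _, supported in flags)

print(flags)              # the population claim lacks support
print(has_hallucination)  # True
```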

## Inputs that improve detection quality

- **High-signal documents:** include only the evidence actually retrieved for the answer.
- **Conversation history:** pass the full multi-turn exchange so references to earlier turns can be validated.
- **Instructions:** provide system prompts so the detection pass understands guardrails and policies.
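
Taken together, an illustrative log call that supplies all three inputs might look like the sketch below, assuming a `quotient` client initialized as in the [Overview](./overview). Passing `message_history` and `instructions` to `quotient.log` is inferred from the `Log` fields documented in [Polling & Results API](./polling-and-results-api); treat the exact signature as an assumption.

```python
# Illustrative log call with full context for detection. The message_history
# and instructions parameters are assumed from the documented Log fields.
log_id = quotient.log(
    user_query="And how many people live there?",
    model_output="Paris has about 2.1 million residents.",
    documents=[
        "Paris, the capital of France, has roughly 2.1 million inhabitants.",
    ],
    message_history=[
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
    instructions=["Answer only from the provided documents."],
)
```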

## Interpret hallucination results

- **`has_hallucination`**: Boolean flag indicating whether any unsupported claims were found.
- **Highlighted spans**: In the dashboard, statements are color-coded to show which lacked support.
- **Tag filters**: Slice hallucination rate by model, feature, or customer to prioritize remediation.

<Tip>
Pair hallucination detection with assertions or automated tests when shipping prompt updates. A sudden spike often signals a regression in retrieval or guardrails.
</Tip>

# Why monitor hallucinations?

Extrinsic hallucinations are a primary failure mode in augmented AI systems. Even when retrieval succeeds, generation can drift. Tracking this metric helps teams:

- Catch hallucinations early in development.
- Monitor output quality after deployment.
- Guide prompt iteration and model fine-tuning.

<Tip>
Well-grounded systems typically show a \< 5% hallucination rate. If yours is higher, it's often a signal that your data ingestion, retrieval pipeline, or prompting needs improvement.
</Tip>

Next: [Document Relevance](./document-relevance).

tools/detections/overview.mdx

Lines changed: 125 additions & 0 deletions

---
title: "Overview"
description: "Understand how Quotient detections work and how to enable them in your logging pipeline."
icon: "eye"
---

<CardGroup>
  <Card title="Initialize the Logger" icon="chevron-right" href="#initialize-the-logger-with-detections">
    Configure detection types and sampling without leaving this page.
  </Card>
  <Card title="Hallucination Detection" icon="flag" href="/tools/detections/hallucination-detection">
    See how Quotient scores extrinsic hallucinations.
  </Card>
  <Card title="Document Relevance" icon="bug" href="/tools/detections/document-relevance">
    Measure whether retrieved documents support an answer.
  </Card>
  <Card title="Polling & Results" icon="clock" href="/tools/detections/polling-and-results-api">
    Retrieve detection results via the SDK.
  </Card>
</CardGroup>

# What are Detections?

Detections are asynchronous analyses that run whenever you ship logs or traces to Quotient. They continuously score outputs for hallucinations, document relevance, and other reliability risks so you can intervene before they impact users.

Once configured, detections execute in the background. You can review outcomes in the dashboard or poll for them programmatically.

## Why enable detections

- **Catch issues fast:** surface hallucinations or irrelevant context without manually reviewing transcripts.
- **Quantify reliability:** trend hallucination rate and document relevance over time or by tag.
- **Prioritize fixes:** combine detection scores with tags (model version, customer tier) to see where to invest engineering time.

<Tip>
Keep `detection_sample_rate` high during development to observe every interaction. Dial it down in production once metrics stabilize.
</Tip>

## Configure detections in three steps

1. **Initialize the logger** with the detection types and sample rate that make sense for your workload.
2. **Send logs or traces** that include the user prompt, model output, and supporting evidence.
3. **Review the results** in the dashboard or via the SDK once detections finish processing.

# Initialize the Logger with Detections

Enable detections during logger initialization:

<CodeGroup>

```python logging.py
from quotientai import QuotientAI, DetectionType

quotient = QuotientAI(api_key="your-quotient-api-key")

logger = quotient.logger.init(
    app_name="my-first-app",
    environment="dev",
    sample_rate=1.0,
    # automatically run hallucination and document relevance detection on every output
    detections=[DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY],
    detection_sample_rate=1.0,
)
```

```typescript logging.ts
import { QuotientAI, DetectionType } from "quotientai";

const quotient = new QuotientAI({ apiKey: "your-quotient-api-key" });

const logger = quotient.logger.init({
  appName: "my-first-app",
  environment: "dev",
  sampleRate: 1.0,
  // automatically run hallucination and document relevance detection on every output
  detections: [DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY],
  detectionSampleRate: 1.0,
});
```

</CodeGroup>

# Send logs with detections enabled

After initialization, send logs that include the user query, model output, and any documents, instructions, or message history you want Quotient to evaluate.

<CodeGroup>

```python logging.py
log_id = quotient.log(
    user_query="What is the capital of France?",
    model_output="The capital of France is Paris.",
    documents=[
        "France is a country in Western Europe.",
        "Paris is the capital of France.",
    ],
)
```

```typescript logging.ts
const logId = await quotient.log({
  userQuery: "What is the capital of France?",
  modelOutput: "The capital of France is Paris.",
  documents: ["France is a country in Western Europe.", "Paris is the capital of France."],
});
```

</CodeGroup>

# Interpret detection outcomes

Each detection result is attached to the originating log. In the dashboard you can:

- Inspect hallucination highlights and see which sentences lack evidence.
- Review document relevance scores to spot noisy retrieval results.
- Filter by tags (environment, customer, model) to zero in on problematic slices.

Head to the [Detections Dashboard](https://app.quotientai.co/detections) to review results, export findings, or share links with teammates.

<Tip>
Combine detections with [Reports](/tools/reports) to move from single-log triage to trend analysis.
</Tip>

![Detections Dashboard](../../assets/detections-screenshot.png)

Continue with [Hallucination Detection](./hallucination-detection).
tools/detections/polling-and-results-api.mdx

Lines changed: 61 additions & 0 deletions

---
title: "Polling & Results API"
description: "Retrieve detection outcomes through the Quotient SDK."
icon: "clock"
---

# Poll for detections via the SDK

Use the polling helpers when you want to block until detections finish so you can act on the results immediately (e.g., inside an evaluation harness or CI job).

<CodeGroup>

```python logging.py
detection = quotient.poll_for_detection(log_id=log_id)
```

```typescript logging.ts
const detection = await quotient.pollForDetections(logId);
```

</CodeGroup>

# Parameters

- `log_id` **(string)**: Identifier of the log you want to poll for detections.
- `timeout` **(int)**: Maximum time to wait for a response, in seconds. Defaults to `300`.
- `poll_interval` **(float)**: Interval between checks, in seconds. Defaults to `2.0`.

## Return value

`poll_for_detection` returns a `Log` object with these notable fields:

- `id` **(string)**: Unique identifier for the log.
- `app_name` **(string)**: Application that generated the log.
- `environment` **(string)**: Deployment environment (e.g., `dev`, `prod`).
- `detections` **(array)**: Detection types configured for this log.
- `detection_sample_rate` **(float)**: Sample rate applied for detections on this log.
- `user_query` **(string)**: Logged user input.
- `model_output` **(string)**: Logged model output.
- `documents` **(array)**: Context documents used for the detection run.
- `message_history` **(array)**: Prior messages following the OpenAI format.
- `instructions` **(array)**: Instructions provided to the model.
- `tags` **(object)**: Metadata associated with the log entry.
- `created_at` **(datetime)**: Timestamp when the log was created.
- `status` **(string)**: Current status of the log entry.
- `has_hallucination` **(boolean)**: Whether hallucinations were detected.
- `doc_relevancy_average` **(float)**: Average document relevance score.
- `updated_at` **(datetime)**: Timestamp when the log was last updated.
- `evaluations` **(array)**: Evaluation results attached to the log.

## Example workflow

1. Log an interaction with detections enabled.
2. Call the polling helper and wait for the promise/function to resolve.
3. Inspect the returned `Log` payload for `has_hallucination` or `doc_relevancy_average` before deciding whether to alert, retry, or proceed.
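
Put together, the workflow might look like this sketch in Python, assuming a `quotient` client initialized with detections enabled as shown in the [Overview](./overview); the 0.75 relevance threshold is illustrative, echoing the guidance in [Document Relevance](./document-relevance).

```python
# 1. Log an interaction with detections enabled.
log_id = quotient.log(
    user_query="What is the capital of France?",
    model_output="The capital of France is Paris.",
    documents=["Paris is the capital of France."],
)

# 2. Block until detections finish (or the timeout elapses).
log = quotient.poll_for_detection(log_id=log_id, timeout=300, poll_interval=2.0)

# 3. Act on the detection fields before alerting, retrying, or proceeding.
if log.has_hallucination:
    print(f"Unsupported claims detected in log {log.id}")
elif log.doc_relevancy_average is not None and log.doc_relevancy_average < 0.75:
    print("Retrieval looks weak; review the attached documents")
else:
    print("Log passed detection checks")
```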

<Tip>
In long-running jobs, increase `timeout` or handle the timeout exception so you can fall back to asynchronous processing instead of failing the entire workflow.
</Tip>

Back to the [Detections overview](./overview).

tools/reports.mdx

Lines changed: 2 additions & 6 deletions

```diff
@@ -15,8 +15,8 @@ They give you a daily snapshot of your users’ queries, clustered by topic and
 
 Reports are automatically generated based on the logs and traces you send to Quotient. Reports are available once 100+ [detections](/tools/detections) have been generated within a 30 day window. You can find more information on how to send logs and traces below:
 
-- [Logs](/data-collection/logs)
-- [Traces](/data-collection/traces)
+- [Logs](/core-functionalities/logs/overview)
+- [Traces](/core-functionalities/traces/overview)
 
 ![Reports](../assets/report-overview.png)
 
@@ -31,7 +31,3 @@ The Reports feature is particularly valuable for:
 - **Query Optimization**: Find patterns in queries that lead to poor results
 - **Resource Allocation**: Focus improvements on high-volume or high-risk areas
 - **Trend Monitoring**: Track how system performance changes over time
-
-
-
-
```

tools/reports/integration.mdx

Lines changed: 34 additions & 0 deletions

---
title: "Integration"
description: "Learn how reports are generated from your logs, traces, and detections."
icon: "handshake"
---

# How to Integrate Reports

Reports are automatically generated based on the logs and traces you send to Quotient. They become available once 100+ [detections](/tools/detections) have been generated within a 30-day window.

## Prerequisites

- **Consistent logging**: Send structured logs with `user_query`, `model_output`, and evidence so detections can run.
- **Detections enabled**: Hallucination and document relevance detections provide the quality signals that power report scoring.
- **Tag hygiene**: Attach tags such as `model`, `customer`, or `feature` to slice reports by meaningful segments.
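
For example, a tagged log call might look like the sketch below, assuming an initialized `quotient` client. Passing `tags` to `quotient.log` is an assumption based on the `tags` field on the `Log` object; the keys and values are placeholders.

```python
# Illustrative tag hygiene: each log carries the segments reports will slice
# by. The tags parameter is assumed from the Log object's tags field.
quotient.log(
    user_query="How do I reset my password?",
    model_output="Go to Settings > Security and choose Reset password.",
    documents=["Passwords can be reset under Settings > Security."],
    tags={"model": "my-model-v2", "customer": "acme", "feature": "support-bot"},
)
```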

## Data pipeline at a glance

1. Your application emits logs and traces through the Quotient SDKs.
2. Detections execute asynchronously on each record.
3. Reports aggregate detections and metadata into daily clusters with trend charts.
4. You review the dashboard (or export via API) to plan remediation.

## Best practices

- Keep `detection_sample_rate` high enough to capture statistically significant coverage for each segment you care about.
- Align tags with your roadmap: if you track `model_version` or `retriever`, you can measure the impact of each launch.
- Review reports alongside [Logs](/core-functionalities/logs/overview) and [Traces](/core-functionalities/traces/overview) to trace issues from cluster to underlying interaction.

<Tip>
If your traffic is bursty, consider uploading a curated evaluation set to quickly hit the 100-detection threshold and unlock reports before production volume ramps.
</Tip>

Back to the [Reports overview](./overview) or continue to [When to Use Reports](./when-to-use).
