
Implement Library README Discovery, Backfilling, and Publishing (#242) #380

Open
SurbhiAgarwal1 wants to merge 4 commits into open-telemetry:main from SurbhiAgarwal1:feat/242-library-readme-support

Conversation

@SurbhiAgarwal1

Implementation Detail: Library README Support (#242)

This document provides a technical breakdown of the changes implemented to support library README markdown files in the OpenTelemetry Ecosystem Explorer.

1. Registry Discovery & Extraction

Component: java-instrumentation-watcher
File: inventory_manager.py

The InventoryManager was enhanced to handle the library_readmes/ directory within the versioned registry.

  • Regex Parsing: Filenames like apache-httpclient-4.3-a3bae406cfcf.md are parsed using r"^(.*)-([a-f0-9]+)\.md$" to separate the library name from its content hash.
  • O(1) Map: Added load_library_readme_map(version) which caches this relationship, allowing the database builder to quickly correlate instrumentations to their READMEs.
  • Resilience: The loader handles missing directories and malformed files gracefully, logging warnings without interrupting the pipeline.
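
As a rough illustration of the parsing and mapping steps above, the sketch below applies the regex quoted in this description to a directory listing. The names parse_readme_filename and build_readme_map are hypothetical stand-ins and do not mirror the PR's actual methods; only the regex itself comes from the description.

```python
import logging
import re
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

# Regex quoted in the PR description: greedy name, then a lowercase-hex hash.
README_PATTERN = re.compile(r"^(.*)-([a-f0-9]+)\.md$")


def parse_readme_filename(filename: str) -> Optional[tuple[str, str]]:
    """Split '{library-name}-{hash}.md' into (library_name, markdown_hash)."""
    match = README_PATTERN.match(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)


def build_readme_map(readme_dir: Path) -> dict[str, str]:
    """Map library name -> content hash; tolerate missing dirs and bad names."""
    if not readme_dir.is_dir():
        logger.warning("README directory not found: %s", readme_dir)
        return {}
    readme_map: dict[str, str] = {}
    for item in sorted(readme_dir.iterdir()):
        if not (item.is_file() and item.suffix == ".md"):
            continue
        parsed = parse_readme_filename(item.name)
        if parsed is None:
            # Hypothetical handling: warn and keep going rather than abort.
            logger.warning("Malformed README filename: %s", item.name)
            continue
        library_name, markdown_hash = parsed
        readme_map[library_name] = markdown_hash
    return readme_map
```

Because the name group is greedy, the split happens at the last hyphen that is followed only by hex characters, so a name such as apache-httpclient-4.3 keeps its own hyphens.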

2. Metadata Augmentation & Backfilling

Component: explorer-db-builder
Files: main.py, metadata_backfiller.py

We integrated README support into the existing content-addressed metadata system.

  • Hash Injection: The main.py orchestrator augments both libraries and custom instrumentation metadata with the markdown_hash during the initial inventory load.
  • Backfill Propagation: By adding markdown_hash to BACKFILLABLE_FIELDS, the system ensures that older release versions (where the README might be missing in the registry) correctly inherit the link from newer versions if the library name matches.
  • Transformation Stability: Verified that the new field is preserved across all format transformations (0.1 through 0.5).
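
The injection and backfill behaviour described above can be sketched as follows. This is a simplified model under stated assumptions: inventories are plain dicts with "libraries" and "custom" lists, versions are supplied newest first, and the function names are hypothetical. The real BACKFILLABLE_FIELDS mechanism in metadata_backfiller.py is not shown in this PR page and may work differently.

```python
from typing import Any

Inventory = dict[str, list[dict[str, Any]]]


def augment_with_hashes(inventory: Inventory, readme_map: dict[str, str]) -> Inventory:
    """Attach markdown_hash to each library/custom item that has a README."""
    for key in ("libraries", "custom"):
        for item in inventory.get(key, []):
            name = item.get("name")
            if name in readme_map:
                item["markdown_hash"] = readme_map[name]
    return inventory


def backfill_markdown_hash(inventories_newest_first: list[Inventory]) -> None:
    """Let older versions inherit markdown_hash from the nearest newer version."""
    latest_known: dict[str, str] = {}
    for inventory in inventories_newest_first:
        for key in ("libraries", "custom"):
            for item in inventory.get(key, []):
                name = item.get("name")
                if name is None:
                    continue
                if "markdown_hash" in item:
                    # A version with its own README becomes the new source.
                    latest_known[name] = item["markdown_hash"]
                elif name in latest_known:
                    # Assumed rule: inherit only when the library name matches.
                    item["markdown_hash"] = latest_known[name]


newer = augment_with_hashes({"libraries": [{"name": "jdbc"}]}, {"jdbc": "a3bae406cfcf"})
older = augment_with_hashes({"libraries": [{"name": "jdbc"}, {"name": "other"}]}, {})
backfill_markdown_hash([newer, older])
```

In this example the older inventory's jdbc entry picks up the hash from the newer one, while the entry named other stays without a markdown_hash because no newer version provides one.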

3. Database Writing & Publishing

Component: explorer-db-builder
File: database_writer.py

The DatabaseWriter manages the physical transfer of assets to the explorer's public storage.

  • Storage Path: /public/data/javaagent/markdown/{library-name}-{hash}.md.
  • Deduplication: Implemented an existence check to prevent redundant writes of identical content across different versions or libraries.
  • Error Handling: Failures in markdown publishing are caught and logged, ensuring that database generation is never blocked by individual file errors.

4. Frontend Integration

Component: ecosystem-explorer
Files: src/types/javaagent.ts, src/lib/api/javaagent-data.ts

Prepared the React frontend for content rendering:

  • Type Definition: Added markdown_hash?: string to the InstrumentationData interface.
  • Lazy Loading: Implemented loadLibraryReadme(libraryName, markdownHash) to fetch markdown content from the public directory on demand.

5. Build System & DX Improvements

Component: Multiple Watchers
Files: __init__.py

Modified the package initialization logic to catch PackageNotFoundError when resolving __version__. This allows developers to run the builder and tests directly from source when the packages are not installed (common in dev/CI containers).

6. Verification Summary

  • Tests: 309/309 passed (pytest ecosystem-automation/).
  • Linting: All modified files formatted and verified with ruff.
  • E2E Validation: Confirmed that v2.26.1 correctly backfilled README links from v2.27.0 and that the public/data directory contains the correct assets.

…pen-telemetry#242)

- Extend InventoryManager to discover and load library READMEs from registry
- Augment instrumentation metadata with markdown_hash and enable backfilling
- Implement markdown publishing to public data directory in DatabaseWriter
- Add frontend types and API support for README lazy loading
@SurbhiAgarwal1 SurbhiAgarwal1 requested review from a team as code owners May 7, 2026 02:41
@netlify

netlify Bot commented May 7, 2026

Deploy Preview for otel-ecosystem-explorer ready!

Name Link
🔨 Latest commit 0e712c9
🔍 Latest deploy log https://app.netlify.com/projects/otel-ecosystem-explorer/deploys/69fea5cf3ec86f0008c0604c
😎 Deploy Preview https://deploy-preview-380--otel-ecosystem-explorer.netlify.app

@linux-foundation-easycla

linux-foundation-easycla Bot commented May 7, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

Contributor

Copilot AI left a comment


Pull request overview

Adds end-to-end support for per-library README markdown assets in the Java agent ecosystem: discovery from the registry, propagation/backfilling into instrumentation metadata, publishing to the explorer’s public data directory, and a frontend API hook to lazy-load the markdown.

Changes:

  • Java instrumentation watcher: discover and load library README markdown assets from library_readmes/.
  • Explorer DB builder: inject/backfill markdown_hash into library/custom items and publish markdown files to public/data/javaagent/markdown/.
  • Frontend: extend Javaagent instrumentation types with markdown_hash and add an API helper to fetch README markdown on demand.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| ecosystem-explorer/src/types/javaagent.ts | Adds markdown_hash to InstrumentationData for README association. |
| ecosystem-explorer/src/lib/api/javaagent-data.ts | Adds loadLibraryReadme() for lazy-loading published markdown assets. |
| ecosystem-automation/java-instrumentation-watcher/src/java_instrumentation_watcher/inventory_manager.py | Adds README map discovery, content loading, and filename parsing for library_readmes/. |
| ecosystem-automation/java-instrumentation-watcher/src/java_instrumentation_watcher/__init__.py | Makes __version__ resilient when package metadata isn’t installed. |
| ecosystem-automation/explorer-db-builder/src/explorer_db_builder/metadata_backfiller.py | Backfills markdown_hash across versions. |
| ecosystem-automation/explorer-db-builder/src/explorer_db_builder/main.py | Publishes READMEs and augments inventories with markdown_hash before backfill/write. |
| ecosystem-automation/explorer-db-builder/src/explorer_db_builder/database_writer.py | Writes markdown assets to a content-addressed public directory with dedup. |
| ecosystem-automation/explorer-db-builder/src/explorer_db_builder/__init__.py | Makes __version__ resilient when package metadata isn’t installed. |
| ecosystem-automation/collector-watcher/src/collector_watcher/__init__.py | Makes __version__ resilient when package metadata isn’t installed. |

Comment thread ecosystem-automation/explorer-db-builder/src/explorer_db_builder/main.py Outdated
Comment thread ecosystem-explorer/src/lib/api/javaagent-data.ts Outdated
Comment thread ecosystem-explorer/src/lib/api/javaagent-data.ts
Member

@jaydeluca jaydeluca left a comment


looks great! tested things locally and everything looks good. Just a few suggestions from myself and copilot related to some of the logic

Comment thread ecosystem-explorer/src/lib/api/javaagent-data.ts Outdated
Contributor

Copilot AI left a comment



Comment on lines +134 to +143
readme_map = {}
for item in readme_dir.iterdir():
    if item.is_file() and item.suffix == ".md":
        parsed = self._parse_readme_filename(item.name)
        if parsed:
            library_name, markdown_hash = parsed
            readme_map[library_name] = markdown_hash
        else:
            logging.getLogger(__name__).warning(f"Malformed README filename in {version}: {item.name}")

    Parse a README filename into (library_name, markdown_hash).
    Format: {library-name}-{hash}.md
    """
    match = re.match(r"^(.*)-([a-f0-9]+)\.md$", filename)

for version, readme_map in readme_maps.items():
    for library_name, markdown_hash in readme_map.items():
        content = inventory_manager.load_library_readme_content(version, library_name, markdown_hash)
        if content:
Comment on lines +142 to +165
# Pre-load README maps for all versions to enable augmentation and backfilling
readme_maps = {v: inventory_manager.load_library_readme_map(v) for v in versions}

# Publish all READMEs to the database
for version, readme_map in readme_maps.items():
    for library_name, markdown_hash in readme_map.items():
        content = inventory_manager.load_library_readme_content(version, library_name, markdown_hash)
        if content:
            db_writer.write_markdown(library_name, markdown_hash, content)

def load_and_augment_inventory(version: Version) -> dict:
    inventory = inventory_manager.load_versioned_inventory(version)
    readme_map = readme_maps.get(version, {})

    # Augment libraries and custom instrumentations with markdown_hash
    for key in ["libraries", "custom"]:
        if key in inventory:
            for item in inventory[key]:
                name = item.get("name")
                if name and name in readme_map:
                    item["markdown_hash"] = readme_map[name]

    return inventory

Comment on lines +206 to +232
def write_markdown(self, library_name: str, markdown_hash: str, content: str) -> None:
    """Write markdown file to the database.

    Args:
        library_name: Name of the library
        markdown_hash: Hash of the markdown content
        content: Markdown content string
    """
    markdown_dir = self.database_dir / "markdown"
    markdown_dir.mkdir(parents=True, exist_ok=True)
    file_path = markdown_dir / f"{library_name}-{markdown_hash}.md"

    if file_path.exists():
        logger.debug(f"Markdown for '{library_name}' with hash {markdown_hash} already exists, skipping write")
        return

    try:
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(content)
        file_size = len(content.encode("utf-8"))
        self.files_written += 1
        self.total_bytes += file_size
        logger.debug(f"Wrote markdown for '{library_name}' with hash {markdown_hash}")
    except OSError as e:
        logger.error(f"Failed to write markdown for '{library_name}': {e}")
        # README publishing failures must never fail DB generation as per requirements

Comment on lines +84 to +88
  const response = await fetch(`${BASE_PATH}/markdown/${libraryName}-${markdownHash}.md`);
  if (!response.ok) {
    throw new Error(`Failed to load README for ${libraryName}`);
  }
  return response.text();

export async function loadLibraryReadme(libraryName: string, markdownHash: string): Promise<string> {
  const response = await fetch(`${BASE_PATH}/markdown/${libraryName}-${markdownHash}.md`);
  if (!response.ok) {
    throw new Error(`Failed to load README for ${libraryName}`);

- Implement deterministic README selection in InventoryManager (mtime + lexicographical).
- Tighten README filename regex to enforce 12-char hashes.
- Use explicit None checks for README content publishing.
- Update loadLibraryReadme to use fetchWithCache with status reporting.
- Add unit and integration tests for README publishing and backfilling.
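
To illustrate why the filename regex was tightened, the sketch below contrasts the pattern quoted earlier in this PR with one possible 12-character form. The exact tightened pattern is not shown on this page, so STRICT is an assumption based only on the commit note, and split_name_and_hash is a hypothetical helper.

```python
import re

# Pattern quoted earlier in this PR: any run of lowercase hex counts as a hash.
LOOSE = re.compile(r"^(.*)-([a-f0-9]+)\.md$")
# Assumed tightened form: the hash must be exactly 12 lowercase hex characters.
STRICT = re.compile(r"^(.*)-([a-f0-9]{12})\.md$")


def split_name_and_hash(pattern: re.Pattern, filename: str):
    """Return (library_name, markdown_hash), or None when the name does not match."""
    match = pattern.match(filename)
    return match.groups() if match else None


# A well-formed name is accepted by both patterns.
good = "apache-httpclient-4.3-a3bae406cfcf.md"
# A hash-less name whose last segment happens to be hex digits.
ambiguous = "servlet-5.md"

loose_result = split_name_and_hash(LOOSE, ambiguous)
strict_result = split_name_and_hash(STRICT, ambiguous)
```

Under the loose pattern, servlet-5.md is read as library servlet with hash 5, whereas the 12-character form rejects it, so a file without a real content hash would be reported as malformed instead of being mapped.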
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


3 participants