Implement Library README Discovery, Backfilling, and Publishing (#242) #380
Open
SurbhiAgarwal1 wants to merge 4 commits into open-telemetry:main
…pen-telemetry#242)
- Extend InventoryManager to discover and load library READMEs from registry
- Augment instrumentation metadata with markdown_hash and enable backfilling
- Implement markdown publishing to public data directory in DatabaseWriter
- Add frontend types and API support for README lazy loading
✅ Deploy Preview for otel-ecosystem-explorer ready!
Pull request overview
Adds end-to-end support for per-library README markdown assets in the Java agent ecosystem: discovery from the registry, propagation/backfilling into instrumentation metadata, publishing to the explorer’s public data directory, and a frontend API hook to lazy-load the markdown.
Changes:
- Java instrumentation watcher: discover and load library README markdown assets from `library_readmes/`.
- Explorer DB builder: inject/backfill `markdown_hash` into library/custom items and publish markdown files to `public/data/javaagent/markdown/`.
- Frontend: extend Javaagent instrumentation types with `markdown_hash` and add an API helper to fetch README markdown on demand.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `ecosystem-explorer/src/types/javaagent.ts` | Adds `markdown_hash` to `InstrumentationData` for README association. |
| `ecosystem-explorer/src/lib/api/javaagent-data.ts` | Adds `loadLibraryReadme()` for lazy-loading published markdown assets. |
| `ecosystem-automation/java-instrumentation-watcher/src/java_instrumentation_watcher/inventory_manager.py` | Adds README map discovery + content loading + filename parsing for `library_readmes/`. |
| `ecosystem-automation/java-instrumentation-watcher/src/java_instrumentation_watcher/__init__.py` | Makes `__version__` resilient when package metadata isn’t installed. |
| `ecosystem-automation/explorer-db-builder/src/explorer_db_builder/metadata_backfiller.py` | Backfills `markdown_hash` across versions. |
| `ecosystem-automation/explorer-db-builder/src/explorer_db_builder/main.py` | Publishes READMEs and augments inventories with `markdown_hash` before backfill/write. |
| `ecosystem-automation/explorer-db-builder/src/explorer_db_builder/database_writer.py` | Writes markdown assets to a content-addressed public directory with dedup. |
| `ecosystem-automation/explorer-db-builder/src/explorer_db_builder/__init__.py` | Makes `__version__` resilient when package metadata isn’t installed. |
| `ecosystem-automation/collector-watcher/src/collector_watcher/__init__.py` | Makes `__version__` resilient when package metadata isn’t installed. |
jaydeluca (Member) reviewed on May 7, 2026 and left a comment:
looks great! tested things locally and everything looks good. Just a few suggestions from myself and copilot related to some of the logic
Comment on lines +134 to +143
```python
readme_map = {}
for item in readme_dir.iterdir():
    if item.is_file() and item.suffix == ".md":
        parsed = self._parse_readme_filename(item.name)
        if parsed:
            library_name, markdown_hash = parsed
            readme_map[library_name] = markdown_hash
        else:
            logging.getLogger(__name__).warning(f"Malformed README filename in {version}: {item.name}")
```
```python
    Parse a README filename into (library_name, markdown_hash).
    Format: {library-name}-{hash}.md
    """
    match = re.match(r"^(.*)-([a-f0-9]+)\.md$", filename)
```
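The filename parsing in the excerpt above can be tried in isolation. This is a minimal sketch that reuses the same pattern; the standalone function name `parse_readme_filename` and the `jetty-9.md` input are assumptions made for illustration, not code from the PR.

```python
import re
from typing import Optional, Tuple

# Same pattern as the excerpt: a greedy name, a hyphen, then a lowercase-hex suffix.
README_PATTERN = re.compile(r"^(.*)-([a-f0-9]+)\.md$")


def parse_readme_filename(filename: str) -> Optional[Tuple[str, str]]:
    """Split '{library-name}-{hash}.md' into (library_name, markdown_hash)."""
    match = README_PATTERN.match(filename)
    if match is None:
        return None
    # Because the first group is greedy, the split happens at the last hyphen
    # whose remainder is entirely lowercase hex.
    return match.group(1), match.group(2)


good = parse_readme_filename("apache-httpclient-4.3-a3bae406cfcf.md")
no_hash = parse_readme_filename("README.md")
# A short hex-looking version suffix is accepted as a "hash" by this loose pattern.
ambiguous = parse_readme_filename("jetty-9.md")
```

The last case shows why an unbounded `[a-f0-9]+` can misread a name that ends in a short hex-like segment; a fixed hash length would reject it.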
```python
for version, readme_map in readme_maps.items():
    for library_name, markdown_hash in readme_map.items():
        content = inventory_manager.load_library_readme_content(version, library_name, markdown_hash)
        if content:
```
Comment on lines +142 to +165
```python
# Pre-load README maps for all versions to enable augmentation and backfilling
readme_maps = {v: inventory_manager.load_library_readme_map(v) for v in versions}

# Publish all READMEs to the database
for version, readme_map in readme_maps.items():
    for library_name, markdown_hash in readme_map.items():
        content = inventory_manager.load_library_readme_content(version, library_name, markdown_hash)
        if content:
            db_writer.write_markdown(library_name, markdown_hash, content)

def load_and_augment_inventory(version: Version) -> dict:
    inventory = inventory_manager.load_versioned_inventory(version)
    readme_map = readme_maps.get(version, {})

    # Augment libraries and custom instrumentations with markdown_hash
    for key in ["libraries", "custom"]:
        if key in inventory:
            for item in inventory[key]:
                name = item.get("name")
                if name and name in readme_map:
                    item["markdown_hash"] = readme_map[name]

    return inventory
```
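The augmentation step in the excerpt above can be exercised without the registry. The following is a sketch only: `augment_inventory` is a hypothetical standalone version of the inner loop, and the library names and hashes are made-up sample data.

```python
from typing import Dict


def augment_inventory(inventory: dict, readme_map: Dict[str, str]) -> dict:
    """Attach markdown_hash to each library/custom item that has a README."""
    for key in ("libraries", "custom"):
        for item in inventory.get(key, []):
            name = item.get("name")
            if name and name in readme_map:
                item["markdown_hash"] = readme_map[name]
    return inventory


# Hypothetical sample data, for illustration only.
inventory = {
    "libraries": [{"name": "example-lib"}, {"name": "no-readme-lib"}],
    "custom": [{"name": "example-custom"}],
}
readme_map = {"example-lib": "0123456789ab", "example-custom": "ba9876543210"}
augmented = augment_inventory(inventory, readme_map)
```

Items without a matching README are left untouched, which is what lets a later backfill pass decide whether to fill in the field from another version.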
Comment on lines +206 to +232
```python
def write_markdown(self, library_name: str, markdown_hash: str, content: str) -> None:
    """Write markdown file to the database.

    Args:
        library_name: Name of the library
        markdown_hash: Hash of the markdown content
        content: Markdown content string
    """
    markdown_dir = self.database_dir / "markdown"
    markdown_dir.mkdir(parents=True, exist_ok=True)
    file_path = markdown_dir / f"{library_name}-{markdown_hash}.md"

    if file_path.exists():
        logger.debug(f"Markdown for '{library_name}' with hash {markdown_hash} already exists, skipping write")
        return

    try:
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(content)
        file_size = len(content.encode("utf-8"))
        self.files_written += 1
        self.total_bytes += file_size
        logger.debug(f"Wrote markdown for '{library_name}' with hash {markdown_hash}")
    except OSError as e:
        logger.error(f"Failed to write markdown for '{library_name}': {e}")
        # README publishing failures must never fail DB generation as per requirements
```
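The skip-if-exists behavior of `write_markdown` can be checked against a temporary directory. This is a simplified sketch, not the PR's class: `write_markdown_once` is a hypothetical free function that returns whether a write happened, and the library name and hash are sample values.

```python
import tempfile
from pathlib import Path


def write_markdown_once(markdown_dir: Path, library_name: str, markdown_hash: str, content: str) -> bool:
    """Write '{library_name}-{markdown_hash}.md' unless it already exists.

    Returns True only when a new file was written. An existing file and an
    OSError both return False, so a failed write never raises to the caller.
    """
    markdown_dir.mkdir(parents=True, exist_ok=True)
    file_path = markdown_dir / f"{library_name}-{markdown_hash}.md"
    if file_path.exists():
        return False
    try:
        file_path.write_text(content, encoding="utf-8")
    except OSError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "markdown"
    first = write_markdown_once(target, "example-lib", "0123456789ab", "# Example\n")
    # Same name and hash: treated as already published, so the content is not replaced.
    second = write_markdown_once(target, "example-lib", "0123456789ab", "# Changed\n")
    stored = (target / "example-lib-0123456789ab.md").read_text(encoding="utf-8")
```

The dedup relies on the hash in the filename reflecting the content; if two different bodies ever shared a name and hash, the first one written would win.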
Comment on lines +84 to +88
```typescript
  const response = await fetch(`${BASE_PATH}/markdown/${libraryName}-${markdownHash}.md`);
  if (!response.ok) {
    throw new Error(`Failed to load README for ${libraryName}`);
  }
  return response.text();
```
```typescript
export async function loadLibraryReadme(libraryName: string, markdownHash: string): Promise<string> {
  const response = await fetch(`${BASE_PATH}/markdown/${libraryName}-${markdownHash}.md`);
  if (!response.ok) {
    throw new Error(`Failed to load README for ${libraryName}`);
```
- Implement deterministic README selection in InventoryManager (mtime + lexicographical).
- Tighten README filename regex to enforce 12-char hashes.
- Use explicit None checks for README content publishing.
- Update loadLibraryReadme to use fetchWithCache with status reporting.
- Add unit and integration tests for README publishing and backfilling.
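One way the first two items of this follow-up commit could look is sketched below. The updated code is not shown on this page, so everything here is an assumption: the pattern with `{12}`, the function name `select_readmes`, and the tie-break rule (newest mtime wins, then the lexicographically greatest filename) are illustrative guesses at "mtime + lexicographical".

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed tightened pattern: the hash must be exactly 12 lowercase-hex characters.
STRICT_README_PATTERN = re.compile(r"^(.+)-([a-f0-9]{12})\.md$")


def select_readmes(candidates: Iterable[Tuple[str, float]]) -> Dict[str, str]:
    """Map library name -> hash, picking one file per library deterministically.

    `candidates` holds (filename, mtime) pairs. The choice does not depend on
    the order in which the directory listing returns the files.
    """
    best: Dict[str, Tuple[float, str, str]] = {}
    for filename, mtime in candidates:
        match = STRICT_README_PATTERN.match(filename)
        if match is None:
            continue  # malformed name or wrong hash length
        name, digest = match.groups()
        key = (mtime, filename, digest)
        if name not in best or key > best[name]:
            best[name] = key
    return {name: key[2] for name, key in best.items()}


# Hypothetical listing: three candidates for one library, plus a malformed name.
listing = [
    ("a-aaaaaaaaaaaa.md", 1.0),
    ("a-bbbbbbbbbbbb.md", 2.0),
    ("a-cccccccccccc.md", 2.0),
    ("jetty-9.md", 5.0),
]
selected = select_readmes(listing)
reversed_selected = select_readmes(reversed(listing))
```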
Implementation Detail: Library README Support (#242)
This document provides a technical breakdown of the changes implemented to support library README markdown files in the OpenTelemetry Ecosystem Explorer.
1. Registry Discovery & Extraction
Component: `java-instrumentation-watcher`
File: `inventory_manager.py`
The `InventoryManager` was enhanced to handle the `library_readmes/` directory within the versioned registry.
- Filenames such as `apache-httpclient-4.3-a3bae406cfcf.md` are parsed using `r"^(.*)-([a-f0-9]+)\.md$"` to separate the library name from its content hash.
- Added `load_library_readme_map(version)`, which caches this relationship, allowing the database builder to quickly correlate instrumentations to their READMEs.

2. Metadata Augmentation & Backfilling
Component: `explorer-db-builder`
Files: `main.py`, `metadata_backfiller.py`
We integrated README support into the existing content-addressed metadata system.
- The `main.py` orchestrator augments both `libraries` and `custom` instrumentation metadata with the `markdown_hash` during the initial inventory load.
- By adding `markdown_hash` to `BACKFILLABLE_FIELDS`, the system ensures that older release versions (where the README might be missing in the registry) correctly inherit the link from newer versions if the library name matches.

3. Database Writing & Publishing
Component: `explorer-db-builder`
File: `database_writer.py`
The `DatabaseWriter` manages the physical transfer of assets to the explorer's public storage.
- Markdown files are written to `/public/data/javaagent/markdown/{library-name}-{hash}.md`.

4. Frontend Integration
Component: `ecosystem-explorer`
Files: `src/types/javaagent.ts`, `src/lib/api/javaagent-data.ts`
Prepared the React frontend for content rendering:
- Added `markdown_hash?: string` to the `InstrumentationData` interface.
- Added `loadLibraryReadme(libraryName, markdownHash)` to fetch markdown content from the public directory on demand.

5. Build System & DX Improvements
Component: Multiple Watchers
Files: `__init__.py`
Modified the initialization logic to handle `PackageNotFoundError`. This allows developers to run the builder and tests directly from source in uninstalled environments (common in dev/CI containers).

6. Verification Summary
- Tests: run with `pytest ecosystem-automation/`.
- Linting: `ruff`.
- Manual check: `v2.26.1` correctly backfilled README links from `v2.27.0`, and the `public/data` directory contains the correct assets.
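The backfilling behavior described in section 2 (older versions inheriting `markdown_hash` from newer ones when the library name matches) can be sketched as follows. This is an illustrative guess at the mechanism, not the code in `metadata_backfiller.py`: the `backfill` function, the newest-first ordering, and the sample items are all assumptions.

```python
from typing import Dict, List

# Assumed subset of the real BACKFILLABLE_FIELDS; only the field added in this PR.
BACKFILLABLE_FIELDS = ("markdown_hash",)


def backfill(inventories_newest_first: List[List[dict]]) -> None:
    """Copy backfillable fields from newer versions into older ones, by item name.

    Walks versions from newest to oldest and remembers the most recent value
    seen per library; older items that lack the field inherit that value.
    """
    last_seen: Dict[str, Dict[str, str]] = {}
    for items in inventories_newest_first:
        for item in items:
            name = item.get("name")
            if not name:
                continue
            known = last_seen.setdefault(name, {})
            for field in BACKFILLABLE_FIELDS:
                if field in item:
                    known[field] = item[field]
                elif field in known:
                    item[field] = known[field]


# Hypothetical items standing in for two releases, newest first.
newer = [{"name": "example-lib", "markdown_hash": "0123456789ab"}]
older = [{"name": "example-lib"}, {"name": "other-lib"}]
backfill([newer, older])
```

After the call, the older `example-lib` entry carries the newer hash, while `other-lib` stays without one because no newer version supplied it.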