feat(signal): add media send methods and disk-based attachment fallback#3704
feat(signal): add media send methods and disk-based attachment fallback#3704gibbsoft wants to merge 1 commit intoNousResearch:mainfrom
Conversation
Add send_image_file, send_voice, send_video overrides that route through send_document for native Signal media attachments. Add disk-based attachment reading from signal-cli attachments directory as primary method, falling back to getAttachment RPC. Improves reliability when RPC returns errors.
There was a problem hiding this comment.
Pull request overview
This PR enhances the Signal platform adapter to support native media attachments for outbound messages and improves inbound attachment retrieval by preferring on-disk signal-cli attachment files with an RPC fallback.
Changes:
- Add
send_image_file,send_voice, andsend_videooverrides that route throughsend_documentfor attachment-based delivery. - Update
_fetch_attachmentto attempt reading from~/.local/share/signal-cli/attachments/before falling back togetAttachmentJSON-RPC. - Add debug/warning logging around attachment processing and failure modes.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| att_dir = os.path.expanduser("~/.local/share/signal-cli/attachments/") | ||
| matches = glob.glob(os.path.join(att_dir, f"{attachment_id}*")) | ||
| if matches: | ||
| disk_path = matches[0] | ||
| try: | ||
| with open(disk_path, "rb") as f: | ||
| raw_data = f.read() |
There was a problem hiding this comment.
Disk-based attachment loading reads the entire file without validating its on-disk size. This bypasses the earlier att_size check (which might be missing/incorrect) and allows unexpectedly large reads; check the file size with stat() and enforce SIGNAL_MAX_ATTACHMENT_SIZE before reading.
| result = result.get("data") | ||
| if not result: | ||
| logger.warning("Signal: attachment response missing 'data' key") | ||
| logger.warning("Signal: failed to fetch attachment %s - got null path", attachment_id) |
There was a problem hiding this comment.
The warning message "got null path" is misleading here: getAttachment returns base64 data (or a dict with a data field), not a file path. Consider rewording to reflect an empty/None RPC result so logs are actionable.
| logger.warning("Signal: failed to fetch attachment %s - got null path", attachment_id) | |
| logger.warning("Signal: failed to fetch attachment %s - got empty RPC result (no attachment data)", attachment_id) |
| matches = glob.glob(os.path.join(att_dir, f"{attachment_id}*")) | ||
| if matches: | ||
| disk_path = matches[0] |
There was a problem hiding this comment.
Using glob with an unvalidated attachment_id and the pattern f"{attachment_id}*" can match multiple files and the code picks matches[0] (non-deterministic ordering). Consider validating attachment_id (e.g., disallow path separators/glob metacharacters) and selecting the best match deterministically (exact match first, else newest by mtime).
| matches = glob.glob(os.path.join(att_dir, f"{attachment_id}*")) | |
| if matches: | |
| disk_path = matches[0] | |
| # Validate attachment_id before using it in a glob pattern to avoid | |
| # unintended glob metacharacters or path traversal. | |
| if re.fullmatch(r"[A-Za-z0-9._-]+", attachment_id): | |
| pattern = os.path.join(att_dir, f"{attachment_id}*") | |
| matches = glob.glob(pattern) | |
| else: | |
| matches = [] | |
| selected_path = None | |
| if matches: | |
| # Prefer an exact filename match (basename without extension equals attachment_id) | |
| exact_matches = [] | |
| for m in matches: | |
| base = os.path.basename(m) | |
| name_without_ext, _ = os.path.splitext(base) | |
| if name_without_ext == attachment_id: | |
| exact_matches.append(m) | |
| if exact_matches: | |
| # Choose deterministically among exact matches | |
| selected_path = sorted(exact_matches)[0] | |
| else: | |
| # Fallback: choose the newest matching file by modification time | |
| try: | |
| selected_path = max(matches, key=os.path.getmtime) | |
| except (OSError, ValueError): | |
| selected_path = None | |
| if selected_path: | |
| disk_path = selected_path |
| async def send_image_file( | ||
| self, | ||
| chat_id: str, | ||
| image_path: str, | ||
| caption: Optional[str] = None, | ||
| **kwargs, | ||
| ) -> SendResult: | ||
| """Send a local image file as an attachment.""" | ||
| return await self.send_document(chat_id, image_path, caption) | ||
|
|
||
| async def send_voice( | ||
| self, | ||
| chat_id: str, | ||
| audio_path: str, | ||
| caption: Optional[str] = None, | ||
| **kwargs, | ||
| ) -> SendResult: | ||
| """Send an audio file as an attachment.""" | ||
| return await self.send_document(chat_id, audio_path, caption) | ||
|
|
||
| async def send_video( | ||
| self, | ||
| chat_id: str, | ||
| video_path: str, | ||
| caption: Optional[str] = None, | ||
| **kwargs, | ||
| ) -> SendResult: | ||
| """Send a video file as an attachment.""" | ||
| return await self.send_document(chat_id, video_path, caption) |
There was a problem hiding this comment.
send_image_file/send_voice/send_video accept **kwargs but don't forward them to send_document. This drops any caller-provided options (e.g., reply_to, metadata, file_name) and is inconsistent with other adapters; pass **kwargs through to send_document to preserve the base adapter contract.
| async def _fetch_attachment(self, attachment_id: str) -> tuple: | ||
| """Fetch an attachment via JSON-RPC and cache it. Returns (path, ext).""" | ||
| result = await self._rpc("getAttachment", { | ||
| "account": self.account, | ||
| "id": attachment_id, | ||
| }) | ||
| """Fetch an attachment via JSON-RPC and cache it. Returns (path, ext). | ||
|
|
||
| Tries reading from the signal-cli attachments directory first, | ||
| falling back to the getAttachment RPC call. | ||
| """ | ||
| raw_data = None | ||
|
|
||
| # Try disk-based read first (signal-cli stores attachments locally) | ||
| att_dir = os.path.expanduser("~/.local/share/signal-cli/attachments/") | ||
| matches = glob.glob(os.path.join(att_dir, f"{attachment_id}*")) | ||
| if matches: | ||
| disk_path = matches[0] | ||
| try: | ||
| with open(disk_path, "rb") as f: | ||
| raw_data = f.read() | ||
| logger.debug("Signal: read attachment %s from disk (%d bytes)", attachment_id, len(raw_data)) | ||
| except Exception as e: | ||
| logger.warning("Signal: failed to read attachment from disk %s: %s", disk_path, e) | ||
| raw_data = None | ||
|
|
||
| if not result: | ||
| return None, "" | ||
| # Fall back to JSON-RPC if disk read didn't work | ||
| if raw_data is None: | ||
| result = await self._rpc("getAttachment", { | ||
| "account": self.account, | ||
| "id": attachment_id, | ||
| }) |
There was a problem hiding this comment.
The new disk-first attachment branch in _fetch_attachment() isn't covered by tests. Since tests/gateway/test_signal.py already covers the RPC path, add a test that stubs glob/open to return bytes and asserts rpc("getAttachment") is not called (and that the correct cache* helper is used).
| att_dir = os.path.expanduser("~/.local/share/signal-cli/attachments/") | ||
| matches = glob.glob(os.path.join(att_dir, f"{attachment_id}*")) | ||
| if matches: | ||
| disk_path = matches[0] | ||
| try: | ||
| with open(disk_path, "rb") as f: | ||
| raw_data = f.read() | ||
| logger.debug("Signal: read attachment %s from disk (%d bytes)", attachment_id, len(raw_data)) |
There was a problem hiding this comment.
_fetch_attachment() does synchronous disk I/O (open/read) inside an async method. For large attachments (up to 100MB) this can block the event loop; offload the file read to a thread (e.g., via asyncio.to_thread) or use an async file reader so inbound SSE processing stays responsive.
Summary
send_image_file,send_voice,send_videooverrides to Signal adapter, routing throughsend_documentfor native media attachments~/.local/share/signal-cli/attachments/as primary method, falling back togetAttachmentRPCMotivation
Signal adapter currently falls back to sending file paths as text (e.g.
🖼️ Image: /path/to/file) because the base class media methods aren't overridden. ThegetAttachmentRPC can also fail when the account identifier doesn't match, while the files are always available on disk.Test plan