obenland (Member) commented Oct 13, 2025

Proposed changes:

This PR introduces a new Attachments processor class that provides unified handling of ActivityPub attachments (images, videos, audio, documents) for both remote URLs and local files.

Key improvements:

  • Unified API: Single Attachments::process() method handles both remote and local attachments
  • Remote URL support: Downloads and imports media from ActivityPub network
  • Local file support: Processes files from Mastodon archive imports
  • Smart media markup: Generates blocks (or shortcodes) with gallery support for multiple images
  • Metadata preservation: Stores source URLs and alt text for images

Integration points:

  • Posts collection (includes/collection/class-posts.php): Uses Attachments::process() when creating/updating ActivityPub posts
  • Mastodon importer (includes/wp-admin/import/class-mastodon.php): Processes local attachment files from archive

Other information:

  • Have you written new tests for your changes, if applicable?

Testing instructions:

Test remote attachments (ActivityPub network):

  1. Set up a test ActivityPub post with images from another server
  2. Send a Create activity with attachments to your WordPress site
  3. Verify the post is created with images attached
  4. Check that alt text from the ActivityPub name field is preserved

Test local attachments (Mastodon import):

  1. Export a Mastodon archive with media attachments
  2. Go to Tools > Import > Mastodon
  3. Upload the archive ZIP file
  4. Verify posts are imported with their original media
  5. Check that images appear in the post content

Run tests:

npm run env-test

All 1137 tests should pass, including the 14 new Attachments tests.

pfefferle and others added 30 commits October 13, 2025 10:45
Introduces a new 'ap_object' post type and related taxonomies for handling ActivityPub objects, including registration, CRUD operations, and taxonomy management. Updates create and update handlers to process non-interaction objects using the new Objects collection. Enhances debug functionality to support the new post type and taxonomies.
Introduced Sanitize::content() to process and format content for ActivityPub, including block support and HTML sanitization. Updated Objects class to use the new sanitizer for post content.
Introduces a new Blocks::html_to_blocks() method to convert HTML content into block format using DOMDocument parsing and block mapping. Refactors Sanitize to use this new method, replacing the previous regex-based paragraph splitting logic for improved accuracy and maintainability.
The get_node_attributes private method was removed and the logic for setting the 'ordered' attribute on 'ol' nodes is now handled inline. This simplifies the code by reducing indirection and focusing only on the required attribute for ordered lists.
Refactored the Blocks class by renaming the html_to_blocks method to convert_from_html for improved clarity. Updated all references to use the new method name.
Introduces new PHPUnit tests for content sanitization in Sanitize::content, covering scenarios such as block support, malicious content, URLs, empty content, and safe HTML preservation. Also adds a test for Blocks::convert_from_html to verify HTML-to-block conversion when blocks are supported.
Introduces tests to verify content sanitization in Create handler, handling of private activities, and behavior with malformed object data. Also ensures required post types are registered during test setup.
Introduces a comprehensive PHPUnit test suite for the Activitypub\Collection\Objects class, covering object addition, updating, retrieval by GUID, and activity-to-post conversion, including edge cases and error handling. Mocks HTTP requests for remote actor fetching and ensures correct post type registration for test isolation.
Renamed update_actor and update_object to handle_actor_update and handle_object_update for improved clarity and consistency. Combined logic for handling object and interaction updates into handle_object_update. Updated corresponding test method names to match the new handler method names.
Changed the PHPUnit @covers annotation from ::update_actor to ::handle_actor_update in Test_Update to accurately reflect the method being tested.
Changed the post type labels in the registration array from 'Posts'/'Post' to 'Objects'/'Object' to better reflect the content type.
Inserted a blank line after the use statements in class-update.php to improve code readability and adhere to coding standards.
Deleted the Blocks::convert_from_html method and its related test, and updated the Sanitize class to no longer convert HTML content to blocks. This simplifies content sanitization by removing automatic block conversion.
Refactored the Objects collection to Posts, updating class names, constants, and references in all relevant files. Adjusted post type from 'ap_object' to 'ap_post' and updated related taxonomy registrations, handlers, and tests to reflect the new naming convention. This improves clarity and consistency in the codebase.
Refactored the method name from create_object to create_post to better reflect its purpose of handling post creation. Updated all relevant references and docblocks for clarity.
Changed the @covers annotation from ::create_object to ::create_post in the test_handle_create_object_with_sanitization method to accurately reflect the method being tested.
Adds registration of the '_activitypub_remote_actor_id' post meta for the custom post type, storing the local ID of the remote actor that created the object. This includes type enforcement, a description, and a sanitize callback.
The 'ap_object_type' taxonomy has been added to the registered taxonomies array for the relevant post type, allowing posts to be associated with both 'ap_tag' and 'ap_object_type'.
Deleted the 'labels' arrays from the registration of 'ap_tag' and 'ap_object_type' taxonomies. This will cause WordPress to use default labels for these taxonomies.
Eliminated the 'rewrite' arguments for 'ap_tag' and 'ap_object_type' taxonomies, reverting to default rewrite behavior. This may affect the URLs generated for these taxonomies.
Refactors the method name from handle_actor_update to update_actor for clarity and updates all references accordingly.
Renamed the test method from test_handle_actor_update to test_update_actor and updated the @covers annotation and method calls to reference update_actor instead of handle_actor_update. This aligns the test with the updated function name.
Initializes $result as WP_Error by default and updates the docblock for the 'activitypub_handled_update' action to specify more precise types for the $result parameter. Removes unreachable assignment in the comment update branch.
Adds explicit WP_Error objects when Create or Update operations fail, ensuring consistent error reporting. Also updates the docblock for the Create handler to reflect possible result types.
Moved the default WP_Error assignment to the start of handle_object_update, ensuring $result is always initialized. Removed redundant error assignment after update attempts.
Replace site_supports_blocks() check with a filter-based approach:
- Add activitypub_attachments_media_markup filter for extensibility
- Default to block markup for all sites
- Create Classic Editor integration that hooks into the filter
- Classic Editor integration activates when:
  - Classic Editor plugin is active (CLASSIC_EDITOR_PLUGIN constant)
  - OR site_supports_blocks() returns false
- Generates shortcode-based markup (gallery, audio, video shortcodes)

This provides a cleaner architecture where:
- Core Attachments class focuses on default behavior (blocks)
- Plugins/integrations can customize via filter
- Classic Editor support is opt-in via integration
- No conditional logic in core attachment processing
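
As a rough illustration of the filter-based approach described in this commit, the sketch below shows how an integration might swap block markup for shortcodes. Only the filter name activitypub_attachments_media_markup comes from the commit message; the callback name, the two-argument signature, and the shape of the attachment array are assumptions made for the example, not the plugin's actual API.

```php
<?php
/**
 * Hypothetical filter callback: build shortcode markup from attachment data.
 *
 * ASSUMPTION: the filter passes the default markup plus a list of attachments
 * shaped like [ 'id' => int, 'type' => 'image'|'audio'|'video', 'url' => string ].
 */
function example_classic_media_markup( string $markup, array $attachments ): string {
	$image_ids = array();
	$parts     = array();

	foreach ( $attachments as $attachment ) {
		switch ( $attachment['type'] ) {
			case 'image':
				// Images are collected so they can share one gallery shortcode.
				$image_ids[] = (int) $attachment['id'];
				break;
			case 'audio':
			case 'video':
				// Real code should escape the URL (e.g. with esc_url()).
				$parts[] = sprintf( '[%s src="%s"]', $attachment['type'], $attachment['url'] );
				break;
			// Other types are ignored in this sketch.
		}
	}

	if ( $image_ids ) {
		array_unshift( $parts, sprintf( '[gallery ids="%s"]', implode( ',', $image_ids ) ) );
	}

	// Fall back to the default markup when nothing could be converted.
	return $parts ? implode( "\n", $parts ) : $markup;
}

// Only hook in when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'activitypub_attachments_media_markup', 'example_classic_media_markup', 10, 2 );
}
```

Keeping the shortcode logic behind a filter means the core class never has to know which editor is active; an integration like this can be loaded or left out without touching the attachment processing itself.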
github-actions bot added the [Focus] Compatibility label Oct 13, 2025
Simplified the Attachments::process method calls by removing unnecessary multi-line formatting and consolidating arguments onto a single line for improved readability.
Add comprehensive tests covering attachment processing in the Posts collection:
- test_add_with_attachments: Verifies attachments are created when adding posts
- test_update_with_new_attachments: Tests adding attachments during post update
- test_update_with_changed_attachments: Verifies old attachments are deleted and new ones created
- test_update_keeps_same_attachments: Ensures unchanged attachments are preserved

Tests verify:
- Attachment creation and association with posts
- Source URL metadata storage (_activitypub_source_url)
- Alt text metadata for images (_wp_attachment_image_alt)
- Media markup insertion into post content
- Attachment change detection and replacement logic
Simplified the comment describing Classic Editor support by removing details about shortcode-based attachment markup.
Simplified the mock download logic in test classes by removing unnecessary variable assignment and directly copying the test image file. This improves code clarity and reduces redundancy in the test setup.
Refactored the mock response and headers arrays to use single-line formatting for improved readability in the Test_Attachments unit test.
Make author_id optional and use it consistently:
- Posts collection: Pass 0 for author_id (actor is a custom post type, not a user)
- Mastodon importer: Pass user ID to set proper attachment ownership
- Attachments class: Accept author_id as optional parameter with default 0

WordPress will handle author_id appropriately - valid user IDs are used,
invalid IDs default to 0 or current user.
- Remove base_path parameter from Attachments::process() and save_attachment()
- Mastodon importer now prepends archive path to attachment URLs before processing
- Extract path prepending logic into separate prepend_archive_path() method
- Update all tests to pass full paths in attachment URLs
- Simplifies API: callers handle path resolution, Attachments class focuses on processing
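
The path-resolution step described above could look roughly like the following. This is a minimal sketch, not the importer's code: the function name, the 'url' key, and the choice to leave http(s) URLs untouched are all assumptions for illustration.

```php
<?php
/**
 * Hypothetical helper: turn archive-relative attachment URLs into full paths.
 *
 * ASSUMPTION: each attachment is an array with a 'url' key, and remote
 * http(s) URLs should pass through unchanged.
 */
function example_prepend_archive_path( array $attachments, string $archive_path ): array {
	$base = rtrim( $archive_path, '/' );

	foreach ( $attachments as &$attachment ) {
		if ( empty( $attachment['url'] ) ) {
			continue;
		}

		// Leave remote URLs alone; only local archive paths are rewritten.
		if ( preg_match( '#^https?://#i', $attachment['url'] ) ) {
			continue;
		}

		$attachment['url'] = $base . '/' . ltrim( $attachment['url'], '/' );
	}
	unset( $attachment ); // Break the reference left by the foreach.

	return $attachments;
}
```

With the caller resolving paths this way, the processor only ever sees a complete location and needs no base_path parameter.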
obenland requested a review from Copilot October 13, 2025 21:30
Copilot AI left a comment

Pull Request Overview

This PR introduces a new Attachments processor class that provides unified handling of ActivityPub media attachments (images, videos, audio, documents) for both remote URLs and local files. The class centralizes attachment processing logic previously scattered across the codebase.

Key changes:

  • New Attachments class with unified process() method for both remote and local attachments
  • Integration with Posts collection for ActivityPub network media handling
  • Mastodon importer refactored to use the new Attachments processor
  • Classic Editor integration for shortcode-based media markup when blocks aren't supported

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

File | Description
includes/class-attachments.php | New Attachments processor class with unified media handling
includes/collection/class-posts.php | Updated to use Attachments processor for ActivityPub post media
includes/wp-admin/import/class-mastodon.php | Refactored to use Attachments processor, removing duplicate code
integration/class-classic-editor.php | New Classic Editor integration for shortcode-based media markup
integration/load.php | Updated to initialize Classic Editor integration
tests/phpunit/tests/includes/class-test-attachments.php | Comprehensive test suite for the new Attachments class
tests/phpunit/tests/includes/collection/class-test-posts.php | Updated tests for Posts collection attachment handling

obenland and others added 2 commits October 13, 2025 16:34
Use array_values() after array_filter() to reindex the array before comparison.
This prevents false positives when comparing arrays with different keys but same values.
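
The reindexing issue mentioned in this commit is standard PHP behavior: array_filter() preserves the original keys, so two lists with identical values can still compare as different. The sketch below demonstrates this with made-up URLs; the variable names and the use of === are illustrative assumptions, not the plugin's actual comparison code.

```php
<?php
// Stored list, indexed 0 and 1.
$stored = array( 'https://example.com/a.jpg', 'https://example.com/b.jpg' );

// Incoming list with empty entries that need to be dropped.
$incoming = array( '', 'https://example.com/a.jpg', null, 'https://example.com/b.jpg' );

// array_filter() removes the empty values but keeps keys 1 and 3.
$filtered = array_filter( $incoming );

// Keys differ (1, 3 vs. 0, 1), so the arrays are not considered equal.
$same_without_reindex = ( $filtered === $stored );

// array_values() resets the keys to 0, 1, so the comparison now matches.
$same_with_reindex = ( array_values( $filtered ) === $stored );

var_dump( $same_without_reindex, $same_with_reindex );
```

Without the array_values() call, an unchanged set of attachments would be reported as changed, which is the false positive the commit refers to.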
Relocates the check and loading of required WordPress media functions to the start of save_attachment(). This ensures dependencies are loaded before any logic that may require them, improving reliability.
pfefferle (Member) commented

I ran some tests and have some feedback, not on the code itself but on the handling:

  1. We should not add external attachments to the Media Library. This is very spammy and floods the internal media handling with external media.
  2. When I delete a post, its media attachment is not deleted along with it.
  3. The current attachment handling seems to have problems with Update activities. When I deleted a post without deleting its attachment, the newly generated post came without the attachment.
  4. The current version does not side-load inline images. I think we should support inline images and ignore attachments that reference the same image.
[Image: Screenshot 2025-10-15 at 10 29 49]

pfefferle (Member) left a comment

^

Jiwoon-Kim (Contributor) commented Oct 15, 2025

I have consistently held that direct links to media files and media objects should be clearly distinguished. The attachment page is the ideal place to include the JSON-LD representation of a media object.
I’ve suggested ideas for improving the media library several times before, and I plan to write a detailed post about it in the Discussions soon.

The content of a remote image object is essentially a hotlink.
It’s also possible to use it without uploading it to on-site storage, depending on the implementation.

Jiwoon-Kim (Contributor) commented

Yes, I’m aware that Mastodon locally caches all attachment images, but WordPress operates under a very different architecture.

My current idea is to load all attachment JSON-LD metadata first, then apply lazy loading for inline images or gallery blocks.

This would provide better performance and flexibility without requiring immediate local uploads.

That said, we still need to consider how platforms like Mastodon, Pixelfed, and Threads handle this, since the appropriate approach may depend on the object type — for example, Article objects might need different attachment-fetching behavior compared to Note objects.

Jiwoon-Kim (Contributor) commented

We might need a page like ?remote_attachment_id, but it may only need to apply to Articles and not to Notes.

Base automatically changed from cpt-ap-post to trunk October 16, 2025 16:01
Jiwoon-Kim (Contributor) commented Oct 16, 2025

In the case of directly attached media files, a ?remote_attachment_id page is not generated.
Such a page is created only for media objects.

In both cases, the default behavior is to maintain hotlinking (remote linking).

If a license extension is applied, the file is optionally cached locally and the file URL is replaced with a local one, while the original URI is preserved. This happens only when explicit reuse permission is specified.

It is normal for the remote media attachment page to remain even after the remote article is deleted, because it implies that the media can still be reused through the media library.

I’ll post it soon in the discussion tab under the title “Media Upload Endpoint and Remote Media Library.”

Replaces call to Post_Types::register_object_post_type() with Post_Types::register_post_post_type() in the test setup to ensure the correct post type is registered for attachment tests.
Jiwoon-Kim (Contributor) commented Oct 17, 2025

#2329 (comment)

If there is no media library, implementing the media object endpoint will be difficult, so the existence of a media object can be assumed to mean that a media library and media upload endpoint exist.
If a media object exists, it can be reused. However, with a direct media file link, ActivityPub offers no way to prove that it is the same entity, so the link is single-use. Therefore, in the case of WordPress, there is no reason to store such media locally.

4 participants