This repository was archived by the owner on Dec 6, 2022. It is now read-only.

Consideration for the availability of files attached to old revisions #655

@rooby


Drupal has always had a potential issue with files attached to old revisions of content.

The problem is that public files attached to old revisions of content remain available by direct link and can still be found through search engines.

This is potentially a big problem for government sites if the site is misconfigured or misused (either accidentally or maliciously) because you might have policy documents or other legal documents that are out of date but are still accessible on the website by anyone.

The technicalities of the problem are:

When revisions are enabled, each individual revision of a node (or other revision enabled entity) counts as its own file usage. So if revisions are enabled and you replace a file on a file field with a new version of that file, the old version is retained because it is still used on that revision and it needs to be there in case the user wants to revert to that revision at some point.
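
The retention behaviour described above can be illustrated with a toy model. This is a minimal sketch in Python, not Drupal's implementation: the `FileUsage` class, its methods, and the file names are all hypothetical, and it only assumes what the paragraph states, namely that each revision counts as its own usage of a file.

```python
from collections import defaultdict


class FileUsage:
    """Toy usage tracker: one usage record per (file, revision) pair."""

    def __init__(self):
        # file name -> set of revision ids that reference it
        self._usage = defaultdict(set)

    def add(self, filename, revision_id):
        self._usage[filename].add(revision_id)

    def remove_revision(self, revision_id):
        """Drop a revision's usages and return the files left unused."""
        orphaned = []
        for filename, revisions in self._usage.items():
            revisions.discard(revision_id)
            if not revisions:
                orphaned.append(filename)
        for filename in orphaned:
            del self._usage[filename]
        return orphaned

    def in_use(self, filename):
        return filename in self._usage


usage = FileUsage()
usage.add("policy-v1.pdf", revision_id=1)  # revision 1 attaches the old file
usage.add("policy-v2.pdf", revision_id=2)  # revision 2 replaces it

# The old file is still in use because revision 1 still references it.
still_kept = usage.in_use("policy-v1.pdf")

# Only deleting revision 1 releases the old file.
released = usage.remove_revision(1)
```

In this model the replaced file only becomes removable once the last revision that references it is gone.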

Files in the public file system are always directly accessible if you know the URL. Generally people aren't required to guess the URL because Google or other search engines will find it for them. So if you have public files on an old revision they will stick around until that revision is deleted (and the file isn't used anywhere else on the site).
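
One way to gauge exposure on an existing site is to list public files that are referenced only by non-current revisions. The following is a sketch under stated assumptions: the `FileRef` record and the sample data are hypothetical, and on a real site the references would have to come from the site's own file usage data. Only the `public://` and `private://` URI schemes are taken from Drupal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRef:
    uri: str            # e.g. "public://policy-v1.pdf"
    revision_id: int    # revision that references the file
    is_current: bool    # True if this is the entity's current revision


def stale_public_files(refs):
    """Return public URIs that are referenced only by non-current revisions."""
    current = {r.uri for r in refs if r.is_current}
    public = {r.uri for r in refs if r.uri.startswith("public://")}
    return sorted(public - current)


# Hypothetical sample data for illustration only.
refs = [
    FileRef("public://policy-v1.pdf", revision_id=1, is_current=False),
    FileRef("public://policy-v2.pdf", revision_id=2, is_current=True),
    FileRef("public://logo.png", revision_id=1, is_current=False),
    FileRef("public://logo.png", revision_id=2, is_current=True),
    FileRef("private://draft.pdf", revision_id=1, is_current=False),
]
stale = stale_public_files(refs)
```

Here `stale` contains only the superseded policy document: the logo is still used on the current revision, and the private draft is not directly reachable by URL.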

With the private file system, Drupal checks that the user has access to view the file. For files attached to nodes, this means that node access rules are checked, and if the user doesn't have access to view unpublished revisions then they can't access the file. Since anonymous users almost never have access to unpublished revisions, this solves the problem (although it adds performance overhead when loading the file on uncached requests).
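
The access rule described above can be sketched as a small model. This is not Drupal's access code: the `User` and `Revision` classes, the `can_download` function, and the permission string are illustrative assumptions, and the model only encodes the rule that a private file is served when the requesting user may view at least one revision that uses it.

```python
from dataclasses import dataclass, field

# Illustrative permission name, assumed for this sketch.
VIEW_OLD_REVISIONS = "view revisions"


@dataclass
class User:
    name: str
    permissions: set = field(default_factory=set)


@dataclass
class Revision:
    revision_id: int
    is_current: bool
    published: bool
    files: tuple = ()


def can_view_revision(user, revision):
    # Anyone may view the current, published revision.
    if revision.is_current and revision.published:
        return True
    # Older or unpublished revisions need an explicit permission.
    return VIEW_OLD_REVISIONS in user.permissions


def can_download(user, uri, revisions):
    """Allow the download if any revision using the file is viewable."""
    return any(
        uri in rev.files and can_view_revision(user, rev) for rev in revisions
    )


revisions = [
    Revision(1, is_current=False, published=False,
             files=("private://policy-v1.pdf",)),
    Revision(2, is_current=True, published=True,
             files=("private://policy-v2.pdf",)),
]
anonymous = User("anonymous")
editor = User("editor", {VIEW_OLD_REVISIONS})

anon_old = can_download(anonymous, "private://policy-v1.pdf", revisions)
anon_new = can_download(anonymous, "private://policy-v2.pdf", revisions)
editor_old = can_download(editor, "private://policy-v1.pdf", revisions)
```

In this model the anonymous user can fetch only the file on the current revision, while the editor can still reach the superseded one.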

Any file where access to an old version would be a problem should therefore use the private file system.

If the site builder is unaware of this behaviour (or just forgets about it when building a particular site), such files can easily end up exposed in the public file system.

The next issue is that it is not necessarily easy to get an ideal configuration with the current version of GovCMS SaaS.

Unless all the content editors of a site are very competent and knowledgeable about how the system works, it would be best to avoid any possible confusion that might arise from allowing them to select public/private each time they upload a file. Even if they are competent and do know about it, it's not hard to make a mistake and select the wrong one. It's also more user friendly if the system can work it out for them.

It's easy to have a document field that specifically uses private files but it's not so easy to handle files added via the WYSIWYG. In that case you either make all files going into the WYSIWYG private, which is not ideal because of performance implications, or you allow the uploader to choose per file, which is not ideal because it is error prone and not user friendly.

Another possible solution is to be able to set the scheme per file type, so for example all documents are private but all images are not. I don't think it's currently possible in GovCMS to do that though. It also leaves open the possibility of unwanted images being visible from old revisions.
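
The per-file-type idea could look roughly like the following. This is a hypothetical sketch, not a GovCMS or Drupal feature: `IMAGE_EXTENSIONS`, `scheme_for`, and `destination_uri` are made-up names, and the choice to default unknown types to private is an assumption made here because it is the safer failure mode.

```python
from pathlib import PurePosixPath

# Assumed set of extensions treated as images; adjust per site.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def scheme_for(filename):
    """Pick a file scheme from the extension; anything unknown is private."""
    extension = PurePosixPath(filename).suffix.lower()
    if extension in IMAGE_EXTENSIONS:
        return "public"
    return "private"


def destination_uri(filename):
    """Build a Drupal-style URI such as "private://policy.pdf"."""
    return f"{scheme_for(filename)}://{filename}"


document_uri = destination_uri("policy.PDF")
image_uri = destination_uri("banner.jpg")
```

With this rule editors never choose a scheme themselves, but as noted above, old images would still be publicly reachable.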

Probably the safest way to handle it is to have all content-related files in the private file system, so long as the performance trade-off is considered acceptable, which it might be given the caching layers we have in place.

Whatever the preferred approach, this is an issue that site builders need to be aware of, and I think it would be good for GovCMS to recommend how it should be set up and to set sensible defaults.
