It can be convenient to enable GitHub Pages deployment in forks for testing purposes. However, this inadvertently creates duplicate copies of the documentation on the public internet, where search engines can find them.
This risks polluting search results with content that is likely outdated. I believe it also risks harming the SEO of the official documentation website. (I'm no SEO expert, but my understanding is that Google in particular harshly penalizes websites that duplicate other websites.)
We should generate a robots.txt and/or add the appropriate meta tags to non-canonical copies of the docs website.
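For reference, the robots.txt side of this is tiny. A non-canonical copy would only need something like the following at its site root to ask crawlers to stay away (just a sketch of the standard directives, not tied to any particular deployment):

```
# Hypothetical robots.txt for a fork / test deployment of the docs.
# Asks all crawlers to skip the entire site.
User-agent: *
Disallow: /
```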
One thing I didn't really think about when writing this is that the main website's robots.txt is what actually matters since the docs repo is nested in a subdirectory.
(Similarly for forks, the robots.txt in the GitHub Pages website of the user or the organization associated with the fork is what actually matters.)
This means we should probably just go the route of adding <meta name="robots" content="noindex, nofollow"> tags to the <head> of every page instead.
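For concreteness, here's a rough sketch of what a post-build step for that could look like: it stamps the tag into every generated page, but only when the build does not come from the canonical repository. The site/ output directory and the example-org/docs slug are placeholders; the only real dependency is GITHUB_REPOSITORY, which GitHub Actions sets to the owner/repo that ran the workflow:

```python
#!/usr/bin/env python3
"""Mark non-canonical docs builds as noindex (rough sketch).

Assumes the built HTML lives in ./site and that the canonical
repository slug is "example-org/docs" -- both are placeholders.
"""
import os
from pathlib import Path

CANONICAL_REPO = "example-org/docs"  # placeholder; replace with the real slug
NOINDEX_TAG = '<meta name="robots" content="noindex, nofollow">'


def is_canonical_build() -> bool:
    # GitHub Actions sets GITHUB_REPOSITORY to e.g. "some-user/docs" in a fork.
    return os.environ.get("GITHUB_REPOSITORY", "") == CANONICAL_REPO


def inject_noindex(site_dir: Path) -> None:
    for page in site_dir.rglob("*.html"):
        html = page.read_text(encoding="utf-8")
        if NOINDEX_TAG in html:
            continue  # already tagged
        # Naive injection right after the opening <head> tag; assumes the
        # generator emits a plain "<head>" with no attributes.
        page.write_text(html.replace("<head>", f"<head>\n  {NOINDEX_TAG}", 1),
                        encoding="utf-8")


if __name__ == "__main__":
    if not is_canonical_build():
        inject_noindex(Path("site"))
```

Doing it as a post-build step would keep the docs generator's own config untouched, so forks wouldn't need to carry any diff.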
As a semi-related aside (since you specify it in the robots.txt), we should also enable sitemap.xml generation. Looks like it just needs to be turned on.
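For anyone unfamiliar with it, the generated sitemap.xml would just follow the standard sitemaps.org format, roughly this shape (URLs below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per docs page; loc values here are placeholders. -->
  <url>
    <loc>https://example.org/docs/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <url>
    <loc>https://example.org/docs/getting-started/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
</urlset>
```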