Features
EToolbox Link Inspector collects the following information about broken links:
- link href
- link type
- http status code
- status message
- reference to the page containing the link, including the page's title and location
- reference to the component containing the link, including the component's title, resource type and location
- reference to the property containing the link, including the property's name and location
The primary goal is to detect broken links on an Author instance so that content managers can correct them before the content goes live.
To avoid extra load on the AEM instance and to keep loading times short, the tool reads from a data feed generated by a scheduled task instead of computing the data on each request. Internal and external links are collected by traversing the repository (which is more efficient than querying for large content volumes) and are then validated (see the section Links Validation for details). All invalid links and their related details are then assembled into the data feed file. Data feed generation runs as an ordered Sling job to avoid overlapping executions.
By default, the scheduler is configured to start at 5 AM daily and is disabled (/system/console/configMgr/com.exadel.etoolbox.linkinspector.core.schedulers.DataFeedGenerationTask):
The data feed is stored as a JSON file at /content/etoolbox-link-inspector/data/datafeed.json
Data feed generation can be triggered manually by sending a GET request to /content/etoolbox-link-inspector/servlet/triggerDataFeedGeneration (admin permissions are required: the admin user or a member of the administrators group). The servlet should be used only for testing or in exceptional cases not covered by the scheduled execution; normally, the data feed should be generated by the scheduler.
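The manual trigger described above can be sketched as a small Python helper. This is only an illustration: the base URL http://localhost:4502, the admin/admin credentials and the use of Basic authentication are assumptions about a local Author instance, and the helper names are hypothetical. Only the servlet path comes from this document.

```python
import base64
import urllib.request

# Path taken from this README; everything else below is an assumption.
TRIGGER_PATH = "/content/etoolbox-link-inspector/servlet/triggerDataFeedGeneration"


def build_trigger_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the trigger servlet."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + TRIGGER_PATH,
        headers={"Authorization": f"Basic {token}"},  # assumes Basic auth is enabled
        method="GET",
    )


def trigger_data_feed_generation(base_url: str, user: str, password: str) -> int:
    """Send the request and return the HTTP status code of the response."""
    request = build_trigger_request(base_url, user, password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

For a local test instance, a call such as `trigger_data_feed_generation("http://localhost:4502", "admin", "admin")` would send the request. Since the job is started asynchronously, the data feed may not be ready as soon as the call returns.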
A set of configurable options lets you filter content and links for a more precise inspection.
The data feed is generated based on the parameters defined in the OSGi config /system/console/configMgr/com.exadel.etoolbox.linkinspector.core.services.data.impl.GridResourcesGeneratorImpl
The service uses PoolingHttpClientConnectionManager to send HEAD requests concurrently when validating external links. If the returned HTTP status code is outside the range 200-207, the link is considered invalid.
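The status rule above can be illustrated with a short Python sketch. It is not the tool's Java implementation: a thread pool stands in for the pooled HTTP client, and the `fetch_status` callable (hypothetical) is injected so that the example does not depend on any particular HTTP library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def is_valid_status(status_code: int) -> bool:
    """A link is treated as valid only for status codes 200 through 207."""
    return 200 <= status_code <= 207


def find_broken_links(
    links: Iterable[str],
    fetch_status: Callable[[str], int],
    max_workers: int = 8,  # assumed pool size, not a documented default
) -> Dict[str, int]:
    """Check links concurrently and return {link: status} for invalid ones."""
    unique_links = list(dict.fromkeys(links))  # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(fetch_status, unique_links))
    return {
        link: status
        for link, status in zip(unique_links, statuses)
        if not is_valid_status(status)
    }
```

In a real check, `fetch_status` would issue a HEAD request with the configured timeouts and user agent and return the response status code.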
Connection timeout, socket timeout and user agent are configurable (/system/console/configMgr/com.exadel.etoolbox.linkinspector.core.services.impl.ExternalLinkCheckerImpl):
The following types of the external links are considered:
- A link stored in a single-value or multi-value property. The link starts with http:// or https://
- A link contained in a single-value or multi-value property along with text content (RTE). The link starts with http:// or https://
- A link stored in a single-value or multi-value property. The link starts with www
- A link contained in a single-value or multi-value property along with text content (RTE). The link starts with www
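The four cases above can be approximated as follows. This is a minimal sketch under stated assumptions: the regular expression, the rule that a link ends at whitespace, a quote or an angle bracket, and the function name are illustrative and not taken from the tool's source.

```python
import re
from typing import List

# Prefixes listed in this README for external links.
EXTERNAL_PREFIXES = ("http://", "https://", "www")

# Assumed pattern for links embedded in text: the link ends at whitespace,
# a quote character or an angle bracket.
EXTERNAL_IN_TEXT = re.compile(r"""(?:https?://|\bwww)[^\s"'<>]+""")


def extract_external_links(value: str) -> List[str]:
    """Return the external links found in a single property value."""
    stripped = value.strip()
    # Case 1: the whole value is a link.
    if stripped.startswith(EXTERNAL_PREFIXES) and not re.search(r"\s", stripped):
        return [stripped]
    # Case 2: the link is embedded in text content (RTE).
    return EXTERNAL_IN_TEXT.findall(value)
```

A multi-value property would be handled by applying the function to each of its values.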
An internal link is considered valid if the resource at the link location exists in the repository. Parallel streams are used to speed up validation.
The following types of the internal links are considered:
- A link stored in a single-value or multi-value property. The link starts with /content/
- A link contained in a single-value or multi-value property along with text content (RTE). The link appears in an HTML attribute (such as href, src, or action), so it should start with "/content/ (preceded by a double quote)
If an internal link, retrieved from the content, contains the .html extension, the extension is removed prior to validation.
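The internal link rules can be sketched in the same way. In this illustration the repository is replaced by a plain set of existing paths, and the regular expression assumes that an attribute value is closed by a second double quote. Both are assumptions made for the example, and all names are hypothetical.

```python
import re
from typing import Iterable, List, Set

INTERNAL_PREFIX = "/content/"

# Assumed pattern: a double-quoted HTML attribute value starting with /content/.
INTERNAL_IN_HTML = re.compile(r'"(/content/[^"]*)"')


def extract_internal_links(value: str) -> List[str]:
    """Return the internal links found in a single property value."""
    stripped = value.strip()
    if stripped.startswith(INTERNAL_PREFIX):
        return [stripped]
    return INTERNAL_IN_HTML.findall(value)


def strip_html_extension(link: str) -> str:
    """Remove a trailing .html extension before the existence check."""
    return link[: -len(".html")] if link.endswith(".html") else link


def find_broken_internal_links(values: Iterable[str], existing_paths: Set[str]) -> List[str]:
    """Return links whose target path is not among the existing paths."""
    broken = []
    for value in values:
        for link in extract_internal_links(value):
            if strip_html_extension(link) not in existing_paths:
                broken.append(link)
    return broken
```

In AEM itself the existence check would be a resource lookup rather than a set membership test.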
The full report in CSV format can be downloaded by clicking the Download Full Report button:
The report contains all found broken links and has the same structure as the UI grid:
If the report and data feed have not been generated yet, a warning message is displayed when attempting to download the report:
Links containing locale-specific characters are properly encoded and displayed in both the grid and the CSV report.
Notes:
- The report is located at /content/etoolbox-link-inspector/download/report.csv
- The UI grid is currently limited to 500 items; all collected items are available in the CSV report
The feature is available for a single selection and replaces the selected link with the specified one
The user must have read and write permissions for the selected path in order to see the Fix broken link button:
Otherwise, the button is hidden.
The input link must be neither empty nor equal to the current link:
The input link is validated on the server side (see the section Links Validation for details) after the dialog is submitted:
The message includes the reason for the failed validation (status code, status message).
If the checkbox Skip input link check before replacement is checked, server-side validation of the input link is skipped. Any link that passes client-side validation (non-empty and not equal to the current link) can then be entered, and the replacement is not interrupted by validation:
The checkbox was introduced for cases where the input link matches neither the internal link pattern (starts with /content/) nor the external link pattern (starts with https:// or http://), e.g. vanity URLs (/my-vanity-path).
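The order of the checks described above can be summarized in a short sketch. The function name, the error messages and the injected `is_link_valid` callable are hypothetical. In the tool, the first two checks happen in the dialog and the third on the server.

```python
from typing import Callable, Optional


def validate_replacement(
    current_link: str,
    new_link: str,
    skip_check: bool,
    is_link_valid: Callable[[str], bool],
) -> Optional[str]:
    """Return an error message, or None when the replacement may proceed."""
    # Client-side rules: the input must be non-empty and differ from the current link.
    if not new_link.strip():
        return "The input link must not be empty"
    if new_link == current_link:
        return "The input link must differ from the current link"
    # Server-side validation, omitted when the skip checkbox is checked.
    if not skip_check and not is_link_valid(new_link):
        return "The input link is not valid"
    return None
```

With `skip_check=True`, a vanity URL such as /my-vanity-path is accepted even though it matches neither link pattern.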
The feature applies a replacement by regex pattern within the scope of the detected broken links.
It is strongly recommended that only privileged users use the 'Replace by Pattern' feature, since improper use may cause broad content updates and, as a consequence, high load on the AEM instance along with undesired changes in the content.
The replacement is done by the servlet mapped to the resource /content/etoolbox-link-inspector/servlet/replaceLinksByPattern, so appropriate ACLs should be applied for this path.
The number of processed items is limited (10k by default) to avoid the side effects of massive content updates. The limit is configurable at /system/console/configMgr/com.exadel.etoolbox.linkinspector.core.servlets.ReplaceByPatternServlet:
The Replace By Pattern button is disabled if the user lacks sufficient read permissions to trigger the servlet /content/etoolbox-link-inspector/servlet/replaceLinksByPattern, which implements the replacement by pattern.
The button is also disabled if the grid has no items:
While links are searched by pattern within the broken links scope, an ACL check is performed for content paths. If the user lacks the read/write permissions needed to update a link within a content resource, the item is excluded from processing.
The input fields must be neither empty nor equal to each other:
If the Dry Run mode is selected, changes are not applied to the content. The purpose of Dry Run is to preview the upcoming changes without modifying the repository.
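The interaction between the pattern, the item limit and Dry Run can be sketched as follows. This is an illustration under assumptions: broken links are modeled as a dictionary from property location to link value, the limit is applied to updated items, and Python's `re` syntax is used (replacement syntax for group references differs from Java's). All names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UpdatedItem:
    location: str
    current_link: str
    updated_link: str


def replace_by_pattern(
    broken_links: Dict[str, str],
    pattern: str,
    replacement: str,
    dry_run: bool,
    limit: int = 10_000,  # mirrors the documented default of 10k items
) -> List[UpdatedItem]:
    """Apply a regex replacement and return the items that would change."""
    regex = re.compile(pattern)
    updated: List[UpdatedItem] = []
    for location, link in list(broken_links.items()):
        if len(updated) >= limit:
            break
        new_link = regex.sub(replacement, link)
        if new_link == link:
            continue  # the pattern does not affect this link
        updated.append(UpdatedItem(location, link, new_link))
        if not dry_run:
            broken_links[location] = new_link  # only a real run modifies content
    return updated
```

A dry run returns the same list of items as a real run but leaves `broken_links` untouched, which is what makes it usable as a preview.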
A CSV output with the details of the replacement by pattern can be downloaded, which makes it possible to review the updated items, especially after large content updates:
The content of the CSV output:
If the checkbox Download CSV with updated items is not checked, the success message will contain the number of updated items.
The backup package is generated before the replacement and can be used later to revert any unexpected results:
The package belongs to the group EToolbox_Link_Inspector and has the name "replace_by_pattern_backup_" + generation date in milliseconds:
The user must have sufficient permissions to create the backup package.
After any link is updated, an alert indicates that the data feed must be regenerated to reflect the latest changes:
The popover contains the last generation stats along with filtering properties.