
Create broken link checker cron jobs #1018

Closed
wants to merge 1 commit into from


@zhannaklimanova (Contributor) commented Aug 30, 2023

This PR adds a broken-link checker cron job that runs every day at 08:08 and checks for broken links on the flatpages and articles listed at https://cantusdatabase.org/flatpages-list/ and https://cantusdatabase.org/articles-list/.
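The description implies that each list page returns its URLs as one space-separated string, since the workflow's awk step splits on spaces. Assuming that plain-text, UTF-8 format, the URL-collection step might look like the Python sketch below; fetch_links and parse_link_list are hypothetical helpers, not code from this PR.

```python
import urllib.request

# The two list pages named in the PR description.
LIST_PAGES = (
    "https://cantusdatabase.org/flatpages-list/",
    "https://cantusdatabase.org/articles-list/",
)


def parse_link_list(text: str) -> list[str]:
    """Split a whitespace-separated list of URLs; repeated or trailing whitespace is ignored."""
    return text.split()


def fetch_links(pages: tuple[str, ...] = LIST_PAGES, timeout: float = 30.0) -> list[str]:
    """Download each list page and return every URL found, in order, without duplicates."""
    links: list[str] = []
    for page in pages:
        # Assumption: the page body is plain UTF-8 text containing only URLs.
        with urllib.request.urlopen(page, timeout=timeout) as response:
            links.extend(parse_link_list(response.read().decode("utf-8")))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(links))
```

Splitting on arbitrary whitespace rather than a single space means a stray double space or trailing newline on the server side does not produce an empty "URL".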

@zhannaklimanova (Contributor, Author) commented:

Temporarily putting this PR on hold because of the GitHub error below, which prevents the link checker from running. We will need to either enable the necessary permissions at the DDMAL group level or rewrite some parts of the code.

Error: .github#L1
actions/checkout@v3 and lycheeverse/lychee-action@… are not allowed to be used in DDMAL/CantusDB. Actions in this workflow must be: within a repository owned by DDMAL.
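One way to "rewrite some parts of the code" without the third-party lychee action is a small checker built only on the Python standard library. This is a sketch under stated assumptions, not the PR's approach. It swaps lychee for plain urllib GET requests, and because actions/checkout is blocked too, the script would still have to run without a checkout (for example, inlined in the workflow's run step) unless the permissions change. The names http_status and find_broken_links are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# A status fetcher takes a URL and returns its HTTP status code, or None when
# no HTTP response arrived at all (DNS failure, timeout, refused connection).
StatusFetcher = Callable[[str], Optional[int]]


def http_status(url: str, timeout: float = 20.0) -> Optional[int]:
    """GET url (urlopen follows redirects) and return the final HTTP status."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code lives on the exception.
        return err.code
    except OSError:
        # URLError, socket timeouts and connection resets are all OSError subclasses.
        return None


def find_broken_links(urls: list[str], fetch: StatusFetcher = http_status) -> dict[str, Optional[int]]:
    """Return {url: status} for every URL that failed; None means unreachable."""
    broken: dict[str, Optional[int]] = {}
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            broken[url] = status
    return broken
```

Passing fetch as a parameter keeps the classification logic testable without network access; lychee additionally offers retries, caching and concurrency that this sketch leaves out.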

Comment on lines +16 to +18
flatpages=$(curl https://cantusdatabase.org/flatpages-list/ | awk '{ gsub (" ", "\",\"", $0); print}')
articles=$(curl https://cantusdatabase.org/articles-list/ | awk '{ gsub (" ", "\",\"", $0); print}')
@jacobdgm (Contributor) commented Aug 30, 2023

Would it help if the lists of links were comma-separated rather than space-separated, so that we don't need awk and gsub here? I can make this change easily if it would clean up this code.
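For comparison, here is what each format would mean for the parsing step, sketched in Python. The helper names are hypothetical, and the quoted "url1","url2" target format is inferred from the awk replacement string \",\" rather than taken from the workflow. Note that gsub replaces every single space, so two consecutive spaces would yield an empty "" entry, whereas splitting on whitespace does not.

```python
def awk_style_join(space_separated: str) -> str:
    # Equivalent of awk's gsub(" ", "\",\"", $0): each single space becomes ","
    return space_separated.replace(" ", '","')


def quoted_list_from_spaces(space_separated: str) -> str:
    """Build a quoted, comma-joined list from space-separated URLs, ignoring extra whitespace."""
    return ",".join(f'"{url}"' for url in space_separated.split())


def quoted_list_from_commas(comma_separated: str) -> str:
    """Build the same quoted list when the page already returns url1,url2,..."""
    urls = (part.strip() for part in comma_separated.split(","))
    return ",".join(f'"{url}"' for url in urls if url)
```

Once the shell wraps the awk output in outer quotes, a single-spaced input gives the same string as quoted_list_from_spaces; with a comma-separated source the transformation reduces to a split and a join.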

@zhannaklimanova (Contributor, Author) replied Aug 30, 2023

This code works in a personal repository, so the error appears to be unrelated to it. The problem is that the workflow uses a prebuilt action (template) for the lychee link checker, and that action is not allowed here. I'll investigate and let you know.

Contributor:

Right, I didn't mean to suggest that this was what kept the link checker from running. It just jumped out as a place where we could easily clean up the code with a little coordination.

@zhannaklimanova force-pushed the link-checker-jobs branch 2 times, most recently from cc3ef0d to 024d16d on August 30, 2023 at 21:36

# If the link checker didn't find any errors, exit gracefully
if not Path(FILE_LOCATION).exists():
    print("# ✅ No Broken Link")
Member:

Suggested change:
- print("# ✅ No Broken Link")
+ print("# ✅ No Broken Links")
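The existence check above treats a missing report file as "no broken links", which only holds if the link-checking step writes FILE_LOCATION solely when it finds failures. Under that assumption, the check and the JSON load can be folded into one helper. load_link_checker_result is a hypothetical name, and the report is assumed to be JSON because the script later reads link_checker_result['fail_map'].

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_link_checker_result(file_location: str) -> Optional[dict[str, Any]]:
    """Return the parsed JSON report, or None if the checker wrote no report file."""
    path = Path(file_location)
    if not path.exists():
        # Mirrors the script's rule: no report file means no broken links.
        return None
    with path.open(encoding="utf-8") as report:
        return json.load(report)
```

Returning None instead of exiting inside the helper leaves the decision about messages and exit codes to the caller.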

    print("# ✅ No Broken Link")
    sys.exit(0)
else:
    print("# Broken Link found, parsing needed", file=sys.stderr)
Member:

Suggested change:
- print("# Broken Link found, parsing needed", file=sys.stderr)
+ print("# Broken Links found, parsing needed", file=sys.stderr)
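The script already sends the report to stdout and diagnostics to stderr. A possible refinement is to also return a non-zero exit code when failures exist, so the scheduled run itself signals the problem. The excerpt only shows sys.exit(0) for the clean case, so exit code 1 here is an assumption, and report_status is a hypothetical helper.

```python
import sys
from typing import Any, Optional, TextIO


def report_status(
    result: Optional[dict[str, Any]],
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> int:
    """Print the headline and return an exit code: 0 if clean, 1 if any link failed."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if not result or not result.get("fail_map"):
        print("# ✅ No Broken Links", file=out)
        return 0
    # Diagnostics go to stderr so stdout carries only the Markdown report.
    print("# Broken Links found, parsing needed", file=err)
    return 1
```

Calling sys.exit(report_status(result)) at the end of the script would mark the cron run as failed whenever broken links are found, because GitHub Actions fails a step that exits with a non-zero status.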

listOfFailure = link_checker_result['fail_map']

if not listOfFailure:
    print("# ✅ No Broken Link")
Member:

Suggested change:
- print("# ✅ No Broken Link")
+ print("# ✅ No Broken Links")

skipErrors = []

for failureWebSite in listOfFailure:  # looping through tested websites
    for failure in listOfFailure[failureWebSite]:  # looping through broken links
Member:

Why not just loop through failureWebSite directly?
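One reading of this question is that the outer loop uses each key only to look the list back up. Assuming fail_map maps each tested page to a list of failure records (the shape the script indexes into), iterating over the values, or over .items() when the page matters, avoids that lookup. Python sketch; both function names are hypothetical.

```python
from itertools import chain
from typing import Any, Iterator


def iter_failures(fail_map: dict[str, list[dict[str, Any]]]) -> Iterator[dict[str, Any]]:
    """Yield every failure record, regardless of which tested page reported it."""
    # .values() skips the fail_map[site] lookup; chain flattens the per-page lists.
    return chain.from_iterable(fail_map.values())


def iter_failures_with_source(
    fail_map: dict[str, list[dict[str, Any]]],
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Variant that keeps the page each broken link was found on."""
    for site, failures in fail_map.items():
        for failure in failures:
            yield site, failure
```

The second variant is useful if the report should say which flatpage or article contains the broken link.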

skipErrors.append(failure)

if RealErrors:
    print("# Broken Link")
Member:

Suggested change:
- print("# Broken Link")
+ print("# Broken Links")

print(f"* {error['url']}: {error['status']['code']}")

if skipErrors:
    print("# Skippable error Link")
Member:

Suggested change:
- print("# Skippable error Link")
+ print("# Skippable error Links")
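The excerpt does not show how failures are sorted into RealErrors and skipErrors. The Python sketch below fills that step with a hypothetical rule that treats 403 and 429 responses (bot blocking, rate limiting) as skippable, then renders the report with the pluralized headings suggested in review. SKIPPABLE_CODES, status_code, split_failures and format_report are all illustrative names; each failure is assumed to carry url and status.code, as in the script's print(f"* {error['url']}: {error['status']['code']}").

```python
from typing import Any, Iterable, Optional

# Hypothetical skip rule; the PR's actual criteria are not shown.
SKIPPABLE_CODES = {403, 429}


def status_code(failure: dict[str, Any]) -> Optional[int]:
    """Read failure['status']['code'], tolerating records without a code."""
    return failure.get("status", {}).get("code")


def split_failures(
    failures: Iterable[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition failures into (real_errors, skip_errors)."""
    real_errors: list[dict[str, Any]] = []
    skip_errors: list[dict[str, Any]] = []
    for failure in failures:
        if status_code(failure) in SKIPPABLE_CODES:
            skip_errors.append(failure)
        else:
            real_errors.append(failure)
    return real_errors, skip_errors


def format_report(real_errors: list[dict[str, Any]], skip_errors: list[dict[str, Any]]) -> str:
    """Render the Markdown summary with the headings suggested in review."""
    lines: list[str] = []
    for heading, errors in (("# Broken Links", real_errors), ("# Skippable error Links", skip_errors)):
        if errors:
            lines.append(heading)
            lines.extend(f"* {e['url']}: {status_code(e)}" for e in errors)
    return "\n".join(lines) if lines else "# ✅ No Broken Links"
```

Keeping the classification and the formatting in separate functions makes the skip rule easy to adjust without touching the report layout.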

Labels: None yet
Projects: None yet
3 participants