Create broken link checker cron jobs #1018

Closed

zhannaklimanova wants to merge 1 commit into develop from link-checker-jobs

Contributor

zhannaklimanova commented Aug 30, 2023

Adds a broken link checker cron job that runs at 08:08 every day and checks for broken links on the flatpages and articles listed at https://cantusdatabase.org/flatpages-list/ and https://cantusdatabase.org/articles-list/.

Contributor Author

zhannaklimanova commented Aug 30, 2023

Temporarily putting this PR on hold because of the GitHub error below, which prevents the link checker from running. We will need to either enable group permissions or rewrite some parts of the code.

Error: .github#L1
actions/checkout@v3 and lycheeverse/[email protected] are not allowed to be used in DDMAL/CantusDB. Actions in this workflow must be: within a repository owned by DDMAL.

jacobdgm reviewed

.github/workflows/broken-link-checker.yaml

Comment on lines +16 to +18

		flatpages=$(curl https://cantusdatabase.org/flatpages-list/ | awk '{ gsub (" ", "\",\"", $0); print}')
		articles=$(curl https://cantusdatabase.org/articles-list/ | awk '{ gsub (" ", "\",\"", $0); print}')

Contributor

jacobdgm Aug 30, 2023

would it help if the lists of links were comma-separated rather than space-separated, so we don't need to use awk and gsub here? I can make this change easily if it would clean up this code.
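For comparison, here is a minimal Python sketch of what the `awk`/`gsub` step does to a space-separated list, next to the plain split that a comma-separated list would allow. The function names and sample paths are hypothetical and not part of this PR.

```python
def quote_join_space_separated(raw: str) -> str:
    """Replace every space with the three characters `","`, as the awk gsub call does.

    The trailing newline is dropped first, mirroring shell command substitution.
    """
    return raw.rstrip("\n").replace(" ", '","')


def split_comma_separated(raw: str) -> list[str]:
    """With a comma-separated list, a plain split is enough."""
    return [link for link in raw.strip().split(",") if link]


# Hypothetical sample input; the real lists come from the two endpoints above.
print(quote_join_space_separated("/a/ /b/ /c/\n"))
print(split_comma_separated("/a/,/b/,/c/\n"))
```

Note that the quoted form still needs an opening and closing quote added by whatever consumes it, which is part of why a comma-separated source might be simpler.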

Contributor Author

zhannaklimanova Aug 30, 2023

This code works in a personal repository, so the error appears to have nothing to do with this. The problem is that the workflow uses a template for the lychee link checker, and that template is not allowed here. I'll investigate and let you know.

Contributor

jacobdgm Aug 30, 2023

right - I didn't mean to suggest that this was causing the link checker to not run. This just jumped out as a place where we could easily clean up the code with a little bit of coordination.

zhannaklimanova force-pushed the link-checker-jobs branch 2 times, most recently from cc3ef0d to 024d16d

August 30, 2023 21:36


          Create broken link checker cron jobs

41750d8

zhannaklimanova force-pushed the link-checker-jobs branch from 024d16d to 41750d8

August 30, 2023 22:08

ahankinson reviewed

scripts/parse_broken_link_checker_output.py (resolved)

ahankinson reviewed

scripts/parse_broken_link_checker_output.py

+              # If link checker doesn't have any errors exit gracefully
+              if not Path(FILE_LOCATION).exists():
+                print("# ✅ No Broken Link")

Member

ahankinson Aug 31, 2023

Suggested change

-                print("# ✅ No Broken Link")
+                print("# ✅ No Broken Links")

scripts/parse_broken_link_checker_output.py

+                print("# ✅ No Broken Link")
+                sys.exit(0)
+              else:
+                print("# Broken Link found, parsing needed", file=sys.stderr)

Member

ahankinson Aug 31, 2023

Suggested change

-                print("# Broken Link found, parsing needed", file=sys.stderr)
+                print("# Broken Links found, parsing needed", file=sys.stderr)

scripts/parse_broken_link_checker_output.py

+              listOfFailure = link_checker_result['fail_map']
+              if not listOfFailure:
+                print("# ✅ No Broken Link")

Member

ahankinson Aug 31, 2023

Suggested change

-                print("# ✅ No Broken Link")
+                print("# ✅ No Broken Links")

scripts/parse_broken_link_checker_output.py

+              skipErrors = []
+              for failureWebSite in listOfFailure: # looping through tested websites
+                for failure in listOfFailure[failureWebSite]: # looping through broken links

Member

ahankinson Aug 31, 2023

Why not just loop through failureWebSite directly?
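One possible reading of this suggestion, as a hedged sketch: the outer loop only uses each key to look its list up again, so iterating over the dictionary's values yields the same failures without the second lookup. The sample `fail_map` and the helper name `all_failures` are made up for illustration; only the `fail_map`, `url`, and `status`/`code` keys come from the diff.

```python
from itertools import chain

# Hypothetical sample data shaped like the structure the diff indexes into.
fail_map = {
    "https://example.org/page-1/": [
        {"url": "https://example.org/missing", "status": {"code": 404}},
    ],
    "https://example.org/page-2/": [
        {"url": "https://example.org/gone", "status": {"code": 410}},
        {"url": "https://example.org/denied", "status": {"code": 403}},
    ],
}


def all_failures(fail_map):
    """Flatten the per-page failure lists into a single list of failures."""
    return list(chain.from_iterable(fail_map.values()))


for failure in all_failures(fail_map):
    print(f"* {failure['url']}: {failure['status']['code']}")
```

If the page that contains each broken link matters for the report, iterating over `fail_map.items()` would keep that information.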

scripts/parse_broken_link_checker_output.py

+                    skipErrors.append(failure)
+              if RealErrors:
+                print("# Broken Link")

Member

ahankinson Aug 31, 2023

Suggested change

-                print("# Broken Link")
+                print("# Broken Links")

scripts/parse_broken_link_checker_output.py

+                  print(f"* {error['url']}: {error['status']['code']}")
+              if skipErrors:
+                print("# Skippable error Link")

Member

ahankinson Aug 31, 2023

Suggested change

-                print("# Skippable error Link")
+                print("# Skippable error Links")
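Taken together, the reviewed fragments could be sketched roughly as follows, using the pluralized headings suggested above. This is an illustration under stated assumptions, not the script from this PR: the rule for what counts as a skippable error is not shown in the diff, so `SKIPPABLE_CODES` is a hypothetical stand-in, and `build_report` and `report_for_file` are hypothetical names.

```python
import json
from pathlib import Path

# Assumption: the PR's real criterion for "skippable" errors is not shown;
# this set of status codes is a hypothetical placeholder.
SKIPPABLE_CODES = {403, 429}


def build_report(link_checker_result: dict) -> str:
    """Turn a parsed link checker result into a short Markdown report."""
    real_errors, skip_errors = [], []
    for failures in link_checker_result.get("fail_map", {}).values():
        for failure in failures:
            code = failure["status"].get("code")
            line = f"* {failure['url']}: {code}"
            if code in SKIPPABLE_CODES:
                skip_errors.append(line)
            else:
                real_errors.append(line)

    if not real_errors and not skip_errors:
        return "# ✅ No Broken Links"

    lines = []
    if real_errors:
        lines.append("# Broken Links")
        lines.extend(real_errors)
    if skip_errors:
        lines.append("# Skippable error Links")
        lines.extend(skip_errors)
    return "\n".join(lines)


def report_for_file(file_location: str) -> str:
    """Report on an output file, treating a missing file as 'no broken links'."""
    path = Path(file_location)
    if not path.exists():
        return "# ✅ No Broken Links"
    return build_report(json.loads(path.read_text(encoding="utf-8")))
```

Returning the report as a string rather than printing inside the loops keeps the formatting easy to test; the caller can decide whether to print it or write it to a job summary.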

jacobdgm mentioned this pull request

Add updated google analytics tag to base.html #1093

Merged

jacobdgm closed this
