Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding various Canvas functions #33

Merged
merged 24 commits into from
Oct 23, 2023
Merged

Adding various Canvas functions #33

merged 24 commits into from
Oct 23, 2023

Conversation

star-nox
Copy link
Member

@star-nox star-nox commented Aug 10, 2023

added function for getting canvas user emails

Here's our development course: https://canvas.illinois.edu/courses/37348

@lintrule-review
Copy link

Lintrule is disabled.

  You can fix that by putting in a card [here.](https://lintrule.com/dashboard)

@railway-app
Copy link

railway-app bot commented Aug 10, 2023

This PR is being deployed to Railway 🚅

flask: ◻️ REMOVED

Copy link
Member

@KastanDay KastanDay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t test it, but the code structure & quality looks very nice.

@KastanDay
Copy link
Member

KastanDay commented Aug 14, 2023

How do we know we were successful?

  1. All documents on Canvas are uploaded to course, with diff mechanism. Try to make this diff mechanism reusable for @jkmin3 web scraper.
  2. All students in the course are auto-added to course_users. Email the UIUC help desk. We want to talk tot the canvas people about creating a student that profs can add to their course. Should be a regular student. ✉️ [email protected]
  3. Diff mechanism: Input:
    1. course_metadata object containing course_name & base_url
    2. A path to a folder, pointing to a collection of NEWLY scraped documents

Auth?

  • How do professors auth? Can they add a special UIUC-chat canvas user as a student? Or provider their own API keys.

No event listeners on Canvas, so use cron job to do DAILY uploads.

Diff mechanism (no file types considerations).
Just ask chatGPT to do it...

import hashlib

def generate_checksum(file_path):
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as file:
        # Read and update the hash string value in blocks
        for byte_block in iter(lambda: file.read(4096), b""):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()

@gitguardian
Copy link

gitguardian bot commented Aug 15, 2023

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id Secret Commit Filename
7724259 Supabase Service Role JWT deceb15 ai_ta_backend/nomic.ipynb View secret
7724259 Supabase Service Role JWT 07238a2 ai_ta_backend/nomic.ipynb View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

@KastanDay
Copy link
Member

We also want to grab all "page content" which is defined as Syllabus page, Rubrics page, maybe others too. Only if they exist.
We want these pages if they're helpful.

Remember the "black hole" ingest philosophy.

CleanShot 2023-08-21 at 14 20 00

@KastanDay
Copy link
Member

Adding a link to the current Canvas apps on Illinois Canvas@Illinois, General LTI Integrations (uillinois.edu)

@star-nox
Copy link
Member Author

  1. Added uiuc.chat as a Teacher to the dev course and replaced access token. Student doesn't have access to user IDs, so can't implement add_users() with a Student profile.
  2. Need to pass the collected user emails to the function which adds users to uiuc.chat.
  3. Ingest and Update functions for Canvas files are done. Looking into scraping the page content next.

@star-nox
Copy link
Member Author

We are now able to scrape all of the content in Canvas (syllabus, pages, etc.)!

@KastanDay
Copy link
Member

Wow fantastic, can we do a live demo on next meeting?

@KastanDay
Copy link
Member

Make sure to finish any details necessary so professors can start using it. Just LMK if you have any questions, the more questions the better.

@star-nox
Copy link
Member Author

Current progress on the feature: Currently it exports all course content irrespective of whether anything is unpublished. I am working on creating a list of published content first and then passing it to the content export API, so only those files are exported.

@KastanDay
Copy link
Member

KastanDay commented Sep 19, 2023 via email

@star-nox
Copy link
Member Author

These might be helpful @star-nox https://community.canvaslms.com/t5/Canvas-Question-Forum/Use-API-to-export-only-pages/m-p/460735 https://community.canvaslms.com/t5/Canvas-Developers-Group/Using-Content-Migrations-API-to-copy-courses-w-o-Announcements/m-p/470539

Selective content export yes! I was trying exactly that before Kastan suggested the individual files route. I have uiuc.chat as a user in the course. I have the bearer token set. But the content_exports API call returns an authorization error as a student, but works as a teacher.

@star-nox
Copy link
Member Author

Yeah but we need to use the student profile, not teacher. We verified that this should work before... please try to fix.

Don't you remember how we debated endlessly between:

  1. ❌ Forcing course creators to supply their Canvas API key and
  2. ✅ Using a student account that professors invite into their course? We need to use the student account.

Oh, by teacher I don't mean the course creators. I am referring to using the UIUC Chat account which professors will add to their course. Instead of adding it as a student, I was adding it as a teacher.

@star-nox
Copy link
Member Author

@KastanDay I have updated the ingest and add_users functions here.
I have modified the workflow of ingest such that it takes a dictionary containing boolean values for what content to ingest and acts accordingly. I am currently working on modifying the update function to reflect this workflow.

@KastanDay
Copy link
Member

Hi @star-nox, do you think any of this is ready for testing & review? I'd love to get an MVP live ASAP so I can give it to George to better understand what users will want. Thanks.

Copy link
Member

@KastanDay KastanDay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a minor change to update_files(), see below

ai_ta_backend/vector_database.py Show resolved Hide resolved
ai_ta_backend/update_materials.py Outdated Show resolved Hide resolved
@star-nox
Copy link
Member Author

Hi @star-nox, do you think any of this is ready for testing & review? I'd love to get an MVP live ASAP so I can give it to George to better understand what users will want. Thanks.

I was facing some JSON error when I passed this dictionary as a parameter in the API call. It is supposed to come from the front-end. For now I have hardcoded it into the function - it ingests all of the below things. It works well for now though, you can test it.

            # a dictionary of all contents we want to ingest - files, pages, modules, syllabus, assignments, discussions.
            content_to_ingest = {
                'files': True,
                'pages': True,
                'modules': True,
                'syllabus': True,
                'assignments': True,
                'discussions': True
            }

@star-nox
Copy link
Member Author

@KastanDay update_files() is also ready for review.



# Access checksum of s3 files
s3_client = boto3.client('s3', aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is really expensive to download all files from S3.

In the end, it's the exact same as using the Document Contexts from Supabase SQL. I think we have to re-implement this with Supabase instead of S3. It'll be hundreds of times faster and cheaper and probably conceptually simpler too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay. I thought get_object() just reads the files without downloading them. I use it to access the ETag attribute which is the md5 hash of the file.
Won't the Supabase contexts give us a different checksum than the original non-manipulated file?

ai_ta_backend/main.py Outdated Show resolved Hide resolved
ai_ta_backend/main.py Outdated Show resolved Hide resolved
Copy link
Member

@KastanDay KastanDay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested in main & update_materials.

@star-nox
Copy link
Member Author

star-nox commented Oct 2, 2023

@KastanDay I have separated the file update functions into a different PR. This one now contains the addCanvasUsers and ingestCanvas APIs which are ready for testing and demo. I have added these APIs to our Postman workspace too.

@star-nox star-nox requested a review from KastanDay October 2, 2023 23:46
Copy link
Member

@KastanDay KastanDay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully working in my end-to-end testing. Merging!

@KastanDay KastanDay merged commit 89f61aa into main Oct 23, 2023
1 check failed
@KastanDay KastanDay deleted the canvas branch October 23, 2023 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

4 participants