Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor slot and professors spiders in light of sigarra schedule html page changes #112

Merged
merged 6 commits into from
Jul 24, 2024

Conversation

tomaspalma
Copy link
Member

@tomaspalma tomaspalma commented Jul 4, 2024

Closes #110

Currently the sigarra schedule page has an API link which is present in an attribute of an HTML element, from which we are able to fetch the classes, slots and professors.

Due to this, the following spiders were removed:

  • class_spider
  • slot_professor_spider
  • professor_spider

They were merged into the slot_spider as to avoid to make requests to the same API link more than one time unecessarily

@tomaspalma tomaspalma self-assigned this Jul 4, 2024
@tomaspalma tomaspalma force-pushed the refactor/use-sigarra-api-link-on-slot-fetching branch from 4103cb6 to b343501 Compare July 7, 2024 21:29
@tomaspalma tomaspalma marked this pull request as ready for review July 16, 2024 08:11
@tomaspalma tomaspalma force-pushed the refactor/use-sigarra-api-link-on-slot-fetching branch from bde541a to b975076 Compare July 16, 2024 08:26
@tomaspalma tomaspalma force-pushed the refactor/use-sigarra-api-link-on-slot-fetching branch from b975076 to 0bf499b Compare July 16, 2024 08:52
Copy link
Member

@thePeras thePeras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh boy, what a simplification

src/scrapper/spiders/slot_spider.py Outdated Show resolved Hide resolved
Comment on lines 197 to 202
(sigarra_id, name) = self.get_professor_info(
teacher) # (sigarra_id | None, teacher_name)

yield SlotProfessor(
slot_id=current_class["sigarra_id"]
professor_id=teacher["sigarra_id"]
slot_id=current_class_id,
professor_id=sigarra_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is None a possible value? What happens in these cases?

Copy link
Member Author

@tomaspalma tomaspalma Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None was a possible value in the first version of the get_professor_info method at the time that comment was written. Now that you pointed it out, I removed that outdated comment (# (sigarra_id | None, teacher_name))

In the current version, None is not a possible value. When the teacher["name"] does not respect the format <professor_id> - <professor_name>, instead of retrieving the professor_id on the left side of the -, it justs uses the teacher["sigarra_id"] number from the values returned from sigarra api.

To clarify, the teacher["sigarra_id"] field is not really the id that appears on the link of the page of the teacher, which is extremely weird but not up to us since we didn't developed that API.

Copy link
Member

@thePeras thePeras Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the professor_id is used in the link, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just see that there is also a sigarra_url, which redirect to teacher profile. Can we save that? I think it could be usefull

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the professor_id is used in the link.

We can save that, good point!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the URL is safest

Copy link
Member Author

@tomaspalma tomaspalma Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure I understood what you suggested

Do you mean retrieving the professor_id from the sigarra_url?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, save the sigarra_url in the database as, for example, teacher_link

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that maybe it is probably best to do it in a separate issue from this PR, as it is a lower priority issue and this PR should be merged as soon as possible.

@tomaspalma tomaspalma merged commit 932cb95 into master Jul 24, 2024
@tomaspalma tomaspalma deleted the refactor/use-sigarra-api-link-on-slot-fetching branch July 24, 2024 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor slot and professor spiders due to changes to sigarra HTML
2 participants