-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor slot and professors spiders in light of sigarra schedule html page changes #112
Refactor slot and professors spiders in light of sigarra schedule html page changes #112
Conversation
4103cb6
to
b343501
Compare
bde541a
to
b975076
Compare
b975076
to
0bf499b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy, what a simplification
src/scrapper/spiders/slot_spider.py
Outdated
(sigarra_id, name) = self.get_professor_info( | ||
teacher) # (sigarra_id | None, teacher_name) | ||
|
||
yield SlotProfessor( | ||
slot_id=current_class["sigarra_id"] | ||
professor_id=teacher["sigarra_id"] | ||
slot_id=current_class_id, | ||
professor_id=sigarra_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is None
a possible value? What happens in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None
was a possible value in the first version of the get_professor_info
method at the time that comment was written. Now that you pointed it out, I removed that outdated comment (# (sigarra_id | None, teacher_name)
)
In the current version, None
is not a possible value. When the teacher["name"]
does not respect the format <professor_id> - <professor_name>
, instead of retrieving the professor_id
on the left side of the -
, it justs uses the teacher["sigarra_id"]
number from the values returned from sigarra api.
To clarify, the teacher["sigarra_id"]
field is not really the id that appears on the link of the page of the teacher, which is extremely weird but not up to us since we didn't developed that API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the professor_id
is used in the link, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I just see that there is also a sigarra_url
, which redirect to teacher profile. Can we save that? I think it could be usefull
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the professor_id
is used in the link.
We can save that, good point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the URL is safest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure I understood what you suggested
Do you mean retrieving the professor_id
from the sigarra_url
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, save the sigarra_url
in the database as, for example, teacher_link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that maybe it is probably best to do it in a separate issue from this PR, as it is a lower priority issue and this PR should be merged as soon as possible.
Closes #110
Currently the sigarra schedule page has an API link which is present in an attribute of an HTML element, from which we are able to fetch the classes, slots and professors.
Due to this, the following spiders were removed:
class_spider
slot_professor_spider
professor_spider
They were merged into the
slot_spider
as to avoid to make requests to the same API link more than one time unecessarily