docs: document automatic celery tasks #246
base: main
"""
Task to run spider jobs that are in the IN_QUEUE_STATUS.
Selects jobs from the queue and updates their status to WAITING_STATUS.
Initializes job execution using the job manager.
"""
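To make the docstring concrete, here is a minimal plain-Python sketch of the queue-draining behavior it describes, with Django and Celery left out. The `Job` dataclass, the `start_job` callback (standing in for the job manager), and the status string values are hypothetical. Only the IN_QUEUE_STATUS to WAITING_STATUS transition and the hand-off to the job manager come from the docstring.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical status values; estela defines its own constants on SpiderJob.
IN_QUEUE_STATUS = "IN_QUEUE"
WAITING_STATUS = "WAITING"


@dataclass
class Job:
    jid: int
    status: str


def run_spider_jobs(jobs: List[Job], start_job: Callable[[Job], None]) -> List[Job]:
    """Move queued jobs to WAITING and hand each one to the job manager.

    `start_job` is a hypothetical stand-in for the real job manager call.
    """
    # Select only the jobs that are still sitting in the queue.
    queued = [job for job in jobs if job.status == IN_QUEUE_STATUS]
    for job in queued:
        # Update the status first so a later run does not pick the job up again.
        job.status = WAITING_STATUS
        start_job(job)
    return queued


jobs = [Job(1, IN_QUEUE_STATUS), Job(2, "RUNNING"), Job(3, IN_QUEUE_STATUS)]
launched_ids = []
run_spider_jobs(jobs, lambda job: launched_ids.append(job.jid))
```

Jobs in any other status are left untouched, so running the sketch repeatedly only launches newly queued jobs.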
Are we using this task? It seems we launch jobs immediately when they are created, as seen here: estela-api/api/views/job.py#L136-L146. For cron jobs, we use the launch_job task.
If I recall correctly, this task was intended to throttle the number of jobs and prevent overloading the Kubernetes cluster. Could you confirm whether this is the case? If so, could you give more context on the reasons behind this task?
@joaquingx You are right. The task is not actively used, since requests to launch jobs are usually sent without the async parameter. However, the logic is kept in case jobs are launched with the async parameter, as shown here:
estela/estela-api/api/views/job.py
Lines 147 to 153 in 5da817d
else:
    serializer.save(
        spider=spider,
        status=SpiderJob.IN_QUEUE_STATUS,
        data_status=data_status,
        data_expiry_days=data_expiry_days,
    )
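The `else` branch above only saves the job with IN_QUEUE_STATUS instead of launching it, which is what leaves work for the queue-draining task. A minimal in-memory sketch of the two paths, assuming the non-async path launches right away as described in the thread. `JobStore`, `create_job`, the `async_` flag name, and the status assigned on the immediate path are all hypothetical; the real view works through the serializer and the job manager.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical status values; estela's real constants live on SpiderJob.
IN_QUEUE_STATUS = "IN_QUEUE"
WAITING_STATUS = "WAITING"


@dataclass
class JobStore:
    """In-memory stand-in for the SpiderJob table and the job manager."""
    saved: List[dict] = field(default_factory=list)
    launched: List[dict] = field(default_factory=list)


def create_job(store: JobStore, spider_id: int, async_: bool) -> dict:
    """Hypothetical sketch of the view's two job-creation paths."""
    if not async_:
        # Usual path: save and launch immediately.
        # Assumption: the real view may set a different status here.
        job = {"spider": spider_id, "status": WAITING_STATUS}
        store.saved.append(job)
        store.launched.append(job)
    else:
        # Async path (the branch above): only enqueue; the queue-draining
        # task is expected to launch the job later.
        job = {"spider": spider_id, "status": IN_QUEUE_STATUS}
        store.saved.append(job)
    return job
```

Keeping the async path as a plain save means the request returns without touching the cluster, and launching is deferred to whatever picks up IN_QUEUE jobs.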
Could you please add to this docstring that the task is used for jobs launched asynchronously?
"""
Task to run spider jobs that are in the IN_QUEUE_STATUS.
Selects jobs from the queue and updates their status to WAITING_STATUS.
Initializes job execution using the job manager.
Note: This task will be used for async jobs, although job requests are typically sent without the async parameter.
"""
"""
Task to launch a spider job with the provided data and optional token.
Creates a job using SpiderJobCreateSerializer and passes the job to the job manager.

Args:
    sid_ (int): Spider ID.
    data_ (dict): Job data to be serialized and saved.
    data_expiry_days (int, optional): Number of days before data expiry.
    token (str, optional): Authentication token. If not provided, a default token is used.
"""
In what scenarios would we want to provide the auth token as an argument?
@joaquingx I'm not sure. I don't see anywhere in the code where the token is passed manually to this function. @Adriana618 I remember you created this task; do you remember why the token argument was added?
Description
Documented estela's core Celery tasks and the tasks that run on a schedule.
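For context on "tasks that run on a schedule": Celery beat reads periodic tasks from the `beat_schedule` setting, where each entry names a task and a `schedule` (a `timedelta`, a number of seconds, or a `crontab`). A minimal sketch, assuming run_spider_jobs is one of the scheduled tasks. The dotted task path and the 30-second interval are hypothetical; the real values live in estela's Celery configuration.

```python
from datetime import timedelta

# Hypothetical entry: the task path and interval are illustrative only.
beat_schedule = {
    "run-spider-jobs": {
        "task": "core.tasks.run_spider_jobs",
        "schedule": timedelta(seconds=30),
    },
}

# With a Celery app instance, the schedule would be applied as:
# app.conf.beat_schedule = beat_schedule
```

A `timedelta` keeps the example dependency-free; `celery.schedules.crontab` would be the usual choice for wall-clock schedules such as user-defined cron jobs.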
Issue
Checklist before requesting a review