docs: document automatic celery tasks #246

mgonnav · 2024-10-23T16:11:43Z

Description

Documented estela core tasks and tasks that run on a schedule

Issue

Ticket

Checklist before requesting a review

I have performed a self-review of my code.
My code follows the style guidelines of this project.
I have made corresponding changes to the documentation.
New and existing tests pass locally with my changes.
If this change is a core feature, I have added thorough tests.
If this change affects or depends on the behavior of other estela repositories, I have created pull requests with the relevant changes in the affected repositories. Please, refer to our official documentation.
I understand that my pull request may be closed if it becomes obvious or I did not perform all of the steps above.

joaquingx · 2024-10-23T16:40:33Z

estela-api/core/tasks.py

+    """
+    Task to run spider jobs that are in the IN_QUEUE_STATUS.
+    Selects jobs from the queue and updates their status to WAITING_STATUS.
+    Initializes job execution using the job manager.
+    """


Are we utilizing this task? It seems like we’re launching jobs immediately when they are created, as seen here: estela-api/api/views/job.py#L136-L146. For cron jobs, we’re using the launch_job task.

If I recall correctly, this task was intended to throttle the number of jobs and prevent overloading the Kubernetes cluster. Could you confirm if this is the case? If so, can you provide more context or explain the reasons behind this task?

@joaquingx You are right. The task is not actively used since the request to launch jobs is usually sent without an async parameter. However, the logic is present here in case jobs are launched with the async parameter here:

estela/estela-api/api/views/job.py

Lines 147 to 153 in 5da817d

else:

serializer.save(

spider=spider,

status=SpiderJob.IN_QUEUE_STATUS,

data_status=data_status,

data_expiry_days=data_expiry_days,

)

Could you add to this documentation that it will be used for asyncs jobs launched please?

Suggested change

"""

Task to run spider jobs that are in the IN_QUEUE_STATUS.

Selects jobs from the queue and updates their status to WAITING_STATUS.

Initializes job execution using the job manager.

"""

"""

Task to run spider jobs that are in the IN_QUEUE_STATUS.

Selects jobs from the queue and updates their status to WAITING_STATUS.

Initializes job execution using the job manager.

Note: This task will be used for async jobs, although job requests are typically sent without the async parameter.

"""

joaquingx · 2024-10-23T16:54:13Z

estela-api/core/tasks.py

+    """
+    Task to launch a spider job with the provided data and optional token.
+    Creates a job using SpiderJobCreateSerializer and passes the job to the job manager.
+
+    Args:
+        sid_ (int): Spider ID.
+        data_ (dict): Job data to be serialized and saved.
+        data_expiry_days (int, optional): Number of days before data expiry.
+        token (str, optional): Authentication token. If not provided, a default token is used.
+    """


In what scenarios would we want to provide the auth token as an argument?

@joaquingx I'm not sure about it. I don't see anywhere in the code where the code is passed manually to this function. @Adriana618 I remember you created this task, do you remember the purpose for adding the token argument?

estela-api/core/tasks.py

docs: document automatic celery tasks

d74854e

mgonnav requested a review from joaquingx October 23, 2024 16:11

mgonnav self-assigned this Oct 23, 2024

joaquingx requested changes Oct 23, 2024

View reviewed changes

docs: document signal on SpiderJob update

9416add

joaquingx approved these changes Oct 24, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: document automatic celery tasks #246

docs: document automatic celery tasks #246

mgonnav commented Oct 23, 2024 •

edited

Loading

joaquingx Oct 23, 2024

mgonnav Oct 24, 2024

joaquingx Oct 24, 2024

joaquingx Oct 23, 2024

mgonnav Oct 24, 2024

	else:
	serializer.save(
	spider=spider,
	status=SpiderJob.IN_QUEUE_STATUS,
	data_status=data_status,
	data_expiry_days=data_expiry_days,
	)

docs: document automatic celery tasks #246

Are you sure you want to change the base?

docs: document automatic celery tasks #246

Conversation

mgonnav commented Oct 23, 2024 • edited Loading

Description

Issue

Checklist before requesting a review

joaquingx Oct 23, 2024

Choose a reason for hiding this comment

mgonnav Oct 24, 2024

Choose a reason for hiding this comment

joaquingx Oct 24, 2024

Choose a reason for hiding this comment

joaquingx Oct 23, 2024

Choose a reason for hiding this comment

mgonnav Oct 24, 2024

Choose a reason for hiding this comment

mgonnav commented Oct 23, 2024 •

edited

Loading