Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: document automatic celery tasks #246

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

docs: document automatic celery tasks #246

wants to merge 2 commits into from

Conversation

mgonnav
Copy link
Contributor

@mgonnav mgonnav commented Oct 23, 2024

Description

Documented estela core tasks and tasks that run on a schedule

Issue

Checklist before requesting a review

  • I have performed a self-review of my code.
  • My code follows the style guidelines of this project.
  • I have made corresponding changes to the documentation.
  • New and existing tests pass locally with my changes.
  • If this change is a core feature, I have added thorough tests.
  • If this change affects or depends on the behavior of other estela repositories, I have created pull requests with the relevant changes in the affected repositories. Please, refer to our official documentation.
  • I understand that my pull request may be closed if it becomes obvious or I did not perform all of the steps above.

@mgonnav mgonnav self-assigned this Oct 23, 2024
Comment on lines +50 to +54
"""
Task to run spider jobs that are in the IN_QUEUE_STATUS.
Selects jobs from the queue and updates their status to WAITING_STATUS.
Initializes job execution using the job manager.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we utilizing this task? It seems like we’re launching jobs immediately when they are created, as seen here: estela-api/api/views/job.py#L136-L146. For cron jobs, we’re using the launch_job task.

If I recall correctly, this task was intended to throttle the number of jobs and prevent overloading the Kubernetes cluster. Could you confirm if this is the case? If so, can you provide more context or explain the reasons behind this task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joaquingx You are right. The task is not actively used since the request to launch jobs is usually sent without an async parameter. However, the logic is present here in case jobs are launched with the async parameter here:

else:
serializer.save(
spider=spider,
status=SpiderJob.IN_QUEUE_STATUS,
data_status=data_status,
data_expiry_days=data_expiry_days,
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add to this documentation that it will be used for asyncs jobs launched please?

Suggested change
"""
Task to run spider jobs that are in the IN_QUEUE_STATUS.
Selects jobs from the queue and updates their status to WAITING_STATUS.
Initializes job execution using the job manager.
"""
"""
Task to run spider jobs that are in the IN_QUEUE_STATUS.
Selects jobs from the queue and updates their status to WAITING_STATUS.
Initializes job execution using the job manager.
Note: This task will be used for async jobs, although job requests are typically sent without the async parameter.
"""

Comment on lines +106 to +115
"""
Task to launch a spider job with the provided data and optional token.
Creates a job using SpiderJobCreateSerializer and passes the job to the job manager.

Args:
sid_ (int): Spider ID.
data_ (dict): Job data to be serialized and saved.
data_expiry_days (int, optional): Number of days before data expiry.
token (str, optional): Authentication token. If not provided, a default token is used.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what scenarios would we want to provide the auth token as an argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joaquingx I'm not sure about it. I don't see anywhere in the code where the code is passed manually to this function. @Adriana618 I remember you created this task, do you remember the purpose for adding the token argument?

estela-api/core/tasks.py Show resolved Hide resolved
estela-api/core/tasks.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants