[BLOCKED] datastreams: implement asynchronous writer #140
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# -*- coding: utf-8 -*- | ||
# | ||
# Copyright (C) 2022 CERN. | ||
# | ||
# Invenio-Vocabularies is free software; you can redistribute it and/or | ||
# modify it under the terms of the MIT License; see LICENSE file for more | ||
# details. | ||
|
||
"""Data Streams Celery tasks.""" | ||
|
||
from celery import shared_task | ||
|
||
from ..datastreams import StreamEntry | ||
from ..datastreams.factories import WriterFactory | ||
|
||
|
||
@shared_task(ignore_result=True) | ||
def write_entry(writer, entry): | ||
"""Write an entry. | ||
|
||
:param writer: writer configuration as accepted by the WriterFactory. | ||
:param entry: dictionary, StreamEntry is not serializable. | ||
Review comment: Note this. |
||
""" | ||
writer = WriterFactory.create(config=writer) | ||
writer.write(StreamEntry(entry)) |
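The docstring above notes that the task receives a dictionary because `StreamEntry` is not serializable. A minimal sketch of why, assuming a JSON task serializer; the `StreamEntry` class and the `to_message` helper here are simplified stand-ins written for illustration, not the project's code:

```python
import json


class StreamEntry:
    """Simplified stand-in for the project's StreamEntry (assumed shape)."""

    def __init__(self, entry, errors=None):
        self.entry = entry
        self.errors = errors or []


def to_message(payload):
    """Hypothetical helper: encode to JSON, then decode as a worker would."""
    return json.loads(json.dumps(payload))


stream_entry = StreamEntry({"key_one": [{"inner_one": 1}]})

# The object itself cannot be encoded as JSON: json.dumps raises TypeError.
try:
    to_message(stream_entry)
    object_is_serializable = True
except TypeError:
    object_is_serializable = False

# The plain dictionary survives the round trip and is re-wrapped afterwards,
# which mirrors what the task body does with StreamEntry(entry).
received = StreamEntry(to_message(stream_entry.entry))
```

Under this assumption, only the `entry` payload crosses the task boundary; anything else held by the object (for example, collected errors in this stand-in) would have to be sent separately.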
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
# -*- coding: utf-8 -*- | ||
# | ||
# Copyright (C) 2021 CERN. | ||
# Copyright (C) 2021-2022 CERN. | ||
# | ||
# Invenio-Vocabularies is free software; you can redistribute it and/or | ||
# modify it under the terms of the MIT License; see LICENSE file for more | ||
|
@@ -18,6 +18,7 @@ | |
|
||
from .datastreams import StreamEntry | ||
from .errors import WriterError | ||
from .tasks import write_entry | ||
|
||
|
||
class BaseWriter: | ||
|
@@ -110,3 +111,21 @@ def write(self, stream_entry, *args, **kwargs): | |
yaml.safe_dump([stream_entry.entry], file) | ||
|
||
return stream_entry | ||
|
||
|
||
class AsyncWriter(BaseWriter): | ||
Review comment: From what I understand, this approach uses this writer as a container for other writer configs. I was wondering whether it would be possible to have a [...] My fear is that we end up with a very complex datastream config structure, where, especially in YAML, it's easy to make indentation mistakes, which would eventually lead to a more difficult UX/DX. Other approaches to discuss:
Review comment: I agree with you. However, the changes are not trivial. In both cases we would need to implement a way to either save the configuration (type + args) or make writers JSON serializable. Otherwise, we won't be able to send them to the task. IMO this fits better at the data stream level, i.e. the writer writes, but it is the data stream that decides how/when. |
||
"""Writes the entries asynchronously (celery task).""" | ||
|
||
def __init__(self, writer, *args, **kwargs): | ||
"""Constructor. | ||
|
||
:param writer: writer to use. | ||
""" | ||
self._writer = writer | ||
Review comment: This accepts a writer configuration. We cannot accept the object because that would make it difficult to pass via a config file/definition; it would only work for a programmatic API. This config will be sent to the [...]
Everything is too generic... |
||
super().__init__(*args, **kwargs) | ||
|
||
def write(self, stream_entry, *args, **kwargs): | ||
"""Launches a celery task to write an entry.""" | ||
write_entry.delay(self._writer, stream_entry.entry) | ||
|
||
return stream_entry |
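The review thread above turns on passing a writer configuration (type + args) rather than a writer object, so that it can be sent to the task. A rough, self-contained sketch of that idea follows. Every name in it (`ListWriter`, `WRITERS`, `create_writer`, `fake_task`, `AsyncWriterSketch`) is hypothetical and only approximates the diff; `fake_task` simulates a JSON round trip instead of calling Celery:

```python
import json


class ListWriter:
    """Hypothetical writer that appends entries to a shared in-memory store."""

    stores = {}

    def __init__(self, name):
        self.name = name

    def write(self, entry):
        self.stores.setdefault(self.name, []).append(entry)
        return entry


# Hypothetical registry mapping a config "type" to a writer class.
WRITERS = {"list": ListWriter}


def create_writer(config):
    """Stand-in factory: build a writer from {"type": ..., "args": {...}}."""
    return WRITERS[config["type"]](**config.get("args", {}))


def fake_task(writer_config, entry):
    """Stand-in for the task: arguments arrive after a JSON round trip."""
    writer_config, entry = json.loads(json.dumps([writer_config, entry]))
    return create_writer(writer_config).write(entry)


class AsyncWriterSketch:
    """Holds a writer configuration and defers the actual write."""

    def __init__(self, writer):
        self._writer = writer  # a plain dict, not a writer instance

    def write(self, entry):
        fake_task(self._writer, entry)
        return entry


config = {"type": "list", "args": {"name": "demo"}}
AsyncWriterSketch(config).write({"id": "a"})
```

Because the configuration is a plain dictionary, it survives serialization and the writer is rebuilt on the receiving side. The trade-off raised in the thread still applies: nesting one writer config inside another makes the overall datastream definition deeper.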
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# -*- coding: utf-8 -*- | ||
# | ||
# Copyright (C) 2021-2022 CERN. | ||
# | ||
# Invenio-Vocabularies is free software; you can redistribute it and/or | ||
# modify it under the terms of the MIT License; see LICENSE file for more | ||
# details. | ||
|
||
"""Data Streams tasks tests.""" | ||
|
||
from pathlib import Path | ||
|
||
import yaml | ||
|
||
from invenio_vocabularies.datastreams import StreamEntry | ||
from invenio_vocabularies.datastreams.tasks import write_entry | ||
|
||
|
||
def test_write_entry(app): | ||
filepath = 'writer_test.yaml' | ||
yaml_writer_config = { | ||
"type": "yaml", | ||
"args": { | ||
"filepath": filepath | ||
} | ||
} | ||
entry = {"key_one": [{"inner_one": 1}]} | ||
write_entry(yaml_writer_config, entry) | ||
|
||
filepath = Path(filepath) | ||
with open(filepath) as file: | ||
assert yaml.safe_load(file) == [entry] | ||
|
||
filepath.unlink() |
Review comment: Since it's passed to WriterFactory.create(config=...)