Add iterator adapters #150
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -221,12 +221,10 @@ Iterator support | |
|
|
||
| In many situations the direct input users want to pass to ijson | ||
| is an iterator (e.g., a generator) rather than a file-like object. | ||
| To bridge this gap users need to adapt the iterator into a file-like object. | ||
| Examples of this can be found | ||
| `here <https://github.com/ICRAR/ijson/issues/44#issuecomment-1771013830>`__ | ||
| and `here <https://github.com/ICRAR/ijson/issues/58#issuecomment-917655522>`__. | ||
| Future versions of ijson might provide built-in adapters for this, | ||
| and/or support iterators without the need to adapt them first. | ||
| ijson provides built-in adapters to bridge this gap: | ||
|
|
||
| - ``ijson.from_iter(iterable_of_bytes)`` | ||
| - ``ijson.from_aiter(async_iterable_of_bytes)`` | ||
|
|
||
|
|
||
| ``asyncio`` support | ||
|
|
@@ -635,23 +633,42 @@ FAQ | |
| by passing the ``multiple_values=True`` to the ijson function in use. | ||
| See the options_ section for details. | ||
|
|
||
| #. How do I use ijson with the ``requests`` library | ||
| #. **Q**: How do I use ijson with ``requests`` or ``httpx`` | ||
|
|
||
| The ``requests`` library downloads the body of the HTTP response immediately by default. | ||
| Users wanting to feed the response into ijson | ||
| will need to override this behaviour | ||
| by using the ``requests.get(..., stream=True)`` parameter. | ||
| Then they have at least two options: | ||
| To stream JSON into ijson, pass ``stream=True`` and adapt the byte iterator: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| import requests | ||
| import ijson | ||
|
|
||
| with requests.get('https://jsonplaceholder.typicode.com/posts', stream=True) as resp: | ||
|
Review comment: Popular as it may be, I'd rather not reference any specific website in particular; instead I'd keep it intentionally vague like in the other examples where only
||
| resp.raise_for_status() | ||
| f = ijson.from_iter(resp.iter_content(chunk_size=64*1024)) | ||
| for post in ijson.items(f, 'item'): | ||
| print(f"post id = {post['id']}, \t title: {post['title']}") | ||
mattmorgis marked this conversation as resolved.
|
||
|
|
||
| You can also pass ``Response.raw`` directly (it's a file-like object), | ||
| but using ``iter_content`` is preferred because ``requests`` will transparently | ||
| handle HTTP transfer encodings (e.g., gzip, chunked). | ||
|
|
||
|
|
||
| For async usage with ``httpx``: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| * Wrap the ``Response.iter_content()`` iterator into a file-like object, | ||
| then give that to ijson. | ||
| import httpx, ijson, asyncio | ||
|
|
||
| * Pass the ``Response.raw`` object (the underlying ``socket.socket``) to ijson. | ||
| async def main(): | ||
|
Review comment: This is a very simple code sample. I think we can remove this main (and its invocation with asyncio.run) to remove some indentation and keep the code a bit easier on the eye.
||
| async with httpx.AsyncClient() as client: | ||
| async with client.stream('GET', 'https://jsonplaceholder.typicode.com/posts') as resp: | ||
| resp.raise_for_status() | ||
| f = ijson.from_aiter(resp.aiter_bytes()) | ||
| async for post in ijson.items(f, 'item'): | ||
| print(f"post id = {post['id']}, \t title: {post['title']}") | ||
|
|
||
| The first alternative is best, since ``requests`` will automatically decode | ||
| any HTTP transfer encodings, which doesn't happen with ``Response.raw``. | ||
| See `Iterator support`_ for how to wrap ``Response.iter_content()`` | ||
| into a file-like object. | ||
| asyncio.run(main()) | ||
|
|
||
|
|
||
| Acknowledgements | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,35 @@ | ||||||||||
| from typing import AsyncIterable, AsyncIterator, Iterable, Iterator | ||||||||||
|
|
||||||||||
|
|
||||||||||
| class IterReader: | ||||||||||
| """File-like object backed by a byte iterator.""" | ||||||||||
|
|
||||||||||
| def __init__(self, byte_iter: Iterator[bytes]): | ||||||||||
| self._iter = byte_iter | ||||||||||
|
|
||||||||||
| def read(self, n: int) -> bytes: | ||||||||||
| if n == 0: | ||||||||||
| return b"" | ||||||||||
| return next(self._iter, b"") | ||||||||||
mattmorgis marked this conversation as resolved.
Review comment on lines +11 to +13: suggestion (code-quality): We've found these issues:
Suggested change
|
||||||||||
|
|
||||||||||
|
|
||||||||||
| class AiterReader: | ||||||||||
| """Async file-like object backed by an async byte iterator.""" | ||||||||||
|
|
||||||||||
| def __init__(self, byte_aiter: AsyncIterator[bytes]): | ||||||||||
| self._aiter = byte_aiter | ||||||||||
|
|
||||||||||
| async def read(self, n: int) -> bytes: | ||||||||||
| if n == 0: | ||||||||||
| return b"" | ||||||||||
| return await anext(self._aiter, b"") | ||||||||||
mattmorgis marked this conversation as resolved.
Review comment on lines +23 to +25: suggestion (code-quality): We've found these issues:
Suggested change
|
||||||||||
|
|
||||||||||
|
|
||||||||||
| def from_iter(byte_iter: Iterable[bytes]) -> IterReader: | ||||||||||
| """Convert a synchronous byte iterable to a file-like object.""" | ||||||||||
|
Review comment: We support both str and bytes for file-like objects, so it'd be nice to support both here too. It shouldn't be too hard: we just need to encode strings as UTF-8, with a warning like we do in the utils module.
||||||||||
| return IterReader(iter(byte_iter)) | ||||||||||
|
|
||||||||||
|
|
||||||||||
| def from_aiter(byte_aiter: AsyncIterable[bytes]) -> AiterReader: | ||||||||||
|
Review comment: Like with the main entry points for ijson, I'd rather we had a single
||||||||||
| """Convert an asynchronous byte iterable to an async file-like object.""" | ||||||||||
| return AiterReader(aiter(byte_aiter)) | ||||||||||
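The review comment on `from_iter` asks for str chunks to be accepted alongside bytes, encoded as UTF-8 with a warning. The following is a minimal sketch of that idea, not the PR's implementation: `TextOrBytesIterReader` and `from_iter_any` are hypothetical names, and the warning category and message are assumptions rather than what ijson's utils module actually emits.

```python
import warnings
from typing import Iterable, Iterator, Union

Chunk = Union[str, bytes]


class TextOrBytesIterReader:
    """Hypothetical file-like object backed by an iterator of str or bytes chunks."""

    def __init__(self, chunks: Iterator[Chunk]):
        self._iter = chunks
        self._warned = False

    def read(self, n: int) -> bytes:
        # As in the PR's IterReader, read(0) must not consume a chunk,
        # and n is otherwise ignored: one whole chunk is returned per call.
        if n == 0:
            return b""
        chunk = next(self._iter, b"")
        if isinstance(chunk, str):
            if not self._warned:
                # Assumed warning text and category; warn only once per reader.
                warnings.warn(
                    "str chunks are encoded as UTF-8; prefer passing bytes",
                    UserWarning,
                    stacklevel=2,
                )
                self._warned = True
            chunk = chunk.encode("utf-8")
        return chunk


def from_iter_any(chunks: Iterable[Chunk]) -> TextOrBytesIterReader:
    """Hypothetical counterpart of from_iter that accepts str or bytes chunks."""
    return TextOrBytesIterReader(iter(chunks))
```

Encoding each chunk independently is safe here because a str chunk always contains whole code points; the reverse direction (decoding bytes chunks to str) would need an incremental decoder.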
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,116 @@ | ||
| import asyncio | ||
| import ijson | ||
| import pytest | ||
|
|
||
| from .test_base import JSON, JSON_EVENTS, JSON_PARSE_EVENTS, JSON_OBJECT | ||
|
|
||
| CHUNK_SIZE = 10 | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def chunks(): | ||
| return [JSON[i : i + CHUNK_SIZE] for i in range(0, len(JSON), CHUNK_SIZE)] | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def async_chunks(): | ||
| async def chunks(): | ||
| for i in range(0, len(JSON), CHUNK_SIZE): | ||
| yield JSON[i : i + CHUNK_SIZE] | ||
|
|
||
| return chunks() | ||
|
|
||
|
|
||
| def test_from_iter_read0_does_not_consume(): | ||
| chunks = [b'{"key":', b'"value"}'] | ||
| file_obj = ijson.from_iter(iter(chunks)) | ||
| assert file_obj.read(0) == b"" | ||
| assert file_obj.read(1) == b'{"key":' | ||
| assert file_obj.read(1) == b'"value"}' | ||
| assert file_obj.read(1) == b"" | ||
|
|
||
|
|
||
| def test_from_iter_accepts_iterable(): | ||
| chunks = [b'{"key":', b'"value"}'] | ||
| file_obj = ijson.from_iter(chunks) # no iter(...) | ||
| assert file_obj.read(1) == b'{"key":' | ||
| assert file_obj.read(1) == b'"value"}' | ||
| assert file_obj.read(1) == b"" | ||
mattmorgis marked this conversation as resolved.
|
||
|
|
||
|
|
||
| def test_from_iter_basic_parse(backend, chunks): | ||
|
Review comment: I think the first two tests above, which test from_iter directly, are fine here. The same goes for the tests below that do the same. However, for the rest I would have gone very differently. Please have a look at conftest.py, where test cases are dynamically generated for each backend based on different "backend adaptors", each basically feeding ijson data in a different way (str, bytes, sync/async file-like objects). These are then used to run a number of tests that center around the actual functions (basic_parse, parse, etc.). I think most tests that follow in this module are (most probably, because I didn't actually check) a small subset of those function-oriented test cases. So what I'd do is provide two new "backend adaptors" for the sync/async cases. They'd take the test input and create iterables out of it (iterating over 1 character/byte at a time, or maybe even by random amounts), then pass those through ijson.from_iter, and finally feed that to the ijson routine under test. That would make many of the tests below unnecessary, and would ensure all test cases work as expected with iterable inputs. Let me know if you need any guidance with this.
||
| file_obj = ijson.from_iter(iter(chunks)) | ||
| assert JSON_EVENTS == list(backend.basic_parse(file_obj)) | ||
|
|
||
|
|
||
| def test_from_iter_parse(backend, chunks): | ||
| file_obj = ijson.from_iter(iter(chunks)) | ||
| assert JSON_PARSE_EVENTS == list(backend.parse(file_obj)) | ||
|
|
||
|
|
||
| def test_from_iter_items(backend, chunks): | ||
| file_obj = ijson.from_iter(iter(chunks)) | ||
| assert [JSON_OBJECT] == list(backend.items(file_obj, "")) | ||
|
|
||
|
|
||
| def test_from_iter_kvitems(backend, chunks): | ||
| file_obj = ijson.from_iter(iter(chunks)) | ||
| kv = list(backend.kvitems(file_obj, "")) | ||
| assert len(kv) == 1 | ||
| key, value = kv[0] | ||
| assert key == "docs" | ||
| assert value == JSON_OBJECT["docs"] | ||
|
|
||
|
|
||
| def test_from_aiter_read0_does_not_consume(): | ||
| async def chunks(): | ||
| yield b'{"key":' | ||
| yield b'"value"}' | ||
|
|
||
| async def main(): | ||
| file_obj = ijson.from_aiter(chunks()) | ||
| assert await file_obj.read(0) == b"" | ||
| assert await file_obj.read(1) == b'{"key":' | ||
| assert await file_obj.read(1) == b'"value"}' | ||
| assert await file_obj.read(1) == b"" | ||
|
|
||
| asyncio.run(main()) | ||
|
|
||
|
|
||
| def test_from_aiter_basic_parse(backend, async_chunks): | ||
| async def main(): | ||
| file_obj = ijson.from_aiter(async_chunks) | ||
| events = [e async for e in backend.basic_parse(file_obj)] | ||
| assert JSON_EVENTS == events | ||
|
|
||
| asyncio.run(main()) | ||
|
|
||
|
|
||
| def test_from_aiter_parse(backend, async_chunks): | ||
| async def main(): | ||
| file_obj = ijson.from_aiter(async_chunks) | ||
| events = [e async for e in backend.parse(file_obj)] | ||
| assert JSON_PARSE_EVENTS == events | ||
|
|
||
| asyncio.run(main()) | ||
|
|
||
|
|
||
| def test_from_aiter_items(backend, async_chunks): | ||
| async def main(): | ||
| file_obj = ijson.from_aiter(async_chunks) | ||
| items = [obj async for obj in backend.items(file_obj, "")] | ||
| assert [JSON_OBJECT] == items | ||
|
|
||
| asyncio.run(main()) | ||
|
|
||
|
|
||
| def test_from_aiter_kvitems(backend, async_chunks): | ||
| async def main(): | ||
| file_obj = ijson.from_aiter(async_chunks) | ||
| kv = [kv async for kv in backend.kvitems(file_obj, "")] | ||
| assert len(kv) == 1 | ||
| key, value = kv[0] | ||
| assert key == "docs" | ||
| assert value == JSON_OBJECT["docs"] | ||
|
|
||
| asyncio.run(main()) | ||
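The reviewer's proposal for iterable-based test adaptors hinges on splitting the test input into chunks of one byte, or of random sizes. A minimal sketch of such chunking helpers follows. `iter_chunks` and `aiter_chunks` are hypothetical names that do not appear in the PR or in ijson's test suite, and wiring them into the existing adaptor machinery is left out.

```python
import asyncio
import random
from typing import AsyncIterator, Iterator


def iter_chunks(data: bytes, max_size: int = 1, seed: int = 0) -> Iterator[bytes]:
    """Yield data in consecutive pieces of 1..max_size bytes (hypothetical helper).

    With the default max_size=1 every piece is a single byte; a fixed seed
    keeps the random split reproducible across test runs.
    """
    rng = random.Random(seed)
    pos = 0
    while pos < len(data):
        size = rng.randint(1, max_size)
        yield data[pos:pos + size]
        pos += size


async def aiter_chunks(data: bytes, max_size: int = 1, seed: int = 0) -> AsyncIterator[bytes]:
    """Async counterpart of iter_chunks (hypothetical helper)."""
    for chunk in iter_chunks(data, max_size, seed):
        # Yield control to the event loop between chunks, as a real
        # network-backed iterator would.
        await asyncio.sleep(0)
        yield chunk
```

An adaptor built on these would pass `iter_chunks(data)` through `ijson.from_iter` (or `aiter_chunks(data)` through `ijson.from_aiter`) before handing the result to the routine under test.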
Review comment: For consistency please also fix the starting line of the answer to start with **A**