53 changes: 35 additions & 18 deletions README.rst
@@ -221,12 +221,10 @@ Iterator support

In many situations the direct input users want to pass to ijson
is an iterator (e.g., a generator) rather than a file-like object.
To bridge this gap users need to adapt the iterator into a file-like object.
Examples of this can be found
`here <https://github.com/ICRAR/ijson/issues/44#issuecomment-1771013830>`__
and `here <https://github.com/ICRAR/ijson/issues/58#issuecomment-917655522>`__.
Future versions of ijson might provide built-in adapters for this,
and/or support iterators without the need to adapt them first.
ijson provides built-in adapters to bridge this gap:

- ``ijson.from_iter(iterable_of_bytes)``
- ``ijson.from_aiter(async_iterable_of_bytes)``


``asyncio`` support
@@ -635,23 +633,42 @@ FAQ
by passing the ``multiple_values=True`` to the ijson function in use.
See the options_ section for details.

#. How do I use ijson with the ``requests`` library
#. **Q**: How do I use ijson with ``requests`` or ``httpx``
Review comment: For consistency, please also change the first line of the answer so that it starts with **A**.


The ``requests`` library downloads the body of the HTTP response immediately by default.
Users wanting to feed the response into ijson
will need to override this behaviour
by using the ``requests.get(..., stream=True)`` parameter.
Then they have at least two options:
To stream JSON into ijson, pass ``stream=True`` and adapt the byte iterator:

.. code-block:: python

    import requests
    import ijson

    with requests.get('https://jsonplaceholder.typicode.com/posts', stream=True) as resp:

Review comment: Popular as it may be, I'd rather not reference any specific website in particular; instead I'd keep it intentionally vague like in the other examples where only http://... is shown. Same for the actual loop where each item is processed.

        resp.raise_for_status()
        f = ijson.from_iter(resp.iter_content(chunk_size=64*1024))
        for post in ijson.items(f, 'item'):
            print(f"post id = {post['id']}, \t title: {post['title']}")

You can also pass ``Response.raw`` directly (it's a file-like object),
but using ``iter_content`` is preferred because ``requests`` will transparently
handle HTTP transfer encodings (e.g., gzip, chunked).


For async usage with ``httpx``:

.. code-block:: python

* Wrap the ``Response.iter_content()`` iterator into a file-like object,
then give that to ijson.
    import httpx, ijson, asyncio

* Pass the ``Response.raw`` object (the underlying ``socket.socket``) to ijson.
    async def main():

Review comment: This is a very simple code sample; I think we can remove this main (and its invocation with asyncio.run) to remove some indentation and make the code a bit easier on the eye.

        async with httpx.AsyncClient() as client:
            async with client.stream('GET', 'https://jsonplaceholder.typicode.com/posts') as resp:
                resp.raise_for_status()
                f = ijson.from_aiter(resp.aiter_bytes())
                async for post in ijson.items(f, 'item'):
                    print(f"post id = {post['id']}, \t title: {post['title']}")

The first alternative is best, since ``requests`` will automatically decode
any HTTP transfer encodings, which doesn't happen with ``Response.raw``.
See `Iterator support`_ for how to wrap ``Response.iter_content()``
into a file-like object.
    asyncio.run(main())


Acknowledgements
1 change: 1 addition & 0 deletions src/ijson/__init__.py
@@ -13,6 +13,7 @@
also two other backends using the C library yajl in ``ijson.backends`` that have
the same API and are faster under CPython.
'''
from ijson.adapters import from_aiter, from_iter
from ijson.common import JSONError, IncompleteJSONError, ObjectBuilder

from ijson.utils import coroutine, sendable_list
35 changes: 35 additions & 0 deletions src/ijson/adapters.py
@@ -0,0 +1,35 @@
from typing import AsyncIterable, AsyncIterator, Iterable, Iterator


class IterReader:
    """File-like object backed by a byte iterator."""

    def __init__(self, byte_iter: Iterator[bytes]):
        self._iter = byte_iter

    def read(self, n: int) -> bytes:
        if n == 0:
            return b""
        return next(self._iter, b"")
Review comment on lines +11 to +13:

suggestion (code-quality): We've found these issues:

Suggested change
-        if n == 0:
-            return b""
-        return next(self._iter, b"")
+        return b"" if n == 0 else next(self._iter, b"")



class AiterReader:
    """Async file-like object backed by an async byte iterator."""

    def __init__(self, byte_aiter: AsyncIterator[bytes]):
        self._aiter = byte_aiter

    async def read(self, n: int) -> bytes:
        if n == 0:
            return b""
        return await anext(self._aiter, b"")
Review comment on lines +23 to +25:

suggestion (code-quality): We've found these issues:

Suggested change
-        if n == 0:
-            return b""
-        return await anext(self._aiter, b"")
+        return b"" if n == 0 else await anext(self._aiter, b"")



def from_iter(byte_iter: Iterable[bytes]) -> IterReader:
    """Convert a synchronous byte iterable to a file-like object."""

Review comment: We support both str and bytes for file-like objects, so it'd be nice to support both here too. Shouldn't be too hard? We just need to encode strings as UTF-8, with a warning like we do in the utils module.

    return IterReader(iter(byte_iter))
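
The suggestion above could look roughly like the following. This is a minimal standalone sketch, not the PR's code: the class is re-declared so the block runs on its own, and the warning text and the ``DeprecationWarning`` category are assumptions rather than what ijson's utils module actually emits.

```python
import warnings
from typing import Iterable, Iterator, Union

Chunk = Union[bytes, str]


class IterReader:
    """File-like object backed by an iterator of bytes or str chunks (sketch)."""

    def __init__(self, chunk_iter: Iterator[Chunk]):
        self._iter = chunk_iter
        self._warned = False

    def read(self, n: int) -> bytes:
        # As in the PR, n is only checked for zero; any other value
        # returns the next chunk whole, and b"" signals exhaustion.
        if n == 0:
            return b""
        chunk = next(self._iter, b"")
        if isinstance(chunk, str):
            if not self._warned:
                # Assumption: message and category are illustrative only.
                warnings.warn(
                    "str chunks are encoded as UTF-8; prefer an iterable of bytes",
                    DeprecationWarning,
                    stacklevel=2,
                )
                self._warned = True
            chunk = chunk.encode("utf-8")
        return chunk


def from_iter(chunk_iter: Iterable[Chunk]) -> IterReader:
    """Convert an iterable of bytes or str chunks to a file-like object."""
    return IterReader(iter(chunk_iter))
```

Warning only once per reader keeps the output quiet for long streams; whether that matches the existing behaviour for str file-like objects would need checking against the ijson sources.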


def from_aiter(byte_aiter: AsyncIterable[bytes]) -> AiterReader:

Review comment: Like with the main entry points for ijson, I'd rather we had a single from_iter function here that handled both sync and async cases, returning the correct Reader instance depending on the input iterable type (you can check for the presence of __aiter__, for instance).

    """Convert an asynchronous byte iterable to an async file-like object."""
    return AiterReader(aiter(byte_aiter))
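
One way the single entry point requested above might be sketched is shown below. It is a hedged illustration under stated assumptions, not the PR's final code: both reader classes are re-declared so the block is self-contained, the dunder methods are called directly so the sketch does not depend on the ``aiter``/``anext`` builtins, and the ``agen`` and ``drain`` helpers are hypothetical names used only for demonstration.

```python
import asyncio


class IterReader:
    """File-like object backed by a byte iterator."""

    def __init__(self, byte_iter):
        self._iter = byte_iter

    def read(self, n):
        return b"" if n == 0 else next(self._iter, b"")


class AiterReader:
    """Async file-like object backed by an async byte iterator."""

    def __init__(self, byte_aiter):
        self._aiter = byte_aiter

    async def read(self, n):
        if n == 0:
            return b""
        try:
            # Same effect as `await anext(self._aiter, b"")`.
            return await self._aiter.__anext__()
        except StopAsyncIteration:
            return b""


def from_iter(chunks):
    """Return the reader matching the kind of iterable given."""
    # Async iterables expose __aiter__; everything else is treated as sync.
    if hasattr(chunks, "__aiter__"):
        return AiterReader(chunks.__aiter__())
    return IterReader(iter(chunks))


async def agen():
    """Hypothetical async source used only for demonstration."""
    yield b"[1, "
    yield b"2]"


async def drain(reader):
    """Read an async reader to exhaustion and join the chunks."""
    parts = []
    while True:
        chunk = await reader.read(1)
        if not chunk:
            return b"".join(parts)
        parts.append(chunk)
```

With this shape, ``from_iter([b"[1, ", b"2]"])`` yields a synchronous reader, while ``from_iter(agen())`` yields one whose ``read`` must be awaited.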
116 changes: 116 additions & 0 deletions tests/test_adapters.py
@@ -0,0 +1,116 @@
import asyncio
import ijson
import pytest

from .test_base import JSON, JSON_EVENTS, JSON_PARSE_EVENTS, JSON_OBJECT

CHUNK_SIZE = 10


@pytest.fixture
def chunks():
    return [JSON[i : i + CHUNK_SIZE] for i in range(0, len(JSON), CHUNK_SIZE)]


@pytest.fixture
def async_chunks():
    async def chunks():
        for i in range(0, len(JSON), CHUNK_SIZE):
            yield JSON[i : i + CHUNK_SIZE]

    return chunks()


def test_from_iter_read0_does_not_consume():
    chunks = [b'{"key":', b'"value"}']
    file_obj = ijson.from_iter(iter(chunks))
    assert file_obj.read(0) == b""
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""


def test_from_iter_accepts_iterable():
    chunks = [b'{"key":', b'"value"}']
    file_obj = ijson.from_iter(chunks)  # no iter(...)
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""


def test_from_iter_basic_parse(backend, chunks):

Review comment: I think the first two tests above, which test from_iter directly, are fine here. The same goes for the other tests below that do the same.

However, for the rest I would have gone very differently. Please have a look at conftest.py, where test cases are dynamically generated for each backend based on different "backend adaptors", each basically feeding ijson data in a different way (str, bytes, sync/async file-like objects). These are then used to run a number of tests that center around the actual functions (basic_parse, parse, etc.). I think most of the tests that follow in this module are (most probably, because I didn't actually check) a small subset of those function-oriented test cases.

So what I'd do is provide two new "backend adaptors" for the sync/async cases. They'd take the test input and create iterables out of it (iterating over 1 character/byte at a time, or maybe even by random amounts), then pass those through ijson.from_iter, and finally feed that to the ijson routine under test. That would make many of the tests below unnecessary, and would ensure all test cases work as expected with iterable inputs. Let me know if you need any guidance with this.

    file_obj = ijson.from_iter(iter(chunks))
    assert JSON_EVENTS == list(backend.basic_parse(file_obj))
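
The input-splitting step described in the comment above might be sketched as follows. This is only an illustration under stated assumptions: ``iter_chunks`` and ``aiter_chunks`` are hypothetical helper names, and how they would be registered as "backend adaptors" is left out because that structure is not visible on this page.

```python
import random
from typing import AsyncIterator, Iterator, Optional


def iter_chunks(data: bytes, max_size: int = 1,
                seed: Optional[int] = None) -> Iterator[bytes]:
    """Yield `data` in pieces of 1..max_size bytes (1 byte each by default)."""
    rng = random.Random(seed)  # seeded so that a failing split can be reproduced
    pos = 0
    while pos < len(data):
        size = rng.randint(1, max_size)
        yield data[pos:pos + size]
        pos += size


async def aiter_chunks(data: bytes, max_size: int = 1,
                       seed: Optional[int] = None) -> AsyncIterator[bytes]:
    """Async counterpart of iter_chunks, for the async adaptor."""
    for chunk in iter_chunks(data, max_size, seed):
        yield chunk
```

An adaptor built on these would presumably wrap the result with ``ijson.from_iter`` (or ``ijson.from_aiter``) before handing it to the routine under test. Because slicing works the same way on ``str``, the same helpers could feed text input if the adapters gain str support.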


def test_from_iter_parse(backend, chunks):
    file_obj = ijson.from_iter(iter(chunks))
    assert JSON_PARSE_EVENTS == list(backend.parse(file_obj))


def test_from_iter_items(backend, chunks):
    file_obj = ijson.from_iter(iter(chunks))
    assert [JSON_OBJECT] == list(backend.items(file_obj, ""))


def test_from_iter_kvitems(backend, chunks):
    file_obj = ijson.from_iter(iter(chunks))
    kv = list(backend.kvitems(file_obj, ""))
    assert len(kv) == 1
    key, value = kv[0]
    assert key == "docs"
    assert value == JSON_OBJECT["docs"]


def test_from_aiter_read0_does_not_consume():
    async def chunks():
        yield b'{"key":'
        yield b'"value"}'

    async def main():
        file_obj = ijson.from_aiter(chunks())
        assert await file_obj.read(0) == b""
        assert await file_obj.read(1) == b'{"key":'
        assert await file_obj.read(1) == b'"value"}'
        assert await file_obj.read(1) == b""

    asyncio.run(main())


def test_from_aiter_basic_parse(backend, async_chunks):
    async def main():
        file_obj = ijson.from_aiter(async_chunks)
        events = [e async for e in backend.basic_parse(file_obj)]
        assert JSON_EVENTS == events

    asyncio.run(main())


def test_from_aiter_parse(backend, async_chunks):
    async def main():
        file_obj = ijson.from_aiter(async_chunks)
        events = [e async for e in backend.parse(file_obj)]
        assert JSON_PARSE_EVENTS == events

    asyncio.run(main())


def test_from_aiter_items(backend, async_chunks):
    async def main():
        file_obj = ijson.from_aiter(async_chunks)
        items = [obj async for obj in backend.items(file_obj, "")]
        assert [JSON_OBJECT] == items

    asyncio.run(main())


def test_from_aiter_kvitems(backend, async_chunks):
    async def main():
        file_obj = ijson.from_aiter(async_chunks)
        kv = [kv async for kv in backend.kvitems(file_obj, "")]
        assert len(kv) == 1
        key, value = kv[0]
        assert key == "docs"
        assert value == JSON_OBJECT["docs"]

    asyncio.run(main())