21 | 21 |
22 | 22 | @docs_group('Storage clients') |
23 | 23 | class ApifyStorageClient(StorageClient): |
24 | | - """Apify storage client.""" |
| 24 | + """Apify platform implementation of the storage client. |
| 25 | +
| 26 | + This storage client provides access to datasets, key-value stores, and request queues that persist data |
| 27 | + to the Apify platform. Each storage type is implemented with its own specific Apify client that stores data |
| 28 | + in the cloud, making it accessible from anywhere. |
| 29 | +
| 30 | + Communication with the Apify platform is handled via the Apify API client for Python, which is an HTTP API
| 31 | + wrapper. To maximize efficiency and performance, the storage clients use various caching mechanisms to
| 32 | + minimize the number of API calls made to the Apify platform. Data can be inspected and manipulated through |
| 33 | + the Apify console web interface or via the Apify API. |
| 34 | +
| 35 | + The request queue client supports two access modes controlled by the `request_queue_access` parameter: |
| 36 | +
| 37 | + ### Single mode |
| 38 | +
| 39 | + The `single` mode is optimized for scenarios with only one consumer. It minimizes API calls, making it faster |
| 40 | + and more cost-efficient than the `shared` mode. This option is ideal when a single Actor is responsible
| 41 | + for consuming the entire request queue. Using multiple consumers simultaneously may lead to inconsistencies |
| 42 | + or unexpected behavior. |
| 43 | +
| 44 | + In this mode, multiple producers can safely add new requests, but forefront requests may not be processed |
| 45 | + immediately, as the client relies on local head estimation instead of frequent forefront fetching. Requests can |
| 46 | + also be added or marked as handled by other clients, but they must not be deleted or modified, since such changes |
| 47 | + would not be reflected in the local cache. If a request is already fully cached locally and another client
| 48 | + marks it as handled, this client ignores the change. This does not cause errors, but it can occasionally result in
| 49 | + reprocessing a request that was already handled elsewhere. If the request was not yet cached locally, marking |
| 50 | + it as handled poses no issue. |
| 51 | +
| 52 | + ### Shared mode |
| 53 | +
| 54 | + The `shared` mode is designed for scenarios with multiple concurrent consumers. It ensures proper synchronization |
| 55 | + and consistency across clients, at the cost of higher API usage and slightly worse performance. This mode is safe |
| 56 | + for concurrent access from multiple processes, including Actors running in parallel on the Apify platform. It |
| 57 | + should be used when multiple consumers need to process requests from the same queue simultaneously. |
| 58 | + """ |
25 | 59 |
26 | 60 | def __init__(self, *, request_queue_access: Literal['single', 'shared'] = 'single') -> None: |
27 | | - """Initialize the Apify storage client. |
| 61 | + """Initialize a new instance. |
28 | 62 |
29 | 63 | Args: |
30 | | - request_queue_access: Controls the implementation of the request queue client based on expected scenario: |
31 | | - - 'single' is suitable for single consumer scenarios. It makes less API calls, is cheaper and faster. |
32 | | - - 'shared' is suitable for multiple consumers scenarios at the cost of higher API usage. |
33 | | - Detailed constraints for the 'single' access type: |
34 | | - - Only one client is consuming the request queue at the time. |
35 | | - - Multiple producers can put requests to the queue, but their forefront requests are not guaranteed to |
36 | | - be handled so quickly as this client does not aggressively fetch the forefront and relies on local |
37 | | - head estimation. |
38 | | - - Requests are only added to the queue, never deleted by other clients. (Marking as handled is ok.) |
39 | | - - Other producers can add new requests, but not modify existing ones. |
40 | | - (Modifications would not be included in local cache) |
| 64 | + request_queue_access: Defines how the request queue client behaves. Use `single` mode for a single
| 65 | + consumer. It makes fewer API calls, which means better performance and lower costs. If you need
| 66 | + multiple concurrent consumers, use `shared` mode, but expect worse performance and higher costs
| 67 | + due to the additional overhead.
41 | 68 | """ |
42 | 69 | self._request_queue_access = request_queue_access |
43 | 70 |
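For illustration, here is a minimal sketch of how the two access modes described above might be selected. Only the `ApifyStorageClient` constructor signature is taken from the diff; the import path and the wiring mentioned in the comments are assumptions and may differ between SDK versions.

```python
# Minimal sketch, assuming `ApifyStorageClient` is exported from `apify.storage_clients`
# (the import path is an assumption; only the constructor signature comes from the diff above).
from apify.storage_clients import ApifyStorageClient

# Default `single` mode: one Actor consumes the whole queue.
# Fewer API calls, so lower cost and better performance.
single_mode_client = ApifyStorageClient()

# Explicit `shared` mode: several consumers process the same queue concurrently,
# safe across processes at the cost of higher API usage.
shared_mode_client = ApifyStorageClient(request_queue_access='shared')

# The configured client is then handed to whatever component owns storage access
# (e.g. a Crawlee crawler or the SDK's service locator); the exact wiring depends on the SDK version.
```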