From 6d3009a2bd0796ab7a4f9671e9eee1a73fe95308 Mon Sep 17 00:00:00 2001
From: spwoodcock
Date: Tue, 12 Dec 2023 12:07:26 +0000
Subject: [PATCH] docs: update info about web apis and frameworks

---
 docs/dev-guide/web-apis.md       | 224 +++++++++++++++++++++++++++++++
 docs/dev-guide/web-frameworks.md | 187 ++------------------------
 mkdocs.yml                       |   1 +
 3 files changed, 237 insertions(+), 175 deletions(-)
 create mode 100644 docs/dev-guide/web-apis.md

diff --git a/docs/dev-guide/web-apis.md b/docs/dev-guide/web-apis.md
new file mode 100644
index 0000000..612c1a6
--- /dev/null
+++ b/docs/dev-guide/web-apis.md
@@ -0,0 +1,224 @@
+# Web APIs
+
+## Types
+
+As a small aside, REST is not the only standard available when it
+comes to web APIs.
+
+### REST
+
+REST has dominated the scene for quite a few years.
+
+URLs are mapped to different HTTP methods (GET, POST, PUT, DELETE)
+to perform an action when called.
+
+Responses can be divided between Data APIs (return JSON) vs Hypermedia
+APIs (return HTML).
+
+### GraphQL
+
+Without going into the details, this standard has many advantages over
+REST Data APIs, with much more efficient queries being possible.
+
+### Others
+
+SOAP is a historic API design using XML, and is no longer recommended.
+
+### What To Choose
+
+As of 2023, Data APIs have been key for the adoption of Single Page
+Applications (SPA) and JavaScript frameworks (where JSON data is manipulated
+by the frontend).
+
+Going forward, Hypermedia APIs are re-emerging as an increasingly important
+alternative, where the entire page is rendered before being returned (reducing
+the need for things like Server Side Rendering (SSR)).
+
+The [HTMX](https://htmx.org/essays/) website has many interesting essays
+on this topic.
+
+**In summary, it is probably best to default to a Hypermedia REST API, with a
+simple web framework like HTMX. If a much more complex frontend is required
+(such as a word processor, graphics editor, complex map), then a Data REST API
+is the best option**.
+
+## Frameworks
+
+API Frameworks are generally divided into synchronous and asynchronous.
+
+Async is a newer paradigm in Python, often slightly more complex to code,
+but should be faster and more suited to a web API.
+
+Synchronous frameworks include **Flask**, **Django**, etc.
+
+The asynchronous framework we recommend at HOT, as of 2024, is **FastAPI**.
+It's what we use for most projects.
+
+There is a great [comparison](https://fastapi.tiangolo.com/alternatives/)
+with other frameworks in the ecosystem available.
+
+Another contender would be [Litestar](https://github.com/litestar-org/litestar),
+a project spawned from some frustrations with the governance of FastAPI.
+
+### FastAPI
+
+These docs provide some helpful info for FastAPI best practices.
+
+#### Async Programming
+
+Asynchronous programming can be a learning curve for Python developers.
+
+- FastAPI is an asynchronous web framework that is built to use async code.
+- Using async (`async def`) functions with `await` is more scalable than
+  using synchronous code `def`, so this is always the preferred default
+  approach.
+- Using synchronous code is possible, but devs should be aware of the pitfalls:
+  if the code runs for a long time, it will block the async event loop
+  (i.e. block the thread until the process completes).
+  - Bear in mind that 'synchronous' code could be from what you write
+    in the crud functions, OR could be from a library that you use
+    (e.g. osm-fieldwork is synchronous for the most part).
+
+#### Workers & Thread Blocking
+
+- We run FastAPI (uvicorn) with a number of workers defined. Each worker is a
+  separate process running its own event loop.
+- If a task blocks one worker (as described above), then the remaining workers
+  are available to take new requests.
+- If all of the workers/threads are blocked by tasks, the server will hang / be unresponsive!
+
+##### Using Synchronous Code
+
+It is of course possible to use synchronous code, but if necessary, be
+sure to run this in another thread.
+
+To do this you have several options.
+
+#### Options
+
+##### 1) Using sync code within an `async def` function
+
+- Use the BackgroundTasks implementation we have, with polling for the
+  task completion.
+- The task should be written as a standard `def`. FastAPI will handle
+  this automatically and ensure it runs in a separate thread.
+- Alternatively, if you wish to run the task in the foreground and return
+  the response, use the FastAPI helper `run_in_threadpool`:
+
+```python
+from fastapi.concurrency import run_in_threadpool
+
+def long_running_sync_task(time_to_sleep):
+    sleep(time_to_sleep)  # requires `from time import sleep`
+
+async def some_func(time_to_sleep):
+    data = await run_in_threadpool(lambda: long_running_sync_task(time_to_sleep))
+```
+
+##### 2) Running multiple standard `def` from within an `async def` function
+
+- Sometimes you need to run multiple `def` functions in parallel.
+- To do this, you can use ThreadPoolExecutor:
+
+```python
+from concurrent.futures import ThreadPoolExecutor, wait
+
+def a_synchronous_function(db):
+    # Run with expensive task via threadpool
+    def wrap_generate_task_files(task):
+        """Func to wrap and return errors from thread.
+
+        Also passes its own database session for thread safety.
+        If we pass a single db session to multiple threads,
+        there may be inconsistencies or errors.
+        """
+        try:
+            generate_task_files(
+                next(get_db()),
+                project_id,
+                task,
+                xlsform,
+                form_type,
+                odk_credentials,
+            )
+        except Exception as e:
+            log.exception(str(e))
+
+    # Use a ThreadPoolExecutor to run the synchronous code in threads
+    with ThreadPoolExecutor() as executor:
+        # Submit tasks to the thread pool
+        futures = [
+            executor.submit(wrap_generate_task_files, task)
+            for task in tasks_list
+        ]
+        # Wait for all tasks to complete
+        wait(futures)
+```
+
+Note that in the above example, we cannot pass the db object from the parent
+function into the functions spawned in threads. A single database
+connection should not be written to by multiple threads at the same time,
+as you may get data inconsistencies. To solve this we generate a new
+db connection within the pool for each separate task we run in a thread.
+
+> To avoid issues, look into limiting the thread usage via:
+>
+
+##### 3) Running an `async def` within a sync `def`
+
+- As we try to write most functions async for FastAPI, sometimes we need to
+  run some `async def` logic within a sync `def`. This is not possible normally.
+- To avoid having to write a duplicated `def` equivalent of the `async def`
+  code, we can use the package `asgiref`:
+
+```python
+from asgiref.sync import async_to_sync
+
+async def get_project(db, project_id):
+    return something
+
+def a_sync_function(db, project_id):
+    get_project_sync = async_to_sync(get_project)
+    project = get_project_sync(db, project_id)
+    return project
+```
+
+##### 4) Efficiently running batch async tasks
+
+- Sometimes you may have a very efficient async task you need to call
+  within a for loop.
+- Instead of that, you can use `asyncio.gather` to much more efficiently
+  collect and return the async data (e.g. async web requests, or async
+  file requests, or async db requests):
+
+```python
+from asyncio import gather
+
+async def parent_func(db, project_id, data, no_of_buildings, has_data_extracts):
+    ...  # some other code
+
+    async def split_multi_geom_into_tasks():
+        # Build one coroutine per boundary geometry
+        split_poly = [
+            split_polygon_into_tasks(
+                db, project_id, data, no_of_buildings, has_data_extracts
+            )
+            for data in boundary_geoms
+        ]
+
+        # Run the coroutines concurrently with asyncio.gather and
+        # flatten the results into a single generator
+        return (
+            item for sublist in await gather(*split_poly)
+            for item in sublist if sublist
+        )
+
+    geoms = await split_multi_geom_into_tasks()
+```
+
+#### Note
+
+- If you regularly find you are running out of workers/threads and the
+  server is overloaded, it may be time to add a task queuing system to your stack.
+- Celery is made for just this - defer tasks to a queue, and run gradually
+  to reduce the immediate load.
diff --git a/docs/dev-guide/web-frameworks.md b/docs/dev-guide/web-frameworks.md
index a385f65..e9fd54a 100644
--- a/docs/dev-guide/web-frameworks.md
+++ b/docs/dev-guide/web-frameworks.md
@@ -1,184 +1,21 @@
-# Web Frameworks
+# JavaScript Frameworks
 
-## Python
+## SSG, SPA, SSR, etc
 
-Frameworks are generally divided into synchronous and asynchronous.
+TODO
 
-Async is a newer paradigm in Python, often slightly more complex to code,
-but should be faster and more suited to a web API.
+## Frameworks
 
-Synchronous frameworks include **flask**, **Django**, etc.
+### React
 
-The asynchronous framework we recommend at HOT, as of 2024, is **FastAPI**.
-It's what we use for most projects.
+As of 2023, it is no longer recommended to design new applications using React.
 
-There is a great [comparison](https://fastapi.tiangolo.com/alternatives/)
-with other frameworks in the ecosystem available.
+React is not going anywhere soon, with many experienced developers in the field
+and a lot of technical debt accumulated by organizations.
 
-Another contender would be [LiteStar](https://github.com/litestar-org/litestar),
-a project spawned from some frustrations with the governance of FastAPI.
+An interesting article on this topic can be found
+[here](https://joshcollinsworth.com/blog/self-fulfilling-prophecy-of-react).
 
-### FastAPI
+### Web Components
 
-These docs provide some helpful info for FastAPI best practices.
-
-#### Async Programming
-
-Asynchronous programming can be a learning curve for Python developers.
-
-- FastAPI is an asynchronous web framework that is built to use async code.
-- Using async (`async def`) function with await is more scalable than
-  using synchronous code `def`, so this is always the preferred default
-  approach.
-- Using synchronous code is possible, but devs should be aware of the pitfalls:
-  if the code runs for a long time, it will block the async event loop
-  (i.e. block the thread until the process completes).
-  - Bear in mind that 'synchronous' code could be from what you write
-    in the crud functions, OR could be from a library that you use
-    (e.g. osm-fieldwork is synchronous for the most part).
-
-#### Workers & Thread Blocking
-
-- We run FastAPI (uvicorn) with a number of workers defined. This is the
-  number of threads available to run processes.
-- If a process blocks a thread (as described above), then the remaining threads
-  are available to take new requests.
-- If all of the workers/threads are blocked by tasks, the server will hang / be unresponsive!
-
-##### Using Synchronous Code
-
-It is of course possible to use synchronous code, but if necessary, be
-sure to run this in another thread.
-
-To do this you have several options.
-
-#### Options
-
-##### 1) Using sync code within an `async def` function
-
-- Use the BackgroundTasks implementation we have, with polling for the
-  task completion.
-- The task should be written as a standard `def`. FastAPI will handle
-  this automatically and ensure it runs in a separate thread.
-- Alternatively, if you wish to run the task in the foreground and return
-  the response, use the FastAPI helper `run_in_threadpool`:
-
-```python
-from fastapi.concurrency import run_in_threadpool
-
-def long_running_sync_task(time_to_sleep):
-    sleep(time_to_sleep)
-
-async def some_func():
-    data = await run_in_threadpool(lambda: long_running_sync_task(time_to_sleep))
-```
-
-##### 2) Running multiple standard `def` from within an `async def` function
-
-- Sometimes you need to run multiple `def` functions in parallel.
-- To do this, you can use ThreadPoolExecutor:
-
-```python
-from concurrent.futures import ThreadPoolExecutor, wait
-
-def a_synchronous_function(db):
-    # Run with expensive task via threadpool
-    def wrap_generate_task_files(task):
-        """Func to wrap and return errors from thread.
-
-        Also passes it's own database session for thread safety.
-        If we pass a single db session to multiple threads,
-        there may be inconsistencies or errors.
-        """
-        try:
-            generate_task_files(
-                next(get_db()),
-                project_id,
-                task,
-                xlsform,
-                form_type,
-                odk_credentials,
-            )
-        except Exception as e:
-            log.exception(str(e))
-
-    # Use a ThreadPoolExecutor to run the synchronous code in threads
-    with ThreadPoolExecutor() as executor:
-        # Submit tasks to the thread pool
-        futures = [
-            executor.submit(wrap_generate_task_files, task)
-            for task in tasks_list
-        ]
-        # Wait for all tasks to complete
-        wait(futures)
-```
-
-Note that in the above example, we cannot pass the db object from the parent
-function into the functions spawned in threads. A single database
-connection should not be written to by multiple processes at the same time,
-as you may get data inconsistencies. To solve this we generate a new
-db connection within the pool for each separate task we run in a thread.
-
-> To avoid issues, look into limiting the thread usage via:
->
-
-##### 3) Running an `async def` within a sync `def`
-
-- As we try to write most functions async for FastAPI, sometime we need to
-  run some `async def` logic within a sync `def`. This is not possible normally.
-- To avoid having to write a duplicated `def` equivalent of the `async def`
-  code, we can use the package `asgiref`:
-
-```python
-from asgiref.sync import async_to_sync
-
-async def get_project(db, project_id):
-    return something
-
-def a_sync_function():
-    get_project_sync = async_to_sync(get_project)
-    project = get_project_sync(db, project_id)
-    return project
-```
-
-##### 4) Efficiency running batch async tasks
-
-- Sometime you may have a very efficient async task you need to call
-  within a for loop.
-- Instead of that, you can use `asyncio.gather` to much more efficiently
-  collect and return the async data (e.g. async web requests, or async
-  file requests, or async db requests):
-
-```python
-from asyncio import gather
-
-async def parent_func(db, project_id, data, no_of_buildings, has_data_extracts):
-    ... some other code
-
-    async def split_multi_geom_into_tasks():
-        # Use asyncio.gather to concurrently process the async generator
-        split_poly = [
-            split_polygon_into_tasks(
-                db, project_id, data, no_of_buildings, has_data_extracts
-            )
-            for data in boundary_geoms
-        ]
-
-        # Use asyncio.gather with list to collect results from the
-        # async generator
-        return (
-            item for sublist in await gather(*split_poly)
-            for item in sublist if sublist
-        )
-
-    geoms = await split_multi_geom_into_tasks()
-```
-
-#### Note
-
-- If you regularly find you are running out of workers/threads and the
-  server is overloaded, it may be time to add a task queuing system to your stack.
-- Celery is made for just this - defer tasks to a queue, and run gradually
-  to reduce the immediate load.
-
-## JavaScript
+TODO
diff --git a/mkdocs.yml b/mkdocs.yml
index f845eda..1e8b92a 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -105,6 +105,7 @@ nav:
  - Dependency Management: dev-guide/dep-management.md
  - Version Control: dev-guide/version-control.md
  - Generating Docs: dev-guide/doc-gen.md
+ - Web APIs: dev-guide/web-apis.md
  - Web Frameworks: dev-guide/web-frameworks.md
  - Testing: dev-guide/testing.md
  - E2E Diagrams: diagrams.md