-
-
Notifications
You must be signed in to change notification settings - Fork 813
Browser Data Store: Retransmission Avoidance (-40% for docs page alone) #4796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Test script: from nicegui import json, ui, app
import uuid
def generate_important_data() -> str:
"""Generates a unique identifier for important data."""
# This function generates a UUID to represent important data.
# It can be used to track or reference specific data points in the application.
return f"IMPORTANT_DATA_{uuid.uuid4()}"
def generate_long_string() -> str:
"""Generates a long string for testing purposes."""
return generate_important_data() + 'This is a very long string that you don\'t want to transmit again and again. ' * 100
def put_long_string_in_browser_data_store() -> None:
"""Button handler to put a long string in the browser data store."""
app.browser_data_store['long_string'] = generate_long_string()
def generate_large_tree() -> list:
"""Generates a large tree structure for testing purposes."""
return [
{'id': f'node_{i} {generate_important_data()}', 'children': [{'id': f'child_{i}_{j}'} for j in range(20)]}
for i in range(5)
]
def put_tree_in_browser_data_store() -> None:
"""Button handler to put a large tree in the browser data store."""
app.browser_data_store['large_tree'] = json.dumps(generate_large_tree())
def generate_large_table() -> dict:
"""Generates a large table structure for testing purposes."""
return {
'defaultColDef': {'flex': 1},
'columnDefs': [
{'headerName': 'Name', 'field': 'name'},
{'headerName': 'Age', 'field': 'age'},
{'headerName': 'Parent', 'field': 'parent', 'hide': True},
],
'rowData': [
{
'name': f'Name {i} {generate_important_data()}',
'age': 20 + i,
'parent': f'Parent {i}'
}
for i in range(50)
],
'rowSelection': 'multiple',
}
def put_large_table_in_browser_data_store() -> None:
"""Button handler to put a large table in the browser data store."""
app.browser_data_store['large_table'] = json.dumps(generate_large_table())
@ui.page('/')
def main_page() -> None:
"""Main page of the application."""
if not app.browser_data_store.get('long_string'):
put_long_string_in_browser_data_store()
if not app.browser_data_store.get('large_tree'):
put_tree_in_browser_data_store()
if not app.browser_data_store.get('large_table'):
put_large_table_in_browser_data_store()
ui.label('Welcome to the Browser Data Store Test Page!')
ui.label("Label with Browser Data Store")
ui.button('Put New Long String in Browser Data Store', on_click=lambda: (
put_long_string_in_browser_data_store(), ui.navigate.reload()))
ui.label(ui.context.client.fetch_string_from_browser_data_store(
'long_string')).classes('max-h-40 overflow-y-scroll')
ui.label("Tree with browser data store")
ui.button('Put New Tree in Browser Data Store', on_click=lambda: (
put_tree_in_browser_data_store(), ui.navigate.reload()))
ui.tree(ui.context.client.fetch_list_from_browser_data_store('large_tree'), label_key='id')
ui.label("AG Grid with browser data store")
ui.button('Put New Large Table in Browser Data Store', on_click=lambda: (
put_large_table_in_browser_data_store(), ui.navigate.reload()))
ui.aggrid(ui.context.client.fetch_dict_from_browser_data_store('large_table')).classes('max-h-40')
if app.browser_data_store.get('key_sometimes_missing'):
ui.button(
'Remove Key from Browser Data Store',
on_click=lambda: (
app.browser_data_store.pop('key_sometimes_missing', None),
ui.navigate.reload()
)
)
else:
ui.button(
'Put Key in Browser Data Store',
on_click=lambda: (
app.browser_data_store.update({'key_sometimes_missing': 'value'}),
ui.navigate.reload()
)
)
ui.run() Notice the size of the response and console logs for the following:
-> You may need to Disable Cahce! |
Caching individual page content is an interesting idea, @evnchn! We'll certainly need some time to look into it and give feedback. But can you show a brief "hello world" demo on how to use the new feature? This way we can start with a high-level overview before getting into implementation details. Thanks! |
Glad you like this idea. First, let me clarify: No, it is not individual page content, but rather content global to the entire app (since the storage is at For the hello world, maybe we can start with deciphering the test script:
I also think that working on the documentation would make this clearer. Stay tuned! |
TODO:
|
Your idea has a lot of potential @evnchn. But I think the API with def some_static_ui():
ui.label('Hello NiceGUI')
@ui.page('/')
def index():
ui.label('Index Page')
ui.cached_content(some_static_ui)
@ui.page('/other')
def other():
ui.label('Other Page')
ui.cached_content(some_static_ui) What do you think? Is that possible? |
I was pondering that exact idea (caching an element), and I had a similar API in mind (I was thinking But, there are some problems we need to face, meaning the reusing of JSON definition for an element is not as simple as it seems.
It means that, if we want to reuse the JSON, it is not as simple as a search-and-replace, and more advanced logic has to be done to extract ONLY the changed items, and inform the browser to apply as much of the cached content while tossing in the IDs and the UUIDs. Thus, it may be much of an uphill fight, and this is why I decide to go for tackling large string / dict / list use case first, to immediately address the NiceGUI documentation page's pain point. If you can suggest any API improvement for tackling large string / dict / list use case, I'm also much happy to hear. I'm personally considering to move the JSON dumping task to whoever reads |
Managed to got it working with 2 core changes:
Notably:
Apparently calling it an "uphill battle" is enough to drive up my spirits to get things done! Test script: from nicegui import ui, app
import uuid
if not app.storage.general.get('special_uuid'):
app.storage.general['special_uuid'] = str(uuid.uuid4())
@ui.page('/')
def main_page():
my_cached_label = ui.label('This should be cached'+app.storage.general['special_uuid'])
my_cached_label.cache_name = 'cached_label'
columns = [
{'name': 'name', 'label': 'Name', 'field': 'name', 'required': True, 'align': 'left'},
{'name': 'age', 'label': 'Age', 'field': 'age', 'sortable': True},
]
rows = [
{'name': app.storage.general['special_uuid'], 'age': 18},
{'name': 'Bob', 'age': 21},
{'name': 'Carol', 'age': 25},
]
my_cached_table = ui.table(columns=columns, rows=rows, row_key='name')
my_cached_table.cache_name = 'cached_table'
my_cached_button = ui.button('Click me (here is some long text to test caching)'+app.storage.general['special_uuid'],
on_click=lambda: ui.notify('Button clicked!'))
my_cached_button.cache_name = 'cached_button'
with ui.card() as my_cached_card:
my_cached_card.cache_name = 'cached_card'
ui.label('This is a cached card'+app.storage.general['special_uuid'])
my_cached_button_inside_card = ui.button('Click me', on_click=lambda: ui.notify('Card button clicked!'))
my_cached_button_inside_card.cache_name = 'cached_button_inside_card'
# another cahced label
my_cached_label2 = ui.label('This is another cached label'+app.storage.general['special_uuid'])
my_cached_label2.cache_name = 'cached_label2'
# button to change the special UUID and reload the page
ui.button('Change UUID and reload', on_click=lambda: (app.storage.general.update({'special_uuid': str(uuid.uuid4())}),
ui.navigate.reload()))
ui.run() |
I'm still looking for a simple example demonstrating normal usage for this new data store. Assuming I have a page with a big |
As of 2ad9e3c: long_markdown_element = ui.markdown("""# This is a long markdown text""")
long_markdown_element.cache_name = 'long_markdown_cache' # makes it cached Originally I propose: long_markdown_text = """# This is a long markdown text"""
app.browser_data_store['long_markdown_text'] = long_markdown_text
ui.markdown(ui.context.client.fetch_string_from_browser_data_store('long_markdown_text')) But that doesn't work, since markdown mangles the innerHTML into |
Why do you need a cache name, @evnchn? Would not an internal uuid be sufficient? Maybe something like # on the hidden auto-index page, or anywhere else
with ui.column() as my_content:
ui.label('A')
ui.label('B')
@ui.page('/')
def index():
ui.label('main')
my_content.from_cache()
@ui.page('/other')
def other():
ui.label('other')
my_content.from_cache() |
In my API design, an element which wishes to be cached must, despite being initialized many times in the code (so as to conform with the rest of the NiceGUI framework), share a consistent identity ( Since if you consider my case, where the cached element is inside the page decorator, which is run on every page load, I don't see any invariant that I can use, besides the cache name. In your API design, however, you are recommending to define element once in auto-index page, and then copy the element into the definition of different pages. This could be a workable approach, but this PR does not explore this (I never did copy anywhere). However, last time I heard, NiceGUI elements cannot be easily copied (I think mentioned in #4656 and somewhere else as well)? In conclusion, our mindsets are different.
I can perhaps explore your idea, but I can imagine it'll be harder than this PR, since we're messing with the NiceGUI element defintion, not the transport (as in this PR). |
And therefore, |
I'm still thinking about a way to utilize the browser change. Can't we define some element like with ui.cached_element([id: str]): # working title
ui.markdown('A very big element...') which overwrites I'm far from sure this can work. But it seems simpler than the current PR. Actually, the same concept should be possible with a |
Can we first agree that we need some invariant cache_name / ID, that is NOT dynamically derived from randomness / content being cached? Therefore If you're OK with this agreement, 👍 and we can move on, or else 👎 and I will continue my half-written analogy to (hopefully) get you to understand) |
Hmmm.... I thought about using the Python id of the element ( |
This was exactly what I was thinking of in #4794 (comment), but it's not the best design, since you need to make another access to the endpoint to see if you can use such data, slowing page load down (and potentially causing TOCTOU - Time Of Check to Time Of Use errors, since you did the check 100ms later than the initial page load) This is why in my PR, I shove things into the cookies, in a sense that when you access any NiceGUI page, it basically bundles the response of the responses that the API would otherwise have to make But I will ponder the API design of Perhaps |
@rodja Using those dynamically generated IDs, which are ephemeral in nature, as it stands in this PR, will not work. Let me offer you 2 counter-examples: Using
|
OK, analogy time. It'd be a "bedtime story" when you read this PR at 12am. Suppose Zauberzeug GmbH hires me as a security guard, and my job is to remember everyone's faces to let people in. I need to know everybody's name (like I can tell Rodja is Rodja by the name Rodja, Falko is Falko by the name Falko), such that I can associate the face with the name.
Imagine I don't have names to my disposal.
This analogy maps the concept as follows: Objects
Actions
|
I just sat down with Rodja and tried to identify the different aspects of this new feature. Let's check if we're on the same page: Use casesWe need to distinguish what use cases we want to address:
APISo far we have discussed two or three different APIs. While the APIs 1a) and 1b) need IDs, Rodja's proposal should work without them because he's referring to the original object itself.
StorageYou seem to create an extra data store for all cached content on the server. My idea was to let cached element define an endpoint serving their content on demand, so that the client can fetch it whenever it needs to.
|
I had a hard time to understand your statement and only after contemplating "user's injection attempts" and the analogy to CSRF I now might have an idea of what you mean. Let me rephrase in my own words: The server sends a json object with elements to the browser and needs a way to mark parts which should be loaded from browser storage. A malicious user might sneak some string into the persistence, which then gets loaded on a new page and tricks the javascript in thinking that it should load some browser storage data. Right? But why not use a well known key like "CACHE" to mark cached elements. They appear in the json structure as placeholders for entire objects/arrays or root-level values. So even if user enters "CACHE":evil_key" I don't see how the js could confuse this with a cache loading instruction because the entered text will only appear as string value in a property.
What would be scenarios where you want to cache the parent but not the children? It would simplify the API if we always cache children (and hence get rid of
We are always looking for minimal, straight forward APIs. So, yes I think its good to remove these methods. |
Let's gather for a bit, since dropping Originally:
For example:
Now:
I think this is getting somewhere soon. Let me work on it later. |
I was thinking about this for _ in range(10): # point 2
with ui.card().cache('repetitive-card', apply_to_child=True)
ui.label('This is some boilerplate label that barely changes') # and some other elements which don't really change a lot
ui.label(dynamic_data()).disable_cache() # point 1
|
Proposed new signature:
with ui.card().cache('cache-just-the-card'), ui.column().cache(None):
ui.label('This label is not cached, since the `ui.column().cache(None)` stops the propagation of applying the cache to the child') |
Ok. The main reason for not wanting to cache something is the 5 mb limit in the browser. The documentation should educate the users about that limitation. And I'm curious what @falkoschindler thinks about this API. Maybe we should use the plural |
Very interesting development so far. Status:
@rodja Do you think we should re-think whether we do the |
I would rather not like to go back to tokens and the |
A custom cache strategy per element is a very interesting idea indeed, especially when the old I'll take a look at that idea Meanwhile, I have these thoughts:
|
Initially, I'm quite perplexed by the idea of a "cache strategy" since that would imply executable code on the browser, so each element need to have its own JavaScript snippet, which hardly seems ideal. Now, I'm thinking perhaps the "strategy" is not executable code, but a list of props root keys for which (1) value composes of most of the element's size, (2) unlikely to be mutated, (3) when mutated, it could be argued that it's an entirely different element. In this case, for I think that this strategy makes more sense, but we need to make this for every element, which could be quite a burden and could take a while (or I can ask some LLMs to help once I've laid out the groundwork). Still it's better than the only-working-sometimes |
@rodja Thank you for your pointer in the right direction! Now we are back to the 30KB -> 18KB for documentation (specifically, testing on /documentation/section_text_elements), while:
It went much smoother than what I expected. Shall we review the API? I think manually calling |
Why are you using |
Just to be clear, it is not expected for userspace code to write to Hopefully, each element should have its sensible default, and manual configuration should be rare and far between. However, the need to configure still exists (one shared element with shared styles, or the element may have differing styles, as we have seen with the happy face SVG), so there still needs to be some configurability, but not often. For that, I think we'd need to builder pattern to continue, so we may need some helper methods in the same calling manner as classes and props. So I'm waiting for confirmation before adding caching awareness to all elements, since the workload would be huge. |
6ecb1c5 shows caching applied to For caching of all elements to work*, I have to set So that's why I am hesitant on continuing, and I'm not sure if this PR is in progress if that's the case @falkoschindler *by work, I mean actually doing caching. An element which haven't set the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the downside this implementation adds significant complexity across multiple core aspects of NiceGUI:
- Backend: New caching logic in Element, Client, and App classes
- Frontend: Complex localStorage management with hash synchronization
- Communication: Cookie-based cache invalidation protocol
Still, I think this complexity justified because the caching can significantly improve loading performance and other approaches like the discussed http-based element caching would need multiple requests.
Here are some thoughts from looking closer at the current implementation:
- For
element.py
it would be great to explore an architecture, where we do not need to introduce six new members for every element (even if they are not cached). Maybe a caching object per client which takes all the responsibility? - Naming of
_to_dict_internal
and_to_dict
is not intuitive. Also_to_dict_internal
returns a complex datatype which might be better off described in a data class (if needed at all). - Cookies have 4KB size limits; could the hash data exceed this?
- In
nicegui.js
:Synchronous
localStorage` operations might freeze UI- could the async hash computation lead to inconsistencies?
- we need tests to ensure caching works as expected
- how can we help developers to understand / manage cache size limits
All good suggestions above. I generally have no disagreements with it. For the implementation wise:
As we have seen, this simple concept has grown to something huge (has been a tendency so far but it's good I think). Let's let this sit in the oven for a bit longer 💪 |
Between internship, university, this, and #4900, I need quite a long period of time to sort this out. Also, given SPA in 2.22.1, NiceGUI is fast enough in most use cases already. Let's focus on other PRs in preparation for 3.0 release. |
Motivation
As outlined in #4794, I felt that the NiceGUI site became more sluggish after #4732. Be it true or not, I found that over 30% of the page's content data is the sidebar hierarchy, which is sent again and again on every page load.
This PR introduces a two new mechanisms:
app.browser_data_store
):localStorage
client.fetch_xxx_from_browser_data_store
, wherexxx
can bestring
,list
, ordict
),nicegui.js
parseElements
, it will replace such placeholders with whatever's in the corresponding Browser Data Store key, achieving the outcome of retransmission avoidance.Real motivation: @falkoschindler "I'm not sure how this can work in general." You know I like to challenge the impossible.
Implementation
Browser Data Store synchronization routine:
nicegui_data_store
with the keys it has and the hash of those values to the server on every request.updateBrowserDataStore
and updatesnicegui_data_store
such that next request (if nothing has changed) it gets to save a re-transmission.Notably:
None
(which isnull
in JSON-form) so as to delete old keys.Fetch from Browser Data Store routine:
client.browser_data_store_token
which is also transmitted to the browser. In such a manner, people who control the arbitrary input to elements (e.g.ui.label(user_provided_input)
can NOT trigger the fetch, since they don't know the random token.user_provided_input
must readui.context.client.browser_data_store_token
dynamically on client creation, which is impossible for a string / dict / list.nicegui.js
parseElements
JSON.parse
is passed with a customreviver
, which detects for these values and replace them (using Python notation for clarity):"BDS-TOKEN-sometoken:long_string"
, it replaces the string with the value of keylong_string
in localStorage["BDS-TOKEN-sometoken", "long_string"]
, it replaces the list with the JSON-decoded value of keylong_string
in localStorage{"BDS-TOKEN-sometoken": "long_string"}
, it replaces the dict with the JSON-decoded value of keylong_string
in localStorageProgress
-> We need to think about whether we need pytest. Documentation seems necessary but I'd like to get someone else to look at the idea before doing documentation, since this idea is quite big!
Results showcase
Before:
-> 30.3 KB
After:
-> 18.3 KB