evnchn
Collaborator

@evnchn evnchn commented May 26, 2025

Motivation

As outlined in #4794, I felt that the NiceGUI site became more sluggish after #4732. Be it true or not, I found that over 30% of the page's content data is the sidebar hierarchy, which is sent again and again on every page load.

This PR introduces two new mechanisms:

  • Browser Data Store (at app.browser_data_store):
    • [User usage] which the user can write strings (strings only, for ease of hashing) to,
    • [End result] that they can rest assured NiceGUI will help them keep synchronized with what is in the browser's localStorage
  • Fetch from Browser Data Store: (at client.fetch_xxx_from_browser_data_store, where xxx can be string, list, or dict),
    • [User usage] which the user can use as a placeholder of the large amount of data,
    • [End result] and upon nicegui.js parseElements, it will replace such placeholders with whatever's in the corresponding Browser Data Store key, achieving the outcome of retransmission avoidance.

Real motivation: @falkoschindler "I'm not sure how this can work in general." You know I like to challenge the impossible.

Implementation

Browser Data Store synchronization routine:

  • If populated, browser sends cookie nicegui_data_store with the keys it has and the hash of those values to the server on every request.
  • Server inspects such cookie if present, checks the hash, and sends only the keys it needs to send to update the browser with the latest data from the Browser Data Store.
  • Browser receives the data from updateBrowserDataStore and updates nicegui_data_store such that next request (if nothing has changed) it gets to save a re-transmission.

Notably:

  • Server also sends value None (which is null in JSON-form) so as to delete old keys.
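The synchronization routine above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the names `hash_value` and `compute_updates`, the choice of SHA-256, and the plain-dict stand-in for the decoded cookie are all assumptions.

```python
import hashlib
from typing import Dict, Optional


def hash_value(value: str) -> str:
    """Hash a stored string (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(value.encode('utf-8')).hexdigest()


def compute_updates(server_store: Dict[str, str],
                    client_hashes: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return only the entries the browser needs to catch up with the server.

    `client_hashes` stands in for the decoded `nicegui_data_store` cookie,
    mapping each key the browser holds to the hash of its value.
    """
    updates: Dict[str, Optional[str]] = {}
    for key, value in server_store.items():
        if client_hashes.get(key) != hash_value(value):
            updates[key] = value  # new or changed: send the full value
    for key in client_hashes:
        if key not in server_store:
            updates[key] = None  # stale on the browser: None (null in JSON) deletes it
    return updates


store = {'long_string': 'x' * 1000, 'large_tree': '[]'}
cookie = {'long_string': hash_value('x' * 1000), 'old_key': hash_value('gone')}
# Only 'large_tree' is sent in full; 'old_key' is sent as None so it gets deleted.
print(compute_updates(store, cookie))
```

When the cookie already matches the server store, the returned dictionary is empty and nothing needs to be retransmitted.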

Fetch from Browser Data Store routine:

  • To avoid plain-text injection, the fetching is triggered with a random token in client.browser_data_store_token, which is also transmitted to the browser. This way, people who control arbitrary input to elements (e.g. ui.label(user_provided_input)) can NOT trigger the fetch, since they don't know the random token.
    • For injection to occur, the user_provided_input must read ui.context.client.browser_data_store_token dynamically on client creation, which is impossible for a string / dict / list.
  • nicegui.js parseElements passes a custom reviver to JSON.parse, which detects these values and replaces them (using Python notation for clarity):
    • If it finds string "BDS-TOKEN-sometoken:long_string", it replaces the string with the value of key long_string in localStorage
    • If it finds list ["BDS-TOKEN-sometoken", "long_string"], it replaces the list with the JSON-decoded value of key long_string in localStorage
    • If it finds dict {"BDS-TOKEN-sometoken": "long_string"}, it replaces the dict with the JSON-decoded value of key long_string in localStorage
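The three replacement rules above can be mirrored in Python for illustration. The real logic lives in a JSON.parse reviver in nicegui.js; the function name `resolve_placeholders` and the plain dict standing in for localStorage are assumptions made for this sketch.

```python
import json
from typing import Any, Dict


def resolve_placeholders(value: Any, token: str, local_storage: Dict[str, str]) -> Any:
    """Recursively replace the three placeholder shapes with stored values."""
    marker = f'BDS-TOKEN-{token}'
    if isinstance(value, str):
        if value.startswith(marker + ':'):
            return local_storage[value[len(marker) + 1:]]  # string form: raw stored string
        return value
    if isinstance(value, list):
        if len(value) == 2 and value[0] == marker and isinstance(value[1], str):
            return json.loads(local_storage[value[1]])  # list form: JSON-decoded value
        return [resolve_placeholders(item, token, local_storage) for item in value]
    if isinstance(value, dict):
        if list(value.keys()) == [marker] and isinstance(value[marker], str):
            return json.loads(local_storage[value[marker]])  # dict form: JSON-decoded value
        return {k: resolve_placeholders(v, token, local_storage) for k, v in value.items()}
    return value


storage = {'long_string': 'hello', 'large_tree': '[{"id": "a"}]', 'large_table': '{"rowData": []}'}
page = {
    'text': 'BDS-TOKEN-abc123:long_string',
    'nodes': ['BDS-TOKEN-abc123', 'large_tree'],
    'options': {'BDS-TOKEN-abc123': 'large_table'},
    'user_input': 'BDS-TOKEN-guess:long_string',  # wrong token: left untouched
}
print(resolve_placeholders(page, 'abc123', storage))
```

The last entry shows why the random token matters: a string that merely imitates the placeholder format is passed through unchanged unless it carries the client's actual token.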

Progress

  • I chose a meaningful title that completes the sentence: "If applied, this PR will..."
  • The implementation is complete.
  • Pytests have been added (or are not necessary).
  • Documentation has been added (or is not necessary).

-> We need to think about whether we need pytest. Documentation seems necessary but I'd like to get someone else to look at the idea before doing documentation, since this idea is quite big!

Results showcase

Before:

image

-> 30.3 KB

After:

image

-> 18.3 KB

@evnchn evnchn added feature Type/scope: New feature or enhancement 🟡 medium Priority: Relevant, but not essential labels May 26, 2025
@evnchn
Collaborator Author

evnchn commented May 26, 2025

Test script:

from nicegui import json, ui, app
import uuid


def generate_important_data() -> str:
    """Generates a unique identifier for important data."""
    # This function generates a UUID to represent important data.
    # It can be used to track or reference specific data points in the application.
    return f"IMPORTANT_DATA_{uuid.uuid4()}"


def generate_long_string() -> str:
    """Generates a long string for testing purposes."""
    return generate_important_data() + 'This is a very long string that you don\'t want to transmit again and again. ' * 100


def put_long_string_in_browser_data_store() -> None:
    """Button handler to put a long string in the browser data store."""
    app.browser_data_store['long_string'] = generate_long_string()


def generate_large_tree() -> list:
    """Generates a large tree structure for testing purposes."""
    return [
        {'id': f'node_{i} {generate_important_data()}', 'children': [{'id': f'child_{i}_{j}'} for j in range(20)]}
        for i in range(5)
    ]


def put_tree_in_browser_data_store() -> None:
    """Button handler to put a large tree in the browser data store."""
    app.browser_data_store['large_tree'] = json.dumps(generate_large_tree())


def generate_large_table() -> dict:
    """Generates a large table structure for testing purposes."""
    return {
        'defaultColDef': {'flex': 1},
        'columnDefs': [
            {'headerName': 'Name', 'field': 'name'},
            {'headerName': 'Age', 'field': 'age'},
            {'headerName': 'Parent', 'field': 'parent', 'hide': True},
        ],
        'rowData': [
            {
                'name': f'Name {i} {generate_important_data()}',
                'age': 20 + i,
                'parent': f'Parent {i}'
            }
            for i in range(50)
        ],
        'rowSelection': 'multiple',
    }


def put_large_table_in_browser_data_store() -> None:
    """Button handler to put a large table in the browser data store."""
    app.browser_data_store['large_table'] = json.dumps(generate_large_table())


@ui.page('/')
def main_page() -> None:
    """Main page of the application."""

    if not app.browser_data_store.get('long_string'):
        put_long_string_in_browser_data_store()
    if not app.browser_data_store.get('large_tree'):
        put_tree_in_browser_data_store()
    if not app.browser_data_store.get('large_table'):
        put_large_table_in_browser_data_store()

    ui.label('Welcome to the Browser Data Store Test Page!')

    ui.label("Label with Browser Data Store")
    ui.button('Put New Long String in Browser Data Store', on_click=lambda: (
        put_long_string_in_browser_data_store(), ui.navigate.reload()))
    ui.label(ui.context.client.fetch_string_from_browser_data_store(
        'long_string')).classes('max-h-40 overflow-y-scroll')

    ui.label("Tree with browser data store")
    ui.button('Put New Tree in Browser Data Store', on_click=lambda: (
        put_tree_in_browser_data_store(), ui.navigate.reload()))
    ui.tree(ui.context.client.fetch_list_from_browser_data_store('large_tree'), label_key='id')

    ui.label("AG Grid with browser data store")
    ui.button('Put New Large Table in Browser Data Store', on_click=lambda: (
        put_large_table_in_browser_data_store(), ui.navigate.reload()))
    ui.aggrid(ui.context.client.fetch_dict_from_browser_data_store('large_table')).classes('max-h-40')

    if app.browser_data_store.get('key_sometimes_missing'):
        ui.button(
            'Remove Key from Browser Data Store',
            on_click=lambda: (
                app.browser_data_store.pop('key_sometimes_missing', None),
                ui.navigate.reload()
            )
        )
    else:
        ui.button(
            'Put Key in Browser Data Store',
            on_click=lambda: (
                app.browser_data_store.update({'key_sometimes_missing': 'value'}),
                ui.navigate.reload()
            )
        )


ui.run()

Notice the size of the response and console logs for the following:

  • Fresh browser, no cookies
  • Just reload
  • Press any of the buttons and then reload

-> You may need to Disable Cache!

@falkoschindler
Contributor

Caching individual page content is an interesting idea, @evnchn! We'll certainly need some time to look into it and give feedback. But can you show a brief "hello world" demo on how to use the new feature? This way we can start with a high-level overview before getting into implementation details. Thanks!

@evnchn
Collaborator Author

evnchn commented May 26, 2025

Glad you like this idea.

First, let me clarify:

No, it is not individual page content, but rather content global to the entire app (since the storage is at app.browser_data_store). I'd love to have a mechanism for an individual browser data store per page, but it'd be tricky: imagine one page cached a lot of content, and then you visit another page; then we need to throw stuff away from the cache to fit the 5MB. It's much easier if it's just one central store, where if it fits it fits, and if it doesn't you're screwed.

For the hello world, maybe we can start with deciphering the test script:

  • generate_important_data(): Generate some identifiable but random data. To be written to the browser data store to show that it works as intended (changes not lost, cache invalidated, etc...)
  • generate_long_string(), generate_large_tree(), generate_large_table(): Generate some large, identifiable but random data in the expected format of ui.label() (takes a string), ui.tree() (takes a list) and ui.aggrid() (takes a dict)
  • put_xxx_in_browser_data_store(): Store the data into the browser data store. JSON-serializes it for list and dict.
  •   if not app.browser_data_store.get('long_string'):
          put_long_string_in_browser_data_store()
      if not app.browser_data_store.get('large_tree'):
          put_tree_in_browser_data_store()
      if not app.browser_data_store.get('large_table'):
          put_large_table_in_browser_data_store()
    • Ensures that the browser data store is not empty (would break the test otherwise)
  •   ui.label(ui.context.client.fetch_string_from_browser_data_store(
          'long_string')).classes('max-h-40 overflow-y-scroll')
    
      ui.tree(ui.context.client.fetch_list_from_browser_data_store('large_tree'), label_key='id')
    
      ui.aggrid(ui.context.client.fetch_dict_from_browser_data_store('large_table')).classes('max-h-40')
    • Spawn the 3 elements, instructing them to fetch from browser data store using the helper functions which internally put in a placeholder.
  •   ui.button('Put New Long String in Browser Data Store', on_click=lambda: (
          put_long_string_in_browser_data_store(), ui.navigate.reload()))
    
      ui.button('Put New Tree in Browser Data Store', on_click=lambda: (
          put_tree_in_browser_data_store(), ui.navigate.reload()))
    
      ui.button('Put New Large Table in Browser Data Store', on_click=lambda: (
          put_large_table_in_browser_data_store(), ui.navigate.reload()))
    • Puts new data in the browser data store and swiftly reloads, tests if the cache invalidation works
  •   if app.browser_data_store.get('key_sometimes_missing'):
          ui.button(
              'Remove Key from Browser Data Store',
              on_click=lambda: (
                  app.browser_data_store.pop('key_sometimes_missing', None),
                  ui.navigate.reload()
              )
          )
      else:
          ui.button(
              'Put Key in Browser Data Store',
              on_click=lambda: (
                  app.browser_data_store.update({'key_sometimes_missing': 'value'}),
                  ui.navigate.reload()
              )
          )
    • Toggles whether the key key_sometimes_missing exists in the browser data store, testing the ability to add new keys and delete keys that no longer exist.

I also think that working on the documentation would make this clearer. Stay tuned!

@evnchn
Collaborator Author

evnchn commented May 26, 2025

TODO:

  • If the page does indeed serve over 5MB of content to be shoved into the browser data store, the data exceeding the capacity can't fit in localStorage, and thus the replacement fails. We should still serve on a best-effort basis by putting the data that can't fit in a dictionary, so that although we aren't caching that content, we are not dropping it either.
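A rough sketch of that best-effort fallback, under stated assumptions: the helper name `partition_by_quota` is hypothetical, and the 5 MB budget counted in UTF-8 bytes is only an approximation, since real localStorage limits and the way they are measured vary by browser.

```python
from typing import Dict, Tuple

QUOTA_BYTES = 5 * 1024 * 1024  # assumed budget; real limits vary by browser


def partition_by_quota(entries: Dict[str, str],
                       quota: int = QUOTA_BYTES) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split entries into those that fit the quota and an overflow dictionary."""
    cached: Dict[str, str] = {}
    overflow: Dict[str, str] = {}
    used = 0
    for key, value in entries.items():
        size = len(key.encode('utf-8')) + len(value.encode('utf-8'))
        if used + size <= quota:
            cached[key] = value
            used += size
        else:
            overflow[key] = value  # not cached, but still delivered with the page
    return cached, overflow


cached, overflow = partition_by_quota({'a': 'x' * 6, 'b': 'y' * 6, 'c': 'z'}, quota=10)
print(sorted(cached), sorted(overflow))
```

Entries in `overflow` would be shipped with every page load as before, so nothing is dropped even though those entries gain no caching benefit.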

@rodja
Member

rodja commented May 27, 2025

Your idea has a lot of potential @evnchn. But I think the API with app.browser_data_store['...'] = json.dumps(...) and ui.context.client.fetch_string_from_browser_data_store is a bit complicated. When doing wishful programming, I could imagine doing:

def some_static_ui():
    ui.label('Hello NiceGUI')

@ui.page('/')
def index():
    ui.label('Index Page')
    ui.cached_content(some_static_ui)

@ui.page('/other')
def other():
    ui.label('Other Page')
    ui.cached_content(some_static_ui)

What do you think? Is that possible?

@evnchn
Collaborator Author

evnchn commented May 27, 2025

I was pondering that exact idea (caching an element), and I had a similar API in mind (I was thinking element.cache = True)

But there are some problems we need to face, which mean that reusing the JSON definition of an element is not as simple as it seems.

  1. Elements have their children list (list of IDs), and the children ID varies depending on the sequence they are defined in the page.
  2. Event handlers have their UUID which is tightly coupled with the server.

It means that, if we want to reuse the JSON, it is not as simple as a search-and-replace, and more advanced logic has to be done to extract ONLY the changed items, and inform the browser to apply as much of the cached content while tossing in the IDs and the UUIDs.

Thus, it may be quite an uphill fight, and this is why I decided to tackle the large string / dict / list use case first, to immediately address the NiceGUI documentation page's pain point.

If you can suggest any API improvement for the large string / dict / list use case, I'm also happy to hear it. I'm personally considering moving the JSON dumping task to whoever reads app.browser_data_store, so that writes are less painful. I'm also considering combining the write and the fetch in ui.context.client.fetch_xxx_from_browser_data_store into ui.context.client.cached_xxx(..., name="my_long_list")

@evnchn
Collaborator Author

evnchn commented May 27, 2025

Managed to get it working with 2 core changes:

  1. Separate the element's keys into static and dynamic, cache only the static ones, and serve only the dynamic ones on a cache hit.
  2. On the browser side, if it's a dict, apply the rest of the original keys (aka the dynamic keys of an element) if we got a hit.

Notably:

  • element._populate_browser_data_store_if_needed() is needed otherwise the cache will be empty on first page load, instructing the client to drop all keys, while also serving a request which uses said keys (aka it doesn't load)
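The static/dynamic split can be pictured with a small sketch. Which keys count as dynamic is an assumption here (`id`, `children`, and `events`, following the two problems named earlier), and the helper names `split_element` and `merge_on_hit` are hypothetical.

```python
from typing import Any, Dict, Tuple

# Assumed set of keys that change between page loads; the real split may differ.
DYNAMIC_KEYS = {'id', 'children', 'events'}


def split_element(element: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate an element dictionary into cacheable and per-request parts."""
    static = {k: v for k, v in element.items() if k not in DYNAMIC_KEYS}
    dynamic = {k: v for k, v in element.items() if k in DYNAMIC_KEYS}
    return static, dynamic


def merge_on_hit(cached_static: Dict[str, Any], dynamic: Dict[str, Any]) -> Dict[str, Any]:
    """Browser-side step: apply the freshly sent dynamic keys on top of the cached part."""
    return {**cached_static, **dynamic}


element = {
    'id': 7,
    'tag': 'div',
    'text': 'This should be cached ' * 50,
    'children': [8, 9],
    'events': [{'listener_id': 'abc'}],
}
static, dynamic = split_element(element)
print(sorted(static), sorted(dynamic))
print(merge_on_hit(static, dynamic) == element)
```

On a cache hit only `dynamic` needs to travel over the wire, which is small compared to the static text.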

Apparently calling it an "uphill battle" is enough to drive up my spirits to get things done!

Test script:

from nicegui import ui, app
import uuid

if not app.storage.general.get('special_uuid'):
    app.storage.general['special_uuid'] = str(uuid.uuid4())


@ui.page('/')
def main_page():
    my_cached_label = ui.label('This should be cached'+app.storage.general['special_uuid'])
    my_cached_label.cache_name = 'cached_label'

    columns = [
        {'name': 'name', 'label': 'Name', 'field': 'name', 'required': True, 'align': 'left'},
        {'name': 'age', 'label': 'Age', 'field': 'age', 'sortable': True},
    ]
    rows = [
        {'name': app.storage.general['special_uuid'], 'age': 18},
        {'name': 'Bob', 'age': 21},
        {'name': 'Carol', 'age': 25},
    ]
    my_cached_table = ui.table(columns=columns, rows=rows, row_key='name')
    my_cached_table.cache_name = 'cached_table'

    my_cached_button = ui.button('Click me (here is some long text to test caching)'+app.storage.general['special_uuid'],
                                 on_click=lambda: ui.notify('Button clicked!'))
    my_cached_button.cache_name = 'cached_button'

    with ui.card() as my_cached_card:
        my_cached_card.cache_name = 'cached_card'
        ui.label('This is a cached card'+app.storage.general['special_uuid'])
        my_cached_button_inside_card = ui.button('Click me', on_click=lambda: ui.notify('Card button clicked!'))
        my_cached_button_inside_card.cache_name = 'cached_button_inside_card'

    # another cached label
    my_cached_label2 = ui.label('This is another cached label'+app.storage.general['special_uuid'])
    my_cached_label2.cache_name = 'cached_label2'

    # button to change the special UUID and reload the page
    ui.button('Change UUID and reload', on_click=lambda: (app.storage.general.update({'special_uuid': str(uuid.uuid4())}),
                                                          ui.navigate.reload()))


ui.run()

@falkoschindler
Contributor

I'm still looking for a simple example demonstrating normal usage for this new data store. Assuming I have a page with a big ui.markdown. How can I cache it?

@evnchn
Collaborator Author

evnchn commented May 27, 2025

Assuming I have a page with a big ui.markdown. How can I cache it?

As of 2ad9e3c:

long_markdown_element = ui.markdown("""# This is a long markdown text""")
long_markdown_element.cache_name = 'long_markdown_cache'  # makes it cached

Originally I proposed:

long_markdown_text = """# This is a long markdown text"""
app.browser_data_store['long_markdown_text'] = long_markdown_text
ui.markdown(ui.context.client.fetch_string_from_browser_data_store('long_markdown_text'))

But that doesn't work, since markdown mangles the innerHTML into <p>BDS-TOKEN-sometoken:long<em>markdown</em>text</p>, making it impossible to recognize by nicegui.js to do the substitution...
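The mangling can be reproduced with a crude stand-in for the renderer's emphasis rule. The regular expression below is an assumption used only to imitate the output quoted above; it is not the Markdown library's actual implementation.

```python
import re

placeholder = 'BDS-TOKEN-sometoken:long_markdown_text'

# Crude imitation of an emphasis rule: _word_ becomes <em>word</em>.
rendered = '<p>' + re.sub(r'_(.+?)_', r'<em>\1</em>', placeholder) + '</p>'

print(rendered)
print(placeholder in rendered)  # the original placeholder no longer appears verbatim
```

Because the placeholder is rewritten before it reaches the browser, a string comparison against the expected `BDS-TOKEN-...` format can no longer find it.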

@rodja
Member

rodja commented May 27, 2025

Why do you need a cache name, @evnchn? Would not an internal uuid be sufficient? Maybe something like

# on the hidden auto-index page, or anywhere else
with ui.column() as my_content:
   ui.label('A')
   ui.label('B')

@ui.page('/')
def index():
    ui.label('main')
    my_content.from_cache()

@ui.page('/other')
def other():
    ui.label('other')
    my_content.from_cache()

@evnchn
Collaborator Author

evnchn commented May 27, 2025

In my API design, an element which wishes to be cached must, despite being initialized many times in the code (so as to conform with the rest of the NiceGUI framework), share a consistent identity (.cache_name) and a consistent content (checked by the Browser Data Storage mechanism).

If you consider my case, where the cached element is inside the page decorator, which is run on every page load, I don't see any invariant that I can use besides the cache name. I was considering, if I have element.cached = True, that the SHA256 hash be automatically set as the cache name. Never mind, since it breaks. But you see that the concept of a cache name permeates my PR design as the identifier to find and always cache the correct element.


In your API design, however, you are recommending to define element once in auto-index page, and then copy the element into the definition of different pages.

This could be a workable approach, but this PR does not explore this (I never did copy anywhere).

However, last time I heard, NiceGUI elements cannot be easily copied (I think mentioned in #4656 and somewhere else as well)?


In conclusion, our mindsets are different.

  • I focus on "deduplication of JSON content across page loads"
  • You focus on "deduplication of element definition across pages"

I can perhaps explore your idea, but I can imagine it'll be harder than this PR, since we'd be messing with the NiceGUI element definition, not the transport (as in this PR).

@falkoschindler
Contributor

@evnchn I think @rodja doesn't want to copy elements. Instead from_cache() would insert some kind of marker with enough information for the client to insert the element from memory.

@evnchn
Collaborator Author

evnchn commented May 27, 2025

Having the SHA256 hash as the automatically populated cache name doesn't work.

Assume I have this in elements.py

def cached(self) -> Self:
    """Mark the element as cached.

    This will store the element's data in the browser data store.
    The element can then be referenced by its cache name.
    """
    self.cache_name = f'auto_{hash_data_store_entry(json.dumps(self._to_dict_internal()[0]))}'
    self._populate_browser_data_store_if_needed()
    return self

Then, note what happens in the browser console.

(screenshot of the browser console)

After a couple updates to the content, we see that we have more and more old unused cache entries. This is because, when the element is created again with the new content, it has a new cache_name, and the old entries are thus not updated, and left in memory.

Test script:

from nicegui import ui, app
import uuid

if not app.storage.general.get('special_uuid'):
    app.storage.general['special_uuid'] = str(uuid.uuid4())


@ui.page('/')
def main_page():
    markdown1 = ui.markdown("# This is markdown element 1 " + app.storage.general['special_uuid'])
    markdown1.cache_name = 'long_markdown_cache'  # makes it cached

    markdown2 = ui.markdown("# This is markdown element 2 " + app.storage.general['special_uuid']).cached()  # auto-cached
    print(markdown2.cache_name)

    # Button to change the special UUID and reload the page
    ui.button('Change UUID and reload', on_click=lambda: (app.storage.general.update({'special_uuid': str(uuid.uuid4())}),
                                                          ui.navigate.reload()))


ui.run()

This goes to show the importance of an unchanging cache_name for element for caching to properly work, otherwise cache invalidation becomes an issue, which is "one of two hardest things in computer science" 😅
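The accumulation of stale entries can be simulated without NiceGUI at all. In this sketch two plain dictionaries stand in for the browser data store, and `content_hash_name` is a hypothetical helper that derives the name from the content, as the `.cached()` experiment did.

```python
import hashlib
from typing import Dict


def content_hash_name(content: str) -> str:
    """Derive a cache name from the content itself (the approach that breaks)."""
    return 'auto_' + hashlib.sha256(content.encode('utf-8')).hexdigest()[:12]


fixed_store: Dict[str, str] = {}
hashed_store: Dict[str, str] = {}

for version in range(5):  # five content updates, like five "Change UUID and reload" clicks
    content = f'# This is markdown element, version {version}'
    fixed_store['long_markdown_cache'] = content        # overwrites the previous entry
    hashed_store[content_hash_name(content)] = content  # adds a new entry every time

print(len(fixed_store), len(hashed_store))
```

With the fixed name the store holds one entry regardless of how often the content changes, while the content-derived names leave one orphaned entry behind per update.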

@evnchn
Collaborator Author

evnchn commented May 27, 2025

And therefore, from_cache() can't possibly be enough to insert the appropriate marker (aka cache_name) for caching to occur, since it has no invariant information that stays the same despite element initialized many times, with potentially slightly-changed content in the element definition accounting for content updates.

@falkoschindler
Contributor

I'm still thinking about a way to utilize the browser change. Can't we define some element like

with ui.cached_element([id: str]):  # working title
    ui.markdown('A very big element...')

which overwrites _to_dict to (1) create a route serving the content dictionary and (2) to inform the client to fetch the content from this route. Then the client automatically uses its cache when possible and we don't have to manage it on the server. And an optional cache ID could be used to share the element across multiple pages. Alternatively it can be generated as the hash of the content dictionary.

I'm far from sure this can work. But it seems simpler than the current PR.


Actually, the same concept should be possible with a .cache([id: str]) method. Independent of whether to apply a cache to an element or using a container element for the same purpose, I guess my point is whether we need a server-side data storage or if we can simply generate content on demand.

@evnchn
Collaborator Author

evnchn commented May 27, 2025

Can we first agree that we need some invariant cache_name / ID, that is NOT dynamically derived from randomness / content being cached? Therefore .cached() and .from_cache() won't suffice.

If you're OK with this agreement, 👍 and we can move on, or else 👎 and I will continue my half-written analogy to (hopefully) get you to understand.

@rodja
Member

rodja commented May 27, 2025

Can we first agree that we need some invariant cache_name / ID, that is NOT dynamically derived from randomness / content being cached? Therefore .cached() and .from_cache() won't suffice.

Hmmm.... I thought about using the Python id of the element (self.cache_name = id(self)) or a uuid generated on first use (self.cache_name = uuid4() if self.cache_name == None else self.cache_name).

@evnchn
Collaborator Author

evnchn commented May 27, 2025

(1) create a route serving the content dictionary and (2) to inform the client to fetch the content from this route

This was exactly what I was thinking of in #4794 (comment), but it's not the best design, since you need to make another access to the endpoint to see if you can use such data, slowing page load down (and potentially causing TOCTOU - Time Of Check to Time Of Use errors, since you did the check 100ms later than the initial page load)

This is why in my PR, I shove things into the cookies, in the sense that when you access any NiceGUI page, the request already carries the information that a separate call to such an endpoint would otherwise have to fetch.

But I will ponder the API design of with ui.cached_element([id: str]):. Seems nice, but the issue is, as you may notice with this PR, that if you currently set .cache_name for the parent element, it does nothing to its children (since we also can't easily replace the children in nicegui.js: the JSON.parse reviver works on individual values, and the parent and child would otherwise span several values).

Perhaps with ui.cached_element([id: str]): can auto-set the cache_name of the children. I'll look into it, but IMO in the case of repeated children, you can just always manually assign the children some cache_name...

@evnchn
Collaborator Author

evnchn commented May 27, 2025

@rodja Using those dynamically generated IDs, which are ephemeral in nature, as it stands in this PR, will not work. Let me offer you 2 counter-examples:

Using id(self)

Since the element was initialized many times, in Python it has different IDs. It means we would create a new key in app.browser_data_store every time, but we don't clean up the old one.

Using str(uuid4())

Since the element was initialized many times, it has different UUID. It means we would create a new key in app.browser_data_store every time, but we don't clean up the old one.


HOWEVER I just noticed a possibility: What if, when the element is deallocated, we automatically delete the entry in app.browser_data_store? That way, we can use ephemeral IDs and make the API simpler. However, I am not sure if we would face the issue of trying to delete the key while writing to the same key in another thread. (I don't know asyncio well enough to answer this question...)

Wait: if you keep changing where the key is placed in app.browser_data_store, then no caching occurs! It won't work, then.

@evnchn
Collaborator Author

evnchn commented May 27, 2025

OK, analogy time. It'd be a "bedtime story" when you read this PR at 12am.

Suppose Zauberzeug GmbH hires me as a security guard, and my job is to remember everyone's faces to let people in.

I need to know everybody's name (like I can tell Rodja is Rodja by the name Rodja, Falko is Falko by the name Falko), such that I can associate the face with the name.

  • When the face of whoever is at the front door matches the face I recall by the name, I go "hey {name}, come on in"

  • When the face changes (say Falko shaved), I then need to re-associate this new face with the name Falko, and delete the old face from my memory, to keep my brain from exploding due to out-of-memory.

Imagine I don't have names at my disposal.

  • Then, when I see a new face, I can't delete the old face, since I think it's a new one!

This analogy maps the concept as follows:

Objects

  • Zauberzeug GmbH -> NiceGUI website
  • Security Guard -> the new Browser Data Store feature
  • Brain of Security Guard -> app.browser_data_store
  • Face -> Content of elements
  • Name -> cache_name (the invariant)

Actions

  • "Associate the face with the name" -> Build up Browser Data Store, key cache_names value element_dict_static
  • "face at the front door matches face by name" -> Encounter element with same element_dict_static during outgoing page transmission
  • "hey {name}, come on in" -> Simplify the transmitted JSON data
  • "face changes" -> Content of cached element changes
  • "Re-associate this new face with the name and delete old face from memory" -> Update app.browser_data_store by key element.cache_name value element_dict_static, overwriting the old value in the process
  • "don't have names at my disposal, when I see a new face, I can't delete the old face!" -> Failure to invalidate the old cache when using .cached(), .from_cache(), or ephemeral ID implementations, due to the missing invariant.

@falkoschindler
Contributor

I just sat down with Rodja and tried to identify the different aspects of this new feature. Let's check if we're on the same page:

Use cases

We need to distinguish what use cases we want to address:

  1. Cache unchanged (static) content when reloading a page
  2. Cache same static content on multiple pages
  3. Cache single elements, or even element hierarchies

API

So far we have discussed two or three different APIs. While the APIs 1a) and 1b) need IDs, Rodja's proposal should work without them because he's referring to the original object itself.

  1. Recognize identical elements via user-defined IDs:
    1a) Use a new caching element as container:
    with ui.cached_element(id=...):
        ui.markdown('My content')
    1b) Cache the element itself with a new method:
    ui.markdown('My content').cache(id=...)
  2. Re-use once-defined element objects:
    my_content = ui.element()
    
    @ui.page('/')
    def page():
        my_content.from_cache()

Storage

You seem to create an extra data store for all cached content on the server. My idea was to let cached elements define an endpoint serving their content on demand, so that the client can fetch it whenever it needs to.

  1. Use a server-side data store
  2. Provide HTTP-cacheable endpoints
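One way option 2 could rely on the browser's own cache is through standard HTTP validators. The sketch below only models the decision an endpoint would make for a conditional request; the use of ETag / If-None-Match, the helper names `etag_for` and `respond`, and the tuple return shape are assumptions and not part of any proposal in this thread.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Tuple


def etag_for(content: Dict[str, Any]) -> str:
    """Derive a validator from the serialized content dictionary."""
    body = json.dumps(content, sort_keys=True)
    return '"' + hashlib.sha256(body.encode('utf-8')).hexdigest() + '"'


def respond(content: Dict[str, Any],
            if_none_match: Optional[str]) -> Tuple[int, Optional[str], str]:
    """Return (status, body, etag) for a conditional GET of the element content."""
    etag = etag_for(content)
    if if_none_match == etag:
        return 304, None, etag  # browser copy is current: no body is sent
    return 200, json.dumps(content, sort_keys=True), etag


content = {'tag': 'nicegui-markdown', 'props': {'innerHTML': '<h1>A very big element...</h1>'}}
status, body, etag = respond(content, None)           # first visit: full body
status_again, body_again, _ = respond(content, etag)  # revisit with the validator
print(status, status_again)
```

Note that this still costs one extra request per cached element, which is the round-trip concern raised against this approach earlier in the thread.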

@rodja
Member

rodja commented Jun 7, 2025

So, immediately, we need some token, or else we'd be replacing plain strings, lists, and dicts, even when no replacement is desired.

I had a hard time understanding your statement, and only after contemplating "user's injection attempts" and the analogy to CSRF do I now have an idea of what you mean. Let me rephrase in my own words:

The server sends a JSON object with elements to the browser and needs a way to mark parts which should be loaded from browser storage. A malicious user might sneak some string into the persistence, which then gets loaded on a new page and tricks the JavaScript into thinking that it should load some browser storage data. Right?

But why not use a well-known key like "CACHE" to mark cached elements? They appear in the JSON structure as placeholders for entire objects/arrays or root-level values. So even if a user enters "CACHE": "evil_key", I don't see how the JS could confuse this with a cache loading instruction, because the entered text will only appear as a string value in a property.

Can enable caching for all child elements, if pass .cache(..., apply_to_child=True)
it is possible to deviate from parent by manually calling cache() or disable_cache()

What would be scenarios where you want to cache the parent but not the children? It would simplify the API if we always cache children (and hence get rid of apply_to_child and disable_cache).

Are [the client.fetch_*] methods, which doesn't even work half of the time, safe to be deleted, now that we have element.cache?

We are always looking for minimal, straightforward APIs. So yes, I think it's good to remove these methods.

@evnchn
Collaborator Author

evnchn commented Jun 7, 2025

Let's gather for a bit, since dropping client.fetch_* methods shifts the landscape.

Originally:

  • Need to detect placeholders at any level of the JSON hierarchy, parts of which are user-controlled
  • Need to have a random token to discern real placeholders from user input
  • Cached elements leverage the above mechanism

For example:

ui.json_editor(USER_CONTROLLED_DICT) can be mistaken as ui.json_editor(ui.context.client.fetch_dict_from_browser_data_store('EVIL_KEY')), if USER_CONTROLLED_DICT turns out to be {'CACHE': 'EVIL_KEY'}


Now:

  • We drop the client.fetch_* methods, so we need a dedicated mechanism for replacing placeholders for elements (and only for elements)
  • So, placeholders occur only at the root level of the JSON hierarchy, which is not user-controllable (minus some cursed elements which override _to_dict() to spit out user-controlled content)
  • We can discern them by a simple CACHE key in the element's dictionary
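
A minimal sketch of the root-level-only approach, written in Python for illustration (the real logic would live in nicegui.js). The names resolve_root_cache, elements, and cache are assumptions, as is the rule that keys sent alongside the CACHE marker override the cached definition.

```python
from typing import Any, Dict


def resolve_root_cache(elements: Dict[str, Dict[str, Any]],
                       cache: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Expand CACHE markers found in an element's own top-level dictionary only."""
    resolved = {}
    for element_id, element in elements.items():
        if 'CACHE' in element:
            cached = cache[element['CACHE']]
            dynamic = {k: v for k, v in element.items() if k != 'CACHE'}
            resolved[element_id] = {**cached, **dynamic}
        else:
            # Nested values (props, text, ...) are never inspected for markers.
            resolved[element_id] = element
    return resolved


cache = {
    'card': {'tag': 'q-card', 'props': {}},
    'EVIL_KEY': {'tag': 'div', 'props': {'secret': 'stored data'}},
}
elements = {
    '1': {'CACHE': 'card'},
    '2': {'tag': 'json-editor', 'props': {'content': {'CACHE': 'EVIL_KEY'}}},
}
resolved = resolve_root_cache(elements, cache)
```

Element 1 is expanded from the cache, while the user-controlled dictionary nested in element 2's props stays a plain value, so no token is needed.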

I think this is getting somewhere soon. Let me work on it later.

@evnchn
Collaborator Author

evnchn commented Jun 7, 2025

What would be scenarios where you want to cache the parent but not the children? It would simplify the API if we always cache children (and hence get rid of apply_to_child and disable_cache).

I was thinking about this

for _ in range(10):  # point 2
    with ui.card().cache('repetitive-card', apply_to_child=True):
        ui.label('This is some boilerplate label that barely changes')  # and some other elements which don't really change a lot
        ui.label(dynamic_data()).disable_cache()  # point 1
  1. If dynamic_data() is too large to fit in localStorage, .disable_cache() keeps it out of the cache, and the rest would still work
  2. If we disable the warning in b3416e6 to allow the same cache to be used for multiple elements, then we can have several of these cards which share the boilerplate in the ui.card() definition, while dynamic_data() remains different across cards.

@evnchn
Collaborator Author

evnchn commented Jun 7, 2025

Proposed new signature:

  1. apply_to_child=True as the default for .cache(). Opt out manually by disabling the cache, as below.
  2. For .cache(), accept three forms:
    a. .cache(name): Explicit name
    b. .cache(): Automatic name derived from hash
    c. .cache(False): New, for disabling cache (does the old functionality of .disable_cache())
  3. It is possible to opt out of caching children by putting those elements inside a placeholder div with cache disabled:
with ui.card().cache('cache-just-the-card'), ui.column().cache(None):
    ui.label('This label is not cached, since the `ui.column().cache(None)` stops the propagation of applying the cache to the child')
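
The proposed signature can be sketched with a small stand-in class. This is a hypothetical model, not NiceGUI's Element: the class Node, the sentinel _AUTO, and the method is_cached are assumptions. Since the proposal above writes both .cache(False) and .cache(None) for disabling, the sketch treats the two the same.

```python
from __future__ import annotations

from typing import Optional

_AUTO = object()  # sentinel: .cache() called without arguments


class Node:
    """Hypothetical stand-in for an element with an inheritable cache setting."""

    def __init__(self, tag: str, parent: Optional[Node] = None) -> None:
        self.tag = tag
        self.parent = parent
        self._cache = None  # None means "not set here, inherit from the parent"

    def cache(self, name=_AUTO) -> Node:
        if name is _AUTO:
            self._cache = True  # automatic name, e.g. derived from a hash
        elif name is None or name is False:
            self._cache = False  # explicit opt-out, stops propagation to children
        else:
            self._cache = str(name)  # explicit name
        return self  # keeps the builder pattern intact

    def is_cached(self) -> bool:
        node: Optional[Node] = self
        while node is not None:
            if node._cache is not None:
                return node._cache is not False
            node = node.parent
        return False


card = Node('card').cache('cache-just-the-card')
column = Node('column', card).cache(None)
label_in_column = Node('label', column)
label_in_card = Node('label', card)
```

The nearest ancestor with an explicit setting wins, which is what lets a placeholder container stop the propagation.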

@rodja
Member

rodja commented Jun 7, 2025

Ok. The main reason for not wanting to cache something is the 5 MB localStorage limit in the browser. The documentation should educate users about that limitation. And I'm curious what @falkoschindler thinks about this API. Maybe we should use the plural apply_to_children.

@evnchn
Collaborator Author

evnchn commented Jun 8, 2025

Very interesting development so far.

Status:

  • SVGs are mostly cached, except the variants of the happy face SVG with differing classes. Since I used automatic cache names, these pop in and out of the cache as the elements flip-flop between being present and absent across pages.
  • Sidebar hierarchy caching is not working, since the nodes live under the nodes key of the props dictionary, and we can't cache that at the element level. <- Point of discussion
  1. Apparently, under the old code, any update to the element via the Outbox calls element._to_dict() with nothing else passed, so the cached element definition is bypassed and the entire element is sent. That makes the caching useless for most functionality that writes to the element's props, including tree.expand of the sidebar hierarchy.
    a. In 9c3fcfa, you see that we serve the CACHE short form no matter what, and it is now the user's responsibility to register dynamic keys, which opts those keys out of the cache and into the transmission. (I am still thinking about this since it is quite suboptimal. One option is to check whether the cache content is stale and, if so, send the full definition.)
    b. Note that the nodes are under the nodes key of the props dictionary. We would otherwise need to deep-diff the two dictionaries, which is even more complex than Project BASE's once-had diffing in 21c2fad, which checks the root keys only.
  2. Apparently, there are 4 variations of the NiceGUI happy face SVG used in the NiceGUI documentation website, as we apply .classes() shortly after creating it.
    a. Right now you see me use the automatic .cache(), since .cache('name') would either blow up or leave different elements sharing the same cache name. The issue is that not all pages have all 4 variations, so the variations which do not occur on every page come in and out of the cache (since right now we drop an element from the cache immediately when it is not used).
    • I am thinking we should perhaps not drop elements with automatic cache names, and instead clear the oldest unused elements when the cache approaches 5 MB in total
    • We could manually assign cache names on each element like svg.face().classes(...).cache(...), but that is a tad bulky
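
The idea of clearing the oldest unused entries near the size limit can be sketched as a least-recently-used store with a byte budget. This is an illustrative Python model only: the class BoundedCache is hypothetical, the browser side would be JavaScript, and counting UTF-8 bytes is just an approximation of how a browser accounts for localStorage usage.

```python
from collections import OrderedDict


class BoundedCache:
    """Keeps entries until a byte budget is exceeded, then evicts the least recently used."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.entries = OrderedDict()  # key -> serialized element, oldest first

    def size(self) -> int:
        return sum(len(k.encode()) + len(v.encode()) for k, v in self.entries.items())

    def touch(self, key: str) -> None:
        """Mark an entry as used on the current page."""
        self.entries.move_to_end(key)

    def put(self, key: str, value: str) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)
        # Evict from the oldest end, but never evict the entry that was just stored.
        while self.size() > self.max_bytes and len(self.entries) > 1:
            self.entries.popitem(last=False)


cache = BoundedCache(max_bytes=20)
cache.put('a', 'x' * 8)
cache.put('b', 'y' * 8)
cache.touch('a')         # 'a' appears on the current page, 'b' does not
cache.put('c', 'z' * 8)  # over budget: the least recently used entry ('b') is dropped
remaining = list(cache.entries)
```

With this policy, an SVG variant that is missing from one page survives until the budget is actually under pressure, instead of being dropped immediately.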

@rodja Do you think we should reconsider the fetch_* methods? Otherwise we can't cache the sidebar hierarchy without much headache, since the content repeats at the data level, while at the element level two ui.trees expanded differently are technically two different elements. Thanks.

@rodja
Member

rodja commented Jun 8, 2025

I would rather not go back to tokens and the fetch_* methods. Let's explore other possibilities. What about allowing each element to override .cache to define a custom cache strategy? ui.tree would produce just nodes, the SVG underlying ui.html would use innerHTML, and so forth.

@evnchn
Collaborator Author

evnchn commented Jun 8, 2025

A custom cache strategy per element is a very interesting idea indeed, especially when the old fetch_* methods suffer when the cached data is mutated before making its way to the JSON definition (e.g. ui.markdown as we have seen)

I'll take a look at that idea

Meanwhile, I have these thoughts:

  • We should not address the pain point of transmitting the entire element definition in the Socket.IO communication (i.e. in the Outbox stage) in this PR, since if we enhance that aspect, the improvement would likely apply to non-cached elements as well.

@evnchn
Collaborator Author

evnchn commented Jun 9, 2025

Initially, I was quite perplexed by the idea of a "cache strategy", since that would imply executable code in the browser, so each element would need its own JavaScript snippet, which hardly seems ideal.

Now, I'm thinking perhaps the "strategy" is not executable code, but a list of root keys of props whose values (1) make up most of the element's size, (2) are unlikely to be mutated, and (3) when mutated, arguably make it an entirely different element. In this case, for ui.tree, it'd be "nodes" in props.
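
As a rough illustration of such a declarative strategy, the sketch below splits an element dictionary into a static part (cached, identified by a hash) and a dynamic part (always transmitted). The function split_element, the dictionary layout, and the 16-character truncated SHA-256 digest are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Dict, Set, Tuple


def split_element(element: Dict[str, Any],
                  static_prop_keys: Set[str]) -> Tuple[Dict[str, Any], Dict[str, Any], str]:
    """Split an element into cacheable static props, the remaining dynamic part, and a digest."""
    props = element.get('props', {})
    static = {k: props[k] for k in sorted(static_prop_keys) if k in props}
    dynamic = {**element, 'props': {k: v for k, v in props.items() if k not in static_prop_keys}}
    digest = hashlib.sha256(json.dumps(static, sort_keys=True).encode()).hexdigest()[:16]
    return static, dynamic, digest


nodes = [{'id': 'docs', 'children': [{'id': 'text'}, {'id': 'controls'}]}]
tree_a = {'tag': 'tree', 'props': {'nodes': nodes, 'expanded': ['docs']}}
tree_b = {'tag': 'tree', 'props': {'nodes': nodes, 'expanded': []}}

static_a, dynamic_a, digest_a = split_element(tree_a, {'nodes'})
static_b, dynamic_b, digest_b = split_element(tree_b, {'nodes'})
```

Two trees that differ only in their expanded state produce the same digest, so the large nodes payload can be shared while the small dynamic part is still sent each time.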

I think that this strategy makes more sense, but we need to make this for every element, which could be quite a burden and could take a while (or I can ask some LLMs to help once I've laid out the groundwork). Still it's better than the only-working-sometimes fetch_* I guess

@evnchn
Collaborator Author

evnchn commented Jun 9, 2025

@rodja Thank you for your pointer in the right direction! Now we are back to the 30KB -> 18KB for documentation (specifically, testing on /documentation/section_text_elements), while:

  • No token nonsense, since we act on root keys only (I tapped into replaceUndefinedAttributes, after all)
  • Able to overcome the different classes of the 4 variations of the NiceGUI happy face SVG, by excluding it from the cached keys with happy_face_svg_element.dynamic_keys.add('class'), meaning the classes will be dynamic across cache instances, and so we share 1 cache.
  • Able to cache ui.tree and ui.html post-change, with tree.static_prop_keys.add('nodes') and html_element.static_prop_keys.add('innerHTML')

It went much smoother than what I expected. Shall we review the API? I think manually calling .add on the sets breaks the builder pattern, and we should also review the unified .cache() API at #4796 (comment)

@rodja
Member

rodja commented Jun 10, 2025

Why are you using static_prop_keys.add(..) in the user code (namely the documentation website)? Would it not be much better to do this per Element in their respective constructors? I think it's more of an implementation detail that normal developers should not have to care about.

@evnchn
Collaborator Author

evnchn commented Jun 10, 2025

Just to be clear, userspace code is not expected to write to static_prop_keys and dynamic_keys.

Hopefully, each element will have a sensible default, and manual configuration should be few and far between.

However, the need to configure still exists (one shared element with shared styles, or the element may have differing styles, as we have seen with the happy face SVG), so there still needs to be some configurability, but not often.

For that, I think we'd need the builder pattern to continue, so we may need some helper methods that are called in the same manner as classes and props.
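
One possible shape for such helpers, sketched on a stand-in class rather than on NiceGUI's Element. The method names static_props and dynamic, and the space-separated add/remove arguments modeled on .classes(), are hypothetical and not part of the PR.

```python
class CacheKeysMixin:
    """Hypothetical chainable helpers around the static_prop_keys and dynamic_keys sets."""

    def __init__(self) -> None:
        self.static_prop_keys = set()
        self.dynamic_keys = set()

    def static_props(self, add: str = '', *, remove: str = ''):
        """Add or remove space-separated prop keys that belong to the cached part."""
        self.static_prop_keys |= set(add.split())
        self.static_prop_keys -= set(remove.split())
        return self  # returning self keeps calls chainable

    def dynamic(self, add: str = '', *, remove: str = ''):
        """Add or remove space-separated root keys that are always transmitted."""
        self.dynamic_keys |= set(add.split())
        self.dynamic_keys -= set(remove.split())
        return self


svg = CacheKeysMixin().static_props('innerHTML').dynamic('class style').dynamic(remove='style')
```

Defaults could then be set in each element's constructor, with these helpers reserved for the rare manual override.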

So I'm waiting for confirmation before adding caching awareness to all elements, since the workload would be huge.

@falkoschindler falkoschindler added the in progress Status: Someone is working on it label Jun 10, 2025
@evnchn
Collaborator Author

evnchn commented Jun 11, 2025

6ecb1c5 shows caching applied to ui.tree, and ui.html (and other elements from ContentElement, though their docstring is not updated)

For caching of all elements to work*, I have to set static_props_keys for all of the elements.

So that's why I am hesitant to continue, and I'm not sure this PR counts as "in progress" if that's the case @falkoschindler

*By "work", I mean actually doing caching. An element which hasn't set static_props_keys will, when cached, not cache a lot. It won't grind the page to a halt, but it'd be ineffective.

@rodja rodja self-requested a review June 12, 2025 03:55
Member

@rodja rodja left a comment


On the downside, this implementation adds significant complexity across multiple core aspects of NiceGUI:

  • Backend: New caching logic in Element, Client, and App classes
  • Frontend: Complex localStorage management with hash synchronization
  • Communication: Cookie-based cache invalidation protocol
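
For readers new to the thread, the cookie-based protocol can be summarized in a few lines of Python. This is a simplified sketch of the synchronization described in the PR description, not the actual implementation: the function names and the 16-character truncated SHA-256 digest are assumptions.

```python
import hashlib
from typing import Dict, Optional


def digest(value: str) -> str:
    """Short content hash, as the browser would report it in its cookie (assumed format)."""
    return hashlib.sha256(value.encode()).hexdigest()[:16]


def compute_update(server_store: Dict[str, str],
                   browser_hashes: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return only what the browser is missing; None marks keys the browser should delete."""
    update: Dict[str, Optional[str]] = {}
    for key, value in server_store.items():
        if browser_hashes.get(key) != digest(value):
            update[key] = value  # new or changed on the server
    for key in browser_hashes:
        if key not in server_store:
            update[key] = None  # stale key, serialized as null in JSON
    return update


server_store = {'sidebar': 'v2', 'logo': '<svg/>'}
browser_hashes = {'sidebar': digest('v1'), 'logo': digest('<svg/>'), 'stale': digest('old')}
update = compute_update(server_store, browser_hashes)
in_sync = compute_update(server_store, {k: digest(v) for k, v in server_store.items()})
```

When the browser is already in sync, the update is empty and nothing is retransmitted.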

Still, I think this complexity is justified, because the caching can significantly improve loading performance, and other approaches like the discussed HTTP-based element caching would need multiple requests.

Here are some thoughts from looking closer at the current implementation:

  • For element.py it would be great to explore an architecture where we do not need to introduce six new members for every element (even if they are not cached). Maybe a caching object per client which takes all the responsibility?
  • The naming of _to_dict_internal and _to_dict is not intuitive. Also, _to_dict_internal returns a complex data type which might be better described by a data class (if needed at all).
  • Cookies have 4 KB size limits; could the hash data exceed this?
  • In nicegui.js:
    • Synchronous localStorage operations might freeze the UI
    • Could the async hash computation lead to inconsistencies?
  • We need tests to ensure caching works as expected
  • How can we help developers understand and manage cache size limits?
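
The cookie question can be checked with a quick estimate. The sketch below assumes the cookie value is a URL-encoded compact JSON object mapping keys to 16-character hashes; the PR's actual encoding may differ, and the 4096-byte budget is the commonly cited per-cookie limit rather than a guaranteed one.

```python
import json
from urllib.parse import quote

COOKIE_LIMIT_BYTES = 4096  # assumed budget; browsers commonly cap a cookie at about 4 KB


def cookie_size(hashes: dict, name: str = 'nicegui_data_store') -> int:
    """Approximate size of 'name=value' for a URL-encoded compact JSON mapping of key -> hash."""
    value = quote(json.dumps(hashes, separators=(',', ':')))
    return len(name) + 1 + len(value)


small = {f'key{i}': 'a' * 16 for i in range(5)}
large = {f'key{i}': 'a' * 16 for i in range(200)}

small_fits = cookie_size(small) <= COOKIE_LIMIT_BYTES
large_fits = cookie_size(large) <= COOKIE_LIMIT_BYTES
```

Under these assumptions a handful of keys fits comfortably, while a couple of hundred keys would not, so an application caching many elements would need shorter hashes, a combined hash, or a different transport.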

@evnchn
Collaborator Author

evnchn commented Jun 16, 2025

All good suggestions above. I generally have no disagreements with them.

Implementation-wise:

  • I'll see if I have time left on the table to do it on the weekends. Weekdays are busy...
  • You can definitely edit my branch as maintainers.
  • To anyone reading, you can actually PR into my branch at https://github.com/evnchn/nicegui/tree/browser-data-store, and I'll review it ASAP (or just let it pass). This enables us to contribute to this PR.

As we have seen, this simple concept has grown to something huge (has been a tendency so far but it's good I think).

Let's let this sit in the oven for a bit longer 💪

@evnchn evnchn marked this pull request as draft August 7, 2025 05:06
@evnchn
Collaborator Author

evnchn commented Aug 7, 2025

Between internship, university, this, and #4900, I need quite a long period of time to sort this out.

Also, given SPA in 2.22.1, NiceGUI is fast enough in most use cases already.

Let's focus on other PRs in preparation for 3.0 release.

Labels
feature Type/scope: New feature or enhancement in progress Status: Someone is working on it 🟡 medium Priority: Relevant, but not essential

4 participants