Race condition when building single instance App gui #532

Open
etheller opened this issue Apr 23, 2024 · 0 comments
etheller commented Apr 23, 2024

I'm working on a project (which I did not create) that uses the remi library. We encountered a problem where the application malfunctioned in several unexpected ways. After investigating, I believe I have pinpointed the cause: a race condition in this remi Python library. I may remain busy and forget to check back on this ticket, and I believe I have found a workaround, so here is a description of the issue:

On the following line of code:
https://github.com/rawpython/remi/blob/master/remi/server.py#L646
...it appears that the system is trying to prevent two UIs from being built simultaneously when there is meant to be only one client, with all clients sharing a single update_lock. However, immediately before this, a non-thread-safe method called _instance() is invoked, which locates the update_lock and some other per-client components, or creates them if none exist yet. (This assumes a simple application that does not use multiple sessions or instances of any kind, so all users who view the page are meant to see the same thing.)

Because _instance() looks inside the global clients list without any locking mechanism, my current theory for the bug in the software I am working on is that two calls to _instance() are made simultaneously from two threads handling two incoming TCP connections in the default threaded server. In other words, if two users are viewing the remi page in browser(s) at the same time and the custom remi app is killed and restarted, there is some probability that the race condition will manifest.

When the two simultaneous calls to _instance() each mistakenly conclude that they are the first call, each constructs its own entry in the global clients list, even though both use session 0 and the software intends a single session for all users. The two callers then each lock their own update_lock instead of the one update_lock meant for the one global session. This, in turn, causes the gui to log the "built UI" line twice instead of once, despite the software using multiple_instance=False. After that, main() is called twice and two idle() loops spin concurrently. In the software I am working on, the second, race-spawned idle() loop wreaks havoc on internal program state that was only meant to communicate with one instance of the remi GUI.
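To make the failure mode concrete, here is a standalone sketch of the check-then-act race I believe is happening. This is not remi's actual code: clients, instance_like(), and the entry layout are simplified stand-ins, and the sleep only widens the race window so the demo hits it reliably.

import threading
import time

clients = {}  # stand-in for remi's global clients registry

def instance_like(session_key):
    # Check-then-act with no lock, mimicking the pattern in _instance().
    if session_key not in clients:
        time.sleep(0.01)  # widen the check-to-create window so the race is easy to hit
        entry = {'update_lock': threading.RLock()}
        clients[session_key] = entry
        return entry  # each racing caller proceeds with the entry it created
    return clients[session_key]

entries = []
barrier = threading.Barrier(2)

def connect():
    barrier.wait()  # both "connections" arrive near-simultaneously
    entries.append(instance_like(0))

threads = [threading.Thread(target=connect) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# False means two distinct client entries (two update_locks) were created
# for the single session: the double main() / double idle() scenario.
print(entries[0] is entries[1])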

Perhaps this bug was not previously reported because, in some situations and on some browsers, encountering the race condition is quite unlikely. To boost the likelihood of the issue (if you want to reproduce it): I found that a small, nearly empty remi GUI would normally not show the issue, but would show it frequently if I connected to it several times from the same browser, with at least one of the browser tabs pointed not at my-private-network-hostname:8081 but at some-other-hostname:8083, where port 8083 on the other hostname routed back to 8081 on the actual remi server through an SSH tunnel (ssh -NL \*:8083:localhost:8081 my-private-network-hostname). Having the same browser connect twice, while the browser believes these are two independent hosts, seems to increase the likelihood of near-perfectly-simultaneous connections, which is what allowed us to manifest this uncommon issue.

Using the test described above to dramatically increase the likelihood of encountering the issue, together with a trimmed-down basic remi gui that used multiple_instance=False and displayed a basic textbox in a simple App subclass launched with remi.start(MyApp, ...), I was able to reproduce the issue frequently, create a temporary workaround for my case, and then confirm that the issue no longer occurred with the workaround in place.
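For reference, the trimmed-down app was roughly the following (a reconstruction of the idea rather than the exact code; the widget choice, address, and port are incidental):

import remi.gui as gui
from remi import App, start

class MyApp(App):
    def main(self):
        # A single textbox is enough; the race occurs while the client
        # entry is being created, before any widget logic runs.
        return gui.TextInput()

start(MyApp, address='0.0.0.0', port=8081, multiple_instance=False)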

The workaround that I used in my class was something like this (reconstructed from memory):

import threading

from remi import App

class MyApp(App):
    # Class-level lock, shared by every connection-handling thread.
    instanceLock = threading.RLock()

    # ... some functions of MyApp here

    def _instance(self):
        # Serialize the look-up-or-create inside App._instance() so two
        # simultaneous connections cannot both believe they are the first.
        with MyApp.instanceLock:
            App._instance(self)

By doing the above, wrapping the private _instance() method in a class-wide mutex modeled on the update_lock inside remi itself, I was able to get things running in a state where, even if the threading is still not ideally constructed or perfectly robust, the probability of two instances of the GUI being created simultaneously appeared afterwards to be zero.
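One detail that matters: the lock has to be a class attribute rather than an instance attribute, because (as far as I can tell) a fresh App handler object is constructed per incoming connection, so an instance-level lock would be private to each connection and would serialize nothing.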

It is possible that I will not have further time to follow up on this issue, but I wanted to report it here in case the original remi developer(s) wish to incorporate a better, more general solution to this problem into the library, so that no one else has to waste time on the same problem in the future. Thank you, and good luck.
