MWS: Use Promises for sql-tiddler-database.js #8885
Comments
Node v8 is when async/await was introduced (promises themselves landed earlier). I don't remember what minimum Node version we're supporting, but I'm assuming we're trying to support older browsers, so we can't bring that into core code on the browser side.
I still wish the tiddler file loading code in boot.js had been asynchronous, because it would have allowed me to load wiki folders from Dropbox directly into a cloud connector and then use the tiddlyweb adaptor to save changes back to Dropbox. All of the file system operations had equivalent commands in Dropbox, and had boot.js been using async I/O, mapping them would have been trivial. I never made that connector because I couldn't see myself continually translating code every time there was an update. Even if the code hardly ever changes, I would never be able to keep up with checking every version to make sure nothing had changed.
NodeJS processes are intended to run at most one per core. With TiddlyWiki that is even more the case, as it loads a significant amount of data into memory, so the thought of trying to scale that per user is slightly terrifying. Granted, you can't put too much load on a mediocre database server, but you can usually do a lot more than one connection! And if you have multiple users each working on their own section, there's no reason not to have concurrent connections. The bigger it gets, the more likely users will end up waiting for their turn on that one connection.

And don't forget that literally the entire NodeJS process will be waiting on that one database connection. All parsing, tabulating, listing of results, and so forth has to happen while the connection is idle, so only one person's web request can be served at a time -- from first byte in to last byte out, only one request at a time. Only the actual data transfer gets offloaded: you can buffer bytes, but you can't process them (I should know how backpressure works, but I don't).
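To make the blocking concrete, here is a minimal sketch (the busy loop is a stand-in for a synchronous SQLite statement; nothing here is real sql-tiddler-database.js code): a callback for another request is queued *before* the synchronous call starts, but cannot run until the call finishes, because both share the one event loop.

```javascript
// Demonstration: a synchronous call holds the event loop, so no other
// request callback can run until it returns -- promises or not.
const events = [];

function syncDatabaseCall(ms) {
  // Stand-in for a synchronous SQLite statement that takes `ms` to run.
  const end = Date.now() + ms;
  while (Date.now() < end) { /* burn the event loop */ }
  events.push("db call finished");
}

setTimeout(() => events.push("other request handled"), 0); // queued first...
syncDatabaseCall(50);                                      // ...but served second
```

After the event loop turns, `events` is `["db call finished", "other request handled"]`: the "other request" waited out the entire synchronous call despite being scheduled first.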
I took a stab at it: #8888
There is a nice blog post about SQLite scalability, https://blog.wesleyac.com/posts/consider-sqlite, which you may find interesting. I also found this link very interesting: https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a-single-server, where they describe how to get 4 million reads per second with a single server -- much cheaper and more reliable than what they could get from Amazon. There is some more performance-related info at https://sqlite.org/fasterthanfs.html -- read the full post.
The "latency" charts later in that post are also interesting: SQLite reads are much faster than direct I/O on Windows, which is still the OS that 80% of our users will use to run TW servers. Also see section "3. General Findings" there.
I know that Jeremy does not watch videos, but I did find this one very informative. It is mainly about HTMX, but it has a lot of connections to TiddlyWiki's "values" -- I think we could replace HTMX with TiddlyWiki and give the exact same talk: "Building the Hundred-Year Web Service with htmx" by Alexander Petros.
That's not necessarily a "no go" problem. See: Write-Ahead Logging
I'm not sure if you're arguing for or against asynchronous SQLite calls. The problem I'm pointing out is that the entire NodeJS server thread is blocked while a SQLite call is being made, which means the server can never process more than one web request at a time -- so you can forget your 100 concurrent writers, because you'll never be able to have more than one.
The problem is that the current setup blocks the JavaScript thread while the SQLite call is being made, so only one web request can be processed at a time. In order to benefit from the WAL we still need to either use worker threads and promises, or spawn multiple processes and forward requests from a parent process. I'm guessing the simplest way to handle this would be to put SQLite on worker threads using the `worker_threads` module (stabilized in Node v12) and use promises for database calls. But I guess the bigger reason I'm requesting this is that the moment I saw the code, I immediately wanted to rewrite it to use Postgres, etc. It's not only about database performance; it's about being able to separate the application and database into separate processes and use whatever database you want.
One thing I just ran into is transactions. Serving one request at a time vastly simplifies transaction handling. I think each request needs to be assigned a worker thread from a pool, which could be tuned based on server performance; if no threads are available, a request would simply wait to start until a worker is free.
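The "wait until a worker is free" behaviour is just a counting semaphore over the pool. A minimal promise-based sketch (class and method names are hypothetical, and the "workers" are abstracted away to show only the queueing logic):

```javascript
// Minimal pool sketch: at most `size` tasks run at once; excess requests
// wait in FIFO order until a slot is released.
class WorkerPool {
  constructor(size) {
    this.free = size;   // slots currently available
    this.queue = [];    // resolve callbacks of waiting requests
  }

  acquire() {
    if (this.free > 0) {
      this.free--;
      return Promise.resolve();
    }
    // All slots busy: park this request until release() wakes it.
    return new Promise((resolve) => this.queue.push(resolve));
  }

  release() {
    const next = this.queue.shift();
    if (next) next();   // hand the slot straight to the next waiter
    else this.free++;
  }

  async run(task) {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();   // always free the slot, even on error
    }
  }
}
```

With a pool of size 1 this reproduces today's one-request-at-a-time behaviour; raising the size is the tuning knob mentioned above.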
So, after some more tinkering and poking around, I've come to the conclusion that the entire request chain would have to be gutted and rewritten in order to support async I/O. So much of the code relies on the atomicity of one request happening at a time that changing this requires actually rethinking the structure of the request chain. Here are some things that the request chain needs to implement.
I would recommend using an ORM, specifically Prisma. It includes its own bindings to the native C SQLite adapter (I'm working on seeing if we can get a wasm version as well). It takes care of all the relations and types, vastly simplifying database access and enforcing various integrity checks. It supports all the basic stuff, plus multi-field primary keys, unique indexes, relations, upserts, nested updates, generated defaults, etc. It has some limits, but I introspected the existing database setup and it didn't complain about anything. In short, it makes it harder to write bad code, and if anything is incorrect, you'll get either a compile-time or a runtime error pointing out exactly what's wrong. The entire client can be bundled, but installing it as a node module is a breeze, and since we're already headed that direction with MWS, it seems like a logical route.

Also, it is mostly platform agnostic, so if someone wanted to switch to MySQL or Postgres, really all they would need to do is change one line in the schema and regenerate the code. Because SQLite feels like the simplest subset of all database features, I don't think anything else would need to change, but I'm not familiar enough with SQLite to know that for sure. So changing to a different database would literally be a drop-in replacement.
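To illustrate the "one line in the schema" point, here is a hypothetical fragment of what a `schema.prisma` for MWS might look like (the model and field names are guesses for illustration, not the actual MWS schema):

```prisma
datasource db {
  provider = "sqlite"        // change to "postgresql" or "mysql" to swap engines
  url      = env("DATABASE_URL")
}

generator client {
  provider = "prisma-client-js"
}

model Bag {
  bag_id   Int       @id @default(autoincrement())
  bag_name String    @unique
  tiddlers Tiddler[]
}

model Tiddler {
  bag_id Int
  title  String
  bag    Bag    @relation(fields: [bag_id], references: [bag_id])

  @@id([bag_id, title])      // multi-field primary key, as mentioned above
}
```

Running `prisma generate` after editing the `provider` line regenerates a typed client against the new engine; the application code itself stays unchanged.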
After attempting a refactor of the MWS plugin, I think I've come to the conclusion that rewriting this for async SQLite access introduces so many different headaches that it isn't even worth it. Because SQLite is inherently synchronous, trying to do asynchronous work introduces a whole list of things you have to think through to prevent race conditions.

There is a significant amount of code dedicated to managing users and permissions. If someone cares enough about using a different database, it's probably because they don't really have file system access, in which case they're going to need a solution for the admin wiki anyway; or it's because they're going to be running a website that functions like some form of Wikipedia, in which case they're absolutely going to rewrite the user management anyway, which is 95% of the complexity.

That being said, I don't know why anyone is trying to use the WAL; that's the exact same thing as what I was proposing. Pretty much any attempt to add any kind of concurrency is going to require a complete rewrite. You can't just shoehorn concurrency into an existing code base, because it results in race conditions and deadlocks. Every single line of code has to be written with the question of how it relates to the database access. If the current code has already been written with that in mind, then we're good, and we just need to hand requests off to worker threads which each open their own connection to the database. Obviously, requests which access the admin wiki don't get handed off, which satisfies the points I laid out in my previous comment. File system access still needs to be done inside an explicit BEGIN IMMEDIATE transaction.
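The BEGIN IMMEDIATE pattern mentioned above is straightforward to wrap. A sketch, assuming a synchronous database handle with an `exec()` method (the handle and helper name are illustrative, not actual MWS code):

```javascript
// BEGIN IMMEDIATE takes the SQLite write lock up front, so the file
// system work inside `fn` cannot interleave with another writer.
// COMMIT on success, ROLLBACK on any error.
function withImmediateTransaction(db, fn) {
  db.exec("BEGIN IMMEDIATE");
  try {
    const result = fn();
    db.exec("COMMIT");
    return result;
  } catch (err) {
    db.exec("ROLLBACK");
    throw err; // let the caller see the original failure
  }
}
```

The point of `IMMEDIATE` (rather than a plain deferred `BEGIN`) is that the lock is acquired before any reads happen, so a transaction cannot discover mid-way that another writer got there first.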
Can we use promises for sql-tiddler-database.js? Most modern database adapters should not block the server thread, especially when the call goes over a network, since anything more than a memory store is inherently asynchronous anyway.
Even if the SQLite adaptor we use is synchronous, any attempt to use a different adapter with another SQL engine (MySQL, Postgres, etc.) will probably be asynchronous. Some database adapters don't even give you a sync option and ONLY use promises.
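One low-cost way to prepare for that is to put a promise-returning facade over the existing synchronous adapter, so callers are written against one async surface regardless of backend. A minimal sketch (the `getTiddler` method and the sync adapter are hypothetical stand-ins, not the real sql-tiddler-database.js API):

```javascript
// Wrap a synchronous adapter in a promise-based surface. Note this does
// NOT unblock the sync call itself -- it only gives callers the same
// async shape a genuinely asynchronous adapter (Postgres, MySQL) has.
function promisifyAdapter(syncDb) {
  return {
    getTiddler(title) {
      return new Promise((resolve, reject) => {
        // setImmediate keeps completion ordering consistent with a truly
        // async backend, so callers can't accidentally rely on sync-ness.
        setImmediate(() => {
          try {
            resolve(syncDb.getTiddler(title));
          } catch (err) {
            reject(err);
          }
        });
      });
    },
  };
}
```

Callers then write `await db.getTiddler("HelloThere")` today, and swapping in a natively asynchronous adapter later requires no changes at the call sites.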
I was digging around in the code, of course, because that's what I like to do, and when I saw that database calls were synchronous, it reminded me of all the headaches back when I was trying to implement TiddlyServer to serve multiple wikis and discovered that the entirety of the TiddlyWiki boot loader was using sync filesystem calls, making the entire server (for all wikis) freeze for the duration of any file system operation. The slower your file system, the more noticeable this becomes.
Since this system is intended for a robust multi-user scenario, there is no way we can afford to use blocking I/O for anything in the server. It will only be a matter of time before some corner case comes back to bite us, and it will absolutely prevent this from being used at scale. Forcing the server to twiddle its thumbs every time it makes a database call rules out any possibility of using modern database scaling and connection pooling, since we can never make more than one connection at a time.
Now that we are implementing a new system for saving and loading tiddlers, it would be absolutely amazing if we could make this async from the start. I've seen talk of using very recent Node version features (the native SQLite adapter, if I recall), and async/await has been part of the language since ES2017. I don't really see a reason not to let the MWS plugin target at least ES2017, given how much the JavaScript world has changed between ES5 and ES2017. I'm sure there are security considerations that would argue for this as well.
I feel like using Promises is super important, not only for performance, but also for the ability to build other systems around it as needed.