execute shell expression from within repl? #77
Comments
when you say
The general idea behind those notes in the experiments doc is that we'd like to extend TypeStream so you can store in a variable things like a "data stream" (like your example in #76) or a "list of paths" (lists are not supported yet, neither is
So if I understand your suggestion correctly, you'd like to integrate "local" (as in your filesystem) data with TypeStream? Can you please provide an example of how you'd use this feature? (I just want to be sure I have enough context to think about this)
yeah that's right. I don't have a ton of use-cases upfront, but my thought is that allowing local structured data to be injected could be useful for filtering/aggregating/enriching purposes. I could join a local file's objects against the stream, keep only the objects that match on type and subtype, pair each match, and return that result.
joined(key, subkey) to
returns
I feel like this could be a useful pattern, though I want to call out that column name collisions may need better handling, possibly with some configurability. Initially I was thinking of csv/xml/json/kdl data formats: you would load the files once and then use that data like a small stream. But I also think that if you could run a sql query against a sqlite database, either upfront (results cached forever, or re-queried/refreshed every N) or per-object, that could produce some interesting mechanics. (You could insert into the sql database too? Idk.) It would be nice if it could hot-reload on change, or operate on a wildcard, pulling all files that match the filter into the stream.
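To make the join-and-collision idea above concrete, here is a minimal Python sketch. It is not TypeStream code: `join_stream`, `build_index`, and the `local_` prefix are hypothetical names, and prefixing the local column is only one possible collision policy. It does an inner join, so stream records with no match on `(type, subtype)` are filtered out.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, object]


def build_index(rows: Iterable[Record], keys: Tuple[str, ...]) -> Dict[tuple, List[Record]]:
    """Group the local rows by join key so each stream record needs one dict lookup."""
    index: Dict[tuple, List[Record]] = {}
    for row in rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    return index


def join_stream(
    stream: Iterable[Record],
    local_rows: Iterable[Record],
    keys: Tuple[str, ...] = ("type", "subtype"),
    prefix: str = "local_",  # hypothetical collision policy: prefix the local column
) -> Iterator[Record]:
    """Inner-join stream records with local rows on `keys`, one output per match."""
    index = build_index(local_rows, keys)
    for record in stream:
        key = tuple(record.get(k) for k in keys)
        for match in index.get(key, []):
            merged = dict(record)
            for name, value in match.items():
                if name in keys:
                    continue  # join keys are already present in the stream record
                # On a name collision, keep the stream value and prefix the local one.
                target = prefix + name if name in record else name
                merged[target] = value
            yield merged
```

Making `prefix` (or the whole collision policy) a parameter is one way to get the configurability mentioned above; a stricter variant could raise an error on any collision instead.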
this is very intriguing. The challenge here is how to provide a schema for external data sources that don't come with one (csv files are a good example, but sources with a "loose schema" like redis would also fall here). I've been thinking about it a lot and, coincidentally, I have a meeting tomorrow that should help me push this forward. But there's more than one thing going on in this issue, so let me try to provide some context.
this makes a lot of sense to me. As I said, I'd want to make it as easy as possible to provide a schema for this. What you said next may be a good way of making this happen:
Interfacing with relational databases is very high on my personal list of priorities for TypeStream since it comes up a lot and, just now, you provided me with one more reason to focus on this. The basic idea here is to rely on the "filesystem metaphor". It would look like this:

cat /media/sqlite3/my.db/tables/users | join /dev/kafka/cluster/topics/page_views

I wrote about the implications of this approach here. There are some obvious challenges with the semantics of this (how does a table become a stream? And a stream a table?) but there are very solid solutions out there (like Debezium) which, I'm sure, will provide me with the right guidance. In the context of your examples, what I'm thinking is that, once there's a "db mounting" feature in TypeStream, the workflow could be something like this:
If you're still with me (sorry I wrote a lot, I know 🤦‍♂️), I'd love to hear what you think about this workflow. I'm also very curious how you'd imagine "mount that db in typestream" would look (I have ideas but don't want to bias anyone toward the way I'm thinking about it)
It almost sounds like having a local "lookup table" for aggregating with a stream. That certainly seems useful.
that's true! I proposed the "sqlite workflow" because I've found that approach quite fast as a way to import/structure data from csv (sqlite-utils is just that good). The question that remains is what that schema file should look like, of course
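As a rough illustration of the "csv into sqlite with a schema" step, here is a sketch using only Python's standard `csv` and `sqlite3` modules (not sqlite-utils). The `load_csv` helper and the schema shape, a plain mapping from column name to SQLite type, are assumptions for illustration; the thread leaves the actual schema file format open.

```python
import csv
import sqlite3
from typing import Dict, TextIO


def load_csv(
    conn: sqlite3.Connection,
    table: str,
    csv_file: TextIO,
    schema: Dict[str, str],  # hypothetical schema shape: column name -> SQLite type
) -> int:
    """Create `table` from `schema` if needed, then insert every CSV row into it."""
    # Identifiers are interpolated, so table and column names must come from a trusted schema.
    columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in schema.items())
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

    names = list(schema)
    column_list = ", ".join(f'"{name}"' for name in names)
    placeholders = ", ".join("?" for _ in names)

    # DictReader takes column names from the CSV header row.
    rows = [tuple(row[name] for name in names) for row in csv.DictReader(csv_file)]
    conn.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)
```

CSV values arrive as text; SQLite's column type affinity converts well-formed numerals when the declared type is `INTEGER` or `REAL`, so the schema is what gives the loosely typed file its structure.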
I think sqlite sounds like a pretty good solution so you don't need a million adapters. I like it. I would prefer the libsql driver be used, so that the 'file' might be a url. 😈 You will want to document how to interact with a live sqlite database from 'another app' on the same file system, reading out of the file safely. I like the /tables concept. It might be convenient to allow a raw sql select too. Piping into a sqlite table: autocreate the table if it doesn't exist, using the schema? Sqlite covers all my use cases. I could find use for regular structured data, but I think most of it could easily be imported into sqlite. It's not my niche, but you may find certain data types really want native columnar Parquet (or DuckDB?) files.
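For the "read a live database safely" and "raw sql select" points, one conservative option is to open the file read-only through SQLite's URI syntax. The sketch below uses Python's standard `sqlite3` module rather than libsql, and `open_read_only` and `select_rows` are hypothetical helper names, not TypeStream or libsql API.

```python
import sqlite3
from pathlib import Path
from typing import Dict, Iterator, Sequence


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so this connection cannot write to it."""
    # mode=ro is part of SQLite's URI filename syntax; uri=True enables URI parsing.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    return conn


def select_rows(
    conn: sqlite3.Connection, sql: str, params: Sequence[object] = ()
) -> Iterator[Dict[str, object]]:
    """Run a raw SELECT and yield each row as a plain dict, like a small stream."""
    for row in conn.execute(sql, params):
        yield dict(row)
```

Read-only mode only guarantees that this connection never modifies the file. How well reads coexist with the other app's writes also depends on that app's settings; for example, SQLite's WAL journal mode lets readers proceed while a writer is active.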
ah that's a nice one. I think this helps me shape the "mounting" concept!
Yes, I think this is the "mirror" feature of "cat /media/sqlite3/db/tables/foo" and both make sense. I haven't tried to spike this yet so I can't tell if we'll be able to ship both at the same time (not a big fan of giant pull requests :D) but I'm fully convinced we need both.
I say this a lot... this is what I like about the TypeStream abstraction the most: once we lay down the work for "mounting sqlite", TypeStream will have enough infrastructure code that we'll be able to build new integrations (not totally convinced this is the right word but def the best we have) quite quickly. Exciting times. I'm going to leave this open while I still have a "private roadmap" in my hands since there's no other place where we're talking publicly about "media mounting"
from the repl, I'd like to get regular data to use later, maybe?
is this like what the FileSystem pre-compute examples are?
I could do a lot of this upfront outside TypeStream and then pipe into pipestream, but being able to access it within the env may unlock some things.