Idiomatic DML/CRUD? #8585
Sorry, I'm not quite sure how to frame that title, but in a nutshell, as my data changes I frequently find myself having to tear down and replace data wholesale. My current pattern is basically:
Incidentally, this works for me for most cases, since a lot of my data is static, but there are some datasets I am frequently modifying, and it feels "wasteful" to have to scrap entire tables if all I'm doing is deleting or modifying the occasional record. Am I looking at this wrong? Is there a "better" way to handle basic CRUD operations (including using the SQL interface); if not now, is it planned, or am I misusing/abusing DF by trying to do this in the first place?
Replies: 1 comment
Here are some ideas:

Insert into the MemTable
If you have a MemTable already, you can use the MemTable::insert_into API to add new batches. This is the API that gets used to implement SQL INSERT INTO <table> SELECT .... type queries.

Directory of Files
If your data is in files in a directory, you can write new files into that same directory and they will be picked up on the next read (aka "hive" style) using ListingTable (e.g. read_csv with a path to a directory).

Make your own TableProvider
Depending on where your data is coming from, rather than creating a new MemTable for it, you could potentially implement your own TableProvider that would return the data on demand -- depending on what you are doing you could also implement predicate / projection pushdown too.
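
To make the last idea a bit more concrete, here is a minimal plain-Rust sketch of its general shape. It deliberately does not use DataFusion's API: `User`, `OnDemandTable`, `UserTable`, and `demo` are hypothetical names made up for illustration, and the real `TableProvider` trait is async and returns an execution plan rather than rows. The only point is that if the table reads shared state at scan time, individual inserts, updates, and deletes on that state show up in the next scan without rebuilding anything, and the scan itself can apply the projection and the predicate.

```rust
use std::sync::{Arc, RwLock};

/// One record in the hypothetical backing store.
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub id: u32,
    pub name: String,
}

/// Hypothetical, much-simplified stand-in for a table provider.
/// This is NOT DataFusion's `TableProvider` trait.
pub trait OnDemandTable {
    /// Return rows that pass `filter`, keeping only the columns named in `projection`.
    fn scan(&self, projection: &[&str], filter: &dyn Fn(&User) -> bool) -> Vec<Vec<String>>;
}

/// A table backed by shared, mutable application state.
pub struct UserTable {
    pub store: Arc<RwLock<Vec<User>>>,
}

impl OnDemandTable for UserTable {
    fn scan(&self, projection: &[&str], filter: &dyn Fn(&User) -> bool) -> Vec<Vec<String>> {
        // Read the current state at scan time instead of a snapshot taken at registration.
        let rows = self.store.read().unwrap();
        rows.iter()
            .filter(|u| filter(*u)) // predicate applied inside the scan
            .map(|u| {
                projection
                    .iter()
                    .filter_map(|col| match *col {
                        "id" => Some(u.id.to_string()),
                        "name" => Some(u.name.clone()),
                        _ => None, // unknown column names are skipped in this sketch
                    })
                    .collect::<Vec<String>>() // only the requested columns are built
            })
            .collect()
    }
}

/// Scan, edit individual records in place, then scan again.
pub fn demo() -> (Vec<Vec<String>>, Vec<Vec<String>>) {
    let store = Arc::new(RwLock::new(vec![
        User { id: 1, name: "ada".to_string() },
        User { id: 2, name: "bob".to_string() },
    ]));
    let table = UserTable { store: Arc::clone(&store) };

    let before = table.scan(&["id", "name"], &|_: &User| true);

    {
        let mut rows = store.write().unwrap();
        rows.push(User { id: 3, name: "cy".to_string() }); // create
        rows.retain(|u| u.id != 1); // delete
        if let Some(u) = rows.iter_mut().find(|u| u.id == 2) {
            u.name = "bobby".to_string(); // update
        }
    }

    // Ask only for `name`, and only for rows with id >= 2.
    let after = table.scan(&["name"], &|u: &User| u.id >= 2);
    (before, after)
}
```

Calling `demo()` returns the rows seen before and after the in-place edits. The table object is never torn down or re-created between the two scans, which is the part that maps onto the "scrap entire tables" concern in the question.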