paginated access to file contents #55

Open
alisterburt opened this issue Mar 14, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@alisterburt
Member

alisterburt commented Mar 14, 2024

@delarosatrevin mentioned that his reason for not using starfile was the inability to access file contents in a paginated way, as is typical of web APIs.

Reading the whole file at once was an intentional design decision, but I'm wondering whether we could expose paginated access in a useful way from the StarParser, and what the API should look like.

I've seen a nice API for this in SQLModel that we could try to replicate here, open to other suggestions too though

e.g. to get nrows rows after a given offset from a data block called 'block', you would do something like:

with StarParser.access('file.star') as star:
    df = star['block'].offset(offset).limit(nrows)
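
To make the chaining idea concrete, here is a minimal sketch of how an offset/limit query object could behave. `BlockQuery` and `fake_rows` are hypothetical names made up for illustration; nothing here is part of starfile today, and a real version would return a DataFrame rather than lists.

```python
from dataclasses import dataclass, replace
from itertools import islice
from typing import Callable, Iterator, List, Optional


@dataclass(frozen=True)
class BlockQuery:
    """Hypothetical lazy query over the rows of one data block."""

    # Factory returning a fresh iterator that yields one row at a time.
    row_source: Callable[[], Iterator[List[str]]]
    start: int = 0
    count: Optional[int] = None

    def offset(self, n: int) -> "BlockQuery":
        # Return a new query; nothing is read yet.
        return replace(self, start=n)

    def limit(self, n: int) -> "BlockQuery":
        return replace(self, count=n)

    def rows(self) -> List[List[str]]:
        # Only now is the source consumed, and only up to the last row needed.
        stop = None if self.count is None else self.start + self.count
        return list(islice(self.row_source(), self.start, stop))


def fake_rows() -> Iterator[List[str]]:
    """Stand-in for a parser that streams rows from a file."""
    for i in range(10):
        yield [f"mic_{i}.mrc", str(i * 1.5)]


page = BlockQuery(fake_rows).offset(3).limit(2).rows()
print(page)
```

Because `offset` and `limit` only record numbers, the order of the two calls does not matter and no rows are parsed until `rows()` is called.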

thoughts @delarosatrevin?

@alisterburt alisterburt added the enhancement New feature or request label Mar 14, 2024
@delarosatrevin

@alisterburt, I haven't fully explored starfile, but since it is based on DataFrames I guessed it would have some limitations that I have found to be common when handling or processing CryoEM star files (or metadata in general). The operations I have in mind are:

  1. Reading all data block names (if you are just opening a star file and want to know which blocks are available for further reading)
  2. Knowing how many rows are in each data block
  3. The pagination issue you mentioned: starting from some index and reading N rows
  4. Iterating over the rows without reading all of them into memory, which in some cases is useful for filtering and re-writing a subset of the star file
  5. Reading only the table definition without parsing the rows, which is useful for writing a similar table with different rows
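
As a rough illustration of the five points above, here is a minimal streaming sketch. It is not the starfile or emtools API: all function names are hypothetical, and it assumes simple loop blocks where rows are whitespace-separated, contain no quoted values with spaces, and end at a blank line or the next `data_` line.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO, Tuple

SAMPLE = """\
data_optics

loop_
_rlnOpticsGroup #1
_rlnVoltage #2
1 300.0

data_particles

loop_
_rlnImageName #1
_rlnOpticsGroup #2
_rlnDefocusU #3
000001@stack.mrcs 1 10500.0
000002@stack.mrcs 1 10720.5
000003@stack.mrcs 1 9980.2
000004@stack.mrcs 1 11010.9
"""


def block_names(f: TextIO) -> List[str]:
    """Point 1: list data block names without parsing any rows."""
    return [line.strip()[len("data_"):] for line in f if line.startswith("data_")]


def open_block(f: TextIO, block: str) -> Tuple[List[str], Iterator[List[str]]]:
    """Skip to `block`; return its column names and a lazy row iterator."""
    lines = iter(f)
    for line in lines:
        if line.strip() == f"data_{block}":
            break
    else:
        raise KeyError(block)

    columns: List[str] = []
    pending = None  # first data row, consumed while reading the header
    for line in lines:
        s = line.strip()
        if s.startswith("_"):
            columns.append(s.split()[0].lstrip("_"))
        elif s.startswith("data_"):
            break  # block has no rows
        elif s and s != "loop_" and not s.startswith("#"):
            pending = s
            break

    def rows() -> Iterator[List[str]]:
        if pending is None:
            return
        yield pending.split()
        for line in lines:
            s = line.strip()
            if not s or s.startswith("data_"):
                return
            yield s.split()

    return columns, rows()


def count_rows(f: TextIO, block: str) -> int:
    """Point 2: count rows while holding only one row in memory."""
    return sum(1 for _ in open_block(f, block)[1])


def read_page(f: TextIO, block: str, offset: int, nrows: int) -> List[List[str]]:
    """Point 3: read `nrows` rows starting at `offset`."""
    return list(islice(open_block(f, block)[1], offset, offset + nrows))


names = block_names(io.StringIO(SAMPLE))
columns = open_block(io.StringIO(SAMPLE), "particles")[0]  # point 5
n_particles = count_rows(io.StringIO(SAMPLE), "particles")
page = read_page(io.StringIO(SAMPLE), "particles", offset=1, nrows=2)
# Point 4: filter while iterating, never materialising the full table.
high_defocus = [r[0] for r in open_block(io.StringIO(SAMPLE), "particles")[1]
                if float(r[2]) > 10600]
```

With a real file, each `io.StringIO(SAMPLE)` would be replaced by an open file handle. Paging this way still scans the lines before `offset`, so very large files might benefit from caching byte offsets of rows.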

I re-wrote some tools from a previous package (emtable) here: https://github.com/3dem/emtools/tree/main/emtools/metadata

There I separated a Table from the StarFile class, which handles the parsing. Maybe you can reuse the parsing part to generate the DataFrame in the starfile library? It should be trivial to generate a DataFrame from a Table there, and it might be helpful for other types of metadata files.

I'm happy to hear any feedback or make changes if you find inconsistencies in the API, since it is still at a very alpha stage, although I have used it myself in emhub and some Scipion plugins.

@alisterburt
Member Author

@delarosatrevin thanks for getting back! I'll take a look :-)
