Reading specific rows from a large `sas7bdat` file #42

BERENZ · 2024-09-10T09:39:38Z

Would it be possible to add a way to read specific rows from a large sas7bdat file? I have large SAS files (around 10 GB), along with text files that are exact flat copies of the SAS files. Using the text file, I can determine the subset of rows I need (around 10% of the file).

Another option would be to specify a filter while reading, for example, selecting rows based on the values of a column. However, I understand this may be more difficult to implement.


junyuan-chen · 2024-09-11T14:39:03Z

Hi! Have you tried the keyword arguments `row_limit` and `row_offset`? They should allow reading just a portion of the file.

BERENZ · 2024-09-11T14:50:18Z

Hi, yes, but that only works if the rows I want to select are consecutive. In my case, they are scattered throughout the dataset.
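To see why scattered rows are a problem, note that one `row_offset`/`row_limit` pair selects a single contiguous window, so an arbitrary selection has to be split into runs of consecutive rows, each needing its own read call. The sketch below is a minimal illustration, assuming 0-based row indices and that `row_offset` counts rows skipped from the start of the file; the helper `index_runs` is hypothetical and not part of any library.

```python
from typing import Iterable, List, Tuple


def index_runs(rows: Iterable[int]) -> List[Tuple[int, int]]:
    """Group 0-based row indices into (row_offset, row_limit) pairs,
    one pair per run of consecutive indices (hypothetical helper)."""
    runs: List[Tuple[int, int]] = []
    for r in sorted(set(rows)):
        if runs and runs[-1][0] + runs[-1][1] == r:
            # r directly follows the current run, so extend it by one row
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # r is not adjacent to the current run, so start a new one
            runs.append((r, 1))
    return runs


if __name__ == "__main__":
    print(index_runs([5, 1, 2, 3, 9, 10]))  # prints [(1, 3), (5, 1), (9, 2)]
```

When about 10% of the rows are selected and they are spread out, most runs are only one or two rows long, so the number of read calls approaches the number of selected rows.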

junyuan-chen · 2024-09-11T15:01:24Z

@BERENZ All right, now I see your point. Filtering the rows of a data file with a general condition is not built into the parser. As a workaround, you could split the file into partitions of consecutive rows that are small enough to fit into memory and then filter each partition one by one. Every row is therefore still read into memory at some point, just not all at once.
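The partition-and-filter workaround can be sketched as follows. This is a minimal illustration under stated assumptions: `read_chunk(row_offset, row_limit)` is a hypothetical callable standing in for a reader that accepts the `row_offset`/`row_limit` keyword arguments, and it is assumed to return fewer than `row_limit` rows once the end of the file is reached. The `keep` predicate receives the global row index and the row, so it covers both selecting rows by index (from the text file) and filtering on a column value.

```python
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")


def filter_in_chunks(
    read_chunk: Callable[[int, int], Sequence[Row]],
    keep: Callable[[int, Row], bool],
    chunk_size: int,
) -> List[Row]:
    """Read consecutive partitions of `chunk_size` rows and keep the rows
    for which keep(global_row_index, row) is true."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    kept: List[Row] = []
    offset = 0
    while True:
        # hypothetical reader call: skip `offset` rows, read up to `chunk_size`
        chunk = read_chunk(offset, chunk_size)
        for i, row in enumerate(chunk):
            if keep(offset + i, row):
                kept.append(row)
        if len(chunk) < chunk_size:
            break  # a short (or empty) chunk means the end of the file
        offset += chunk_size
    return kept


if __name__ == "__main__":
    # In-memory stand-in for the file; a real reader would parse the file.
    data = list(range(100))
    wanted = {3, 41, 97}
    result = filter_in_chunks(
        lambda row_offset, row_limit: data[row_offset:row_offset + row_limit],
        lambda i, row: i in wanted,
        chunk_size=10,
    )
    print(result)  # prints [3, 41, 97]
```

Only one partition plus the kept rows is held at a time, so `chunk_size` bounds the working memory, but every row is still parsed once.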

BERENZ · 2024-09-12T06:28:42Z

Sure, this is what I currently do (split the data into chunks). Do I understand correctly that making this possible would require changes to the underlying ReadStat C library?

junyuan-chen · 2024-09-12T06:41:12Z

Yes. When reading files, iteration over rows is handled within the C library, and it provides no interface for skipping rows based on their values.
