Pre-tokenization step outdated #19

Dear ETHOS Team,

First, congratulations on the project. It’s truly interesting work, and thank you for making it available to the community.

Following the README instructions, I’m trying to convert MIMIC-IV to the MEDS format. However, I’m unsure which instructions I should follow: those in the root folder’s README or those in scripts/meds/mimic/README.

In both cases, running either run_mimic.sh (from the root instructions) or run.sh (from the scripts folder) results in an error originating from the pre_MEDS.py script. The issue seems to come from these import lines:

```
from MEDS_transforms.extract.utils import get_supported_fp
from MEDS_transforms.utils import get_shard_prefix, write_lazyframe
```

These functions no longer appear to exist in the current MEDS_transforms package. It looks like they were part of earlier releases.
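
In case it helps narrow this down, here is a minimal sketch of how I would check which of these names the installed package still exposes. The module paths and function names are taken from the import lines above. The helper names `find_missing` and `installed_version` are my own, and passing `"MEDS_transforms"` as the distribution name is an assumption about how the package is registered in the environment.

```python
import importlib
from importlib import metadata

# Names imported by pre_MEDS.py, copied from the import lines above.
REQUIRED = {
    "MEDS_transforms.extract.utils": ["get_supported_fp"],
    "MEDS_transforms.utils": ["get_shard_prefix", "write_lazyframe"],
}


def find_missing(required):
    """Return 'module.name' strings for every name that cannot be imported."""
    missing = []
    for module_name, names in required.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The whole module is gone (or the package is not installed).
            missing.extend(f"{module_name}.{name}" for name in names)
            continue
        # The module exists; report only the attributes it no longer has.
        missing.extend(
            f"{module_name}.{name}" for name in names if not hasattr(module, name)
        )
    return missing


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is registered as "MEDS_transforms".
    print("installed version:", installed_version("MEDS_transforms"))
    print("missing names:", find_missing(REQUIRED))
```

Running this in the same environment as the conversion scripts should show the installed version next to any names that are missing, which might help identify the release the scripts were written against.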

Given this, I’m not sure what the intended workflow is for converting the data to the MEDS format with the current version of the code. Could you clarify which steps or scripts should be used?

Additionally, I wanted to ask why the pipeline does not rely on the official conversion pipeline at:
https://github.com/Medical-Event-Data-Standard/MIMIC_IV_MEDS
What are the differences between the MEDS output produced by your pipeline and the output of the official one?

Any guidance would be greatly appreciated. I’m looking forward to reproducing your work.

Thank you in advance, and happy holidays!

Nahuel
