Pre-tokenization step outdated #19

@NahuelCostaCortez

Dear ETHOS Team,

First, congratulations on the project; it’s truly interesting work. Thank you for making it available to the community.

Following the README instructions, I’m trying to convert MIMIC-IV to the MEDS format. However, I’m unsure which instructions I should follow: those in the root folder’s README or those in scripts/meds/mimic/README.

In both cases, running either run_mimic.sh (from the root instructions) or run.sh (from the scripts folder) results in an error originating from the pre_MEDS.py script. The issue seems to come from these import lines:

from MEDS_transforms.extract.utils import get_supported_fp
from MEDS_transforms.utils import get_shard_prefix, write_lazyframe 

These functions no longer appear to exist in the current MEDS_transforms package. It looks like they were part of earlier releases.
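For anyone who wants to confirm this against their own environment, here is a minimal sketch that reports the installed package version and which of the symbols imported by pre_MEDS.py cannot be resolved. The helper names find_missing and installed_version are hypothetical, and the distribution name passed to installed_version is an assumption that may need adjusting to match how the package was installed.

```python
import importlib
from importlib import metadata

# Symbols that pre_MEDS.py tries to import, taken from the two lines quoted above.
EXPECTED = {
    "MEDS_transforms.extract.utils": ["get_supported_fp"],
    "MEDS_transforms.utils": ["get_shard_prefix", "write_lazyframe"],
}


def find_missing(expected):
    """Return 'module.name' strings for every symbol that cannot be resolved."""
    missing = []
    for module_name, names in expected.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself is gone (or the package is not installed),
            # so every symbol expected from it is missing.
            missing.extend(f"{module_name}.{name}" for name in names)
            continue
        missing.extend(
            f"{module_name}.{name}" for name in names if not hasattr(module, name)
        )
    return missing


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if not found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is registered under this name.
    print("MEDS_transforms version:", installed_version("MEDS_transforms"))
    for symbol in find_missing(EXPECTED):
        print("missing:", symbol)
```

Running this in the environment used for run_mimic.sh or run.sh should list the same symbols that trigger the import error, along with the version that lacks them.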

Given this, I’m not sure what the intended workflow is for converting the data to the MEDS format with the current version of the code. Could you clarify which steps or scripts should be used?

Additionally, I wanted to ask why the pipeline does not rely on the official conversion pipeline at:
https://github.com/Medical-Event-Data-Standard/MIMIC_IV_MEDS
What are the differences between using the MEDS format produced by your pipeline versus this one?

Any guidance would be greatly appreciated. I’m looking forward to reproducing your work.

Thank you in advance, and happy holidays!

Nahuel
