Following a discussion in today's informatics scrum, I was thinking that it would be nice to be able to anonymize acquisition datetimes in the scans.tsv files (and potentially in sidecar JSON files). @mattcieslak thought this could be made part of the purge-metadata command.
Desiderata:
Set the first scan's acquisition date to 1800-01-01.
Give users the option to either anonymize the full datetime or just anonymize the date (i.e., retain the time of day).
Preserve relative timing between scans in each session.
Preserve relative timing between sessions.
Here's some code I've used to do this in another project:

"""Anonymize acquisition datetimes for a dataset.

Works for both longitudinal and cross-sectional studies. The time of day is
preserved, but the first scan is set to January 1st, 1800. In a longitudinal
study, each session is anonymized relative to the first session, so that time
between sessions is preserved.

Overwrites scan tsv files in dataset. Only run this *after* data collection
is complete for the study, especially if it's longitudinal.
"""

import os
from glob import glob

import pandas as pd
from dateutil import parser

if __name__ == "__main__":
    dset_dir = "/path/to/dset"

    # Baseline date that each subject's first scan is shifted to.
    bl_dt = parser.parse("1800-01-01")

    subject_dirs = sorted(glob(os.path.join(dset_dir, "sub-*")))
    for subject_dir in subject_dirs:
        sub_id = os.path.basename(subject_dir)
        print(f"Processing {sub_id}")

        scans_files = sorted(glob(os.path.join(subject_dir, "ses-*/*_scans.tsv")))

        for i_ses, scans_file in enumerate(scans_files):
            ses_dir = os.path.dirname(scans_file)
            ses_name = os.path.basename(ses_dir)
            print(f"\t{ses_name}")

            df = pd.read_table(scans_file)
            if i_ses == 0:
                # Anonymize in terms of first scan for subject.
                # Only the date part is used, so the time of day is preserved.
                first_scan = df["acq_time"].min()
                first_dt = parser.parse(first_scan.split("T")[0])
                diff = first_dt - bl_dt

            # The offset from the first session is reused for later sessions.
            acq_times = df["acq_time"].apply(parser.parse)
            acq_times = (acq_times - diff).astype(str)
            df["acq_time"] = acq_times
            df["acq_time"] = df["acq_time"].str.replace(" ", "T")

            # Delete the original file instead of just overwriting it, for Datalad.
            os.remove(scans_file)
            # `lineterminator` replaces `line_terminator`, which pandas
            # deprecated in 1.5 and removed in 2.0.
            df.to_csv(
                scans_file,
                sep="\t",
                lineterminator="\n",
                na_rep="n/a",
                index=False,
            )
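The script above only covers the date-only case (time of day is retained). As a rough sketch of how the second desideratum might look, the shift could be factored into a helper with a switch between the two modes. The names `anonymize_acq_times`, `BASELINE`, and `keep_time_of_day` are hypothetical and not part of any existing command; the sketch also assumes `acq_time` values are ISO 8601 strings and drops fractional seconds on output.

```python
import pandas as pd

# Hypothetical baseline; matches the 1800-01-01 date used in the script above.
BASELINE = pd.Timestamp("1800-01-01")


def anonymize_acq_times(acq_times, first_acq, keep_time_of_day=True):
    """Shift acquisition times so that `first_acq` maps onto BASELINE.

    Hypothetical helper for illustration only.

    acq_times : Series of ISO 8601 strings from one scans.tsv file.
    first_acq : earliest acquisition time of the subject's first session.
    keep_time_of_day : if True, shift by whole days only (date-only mode);
        if False, shift so the first scan lands at midnight on BASELINE.
    """
    acq = pd.to_datetime(acq_times)
    first = pd.Timestamp(first_acq)
    # Date-only mode anchors on midnight of the first scan's date, so the
    # offset is a whole number of days and clock times are unchanged.
    anchor = first.normalize() if keep_time_of_day else first
    shifted = acq - (anchor - BASELINE)
    # Format explicitly so midnight timestamps keep their time component.
    return shifted.dt.strftime("%Y-%m-%dT%H:%M:%S")
```

Because the same `first_acq` would be passed for every session of a subject, both modes keep the relative timing between scans and between sessions; they differ only in whether the original clock times survive.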