-
Following on from #2953, I wanted to be able to change metadata in a set of table rows without having to validate/reencode the existing metadata. While there is a function for doing this for a single row, is there a good way to do this for a set of rows?
Replies: 2 comments
-
The most efficient way to change multiple rows is probably to append the new metadata to the end of the metadata column and then reorder it using the code from #2953:

```python
import numpy as np
import tskit


def change_ragged_order(ragged_arr, ragged_offset, new_order):
    # Returns a tuple of the ragged column in the new order and a new offset array
    ranges = np.array([ragged_offset[:-1], ragged_offset[1:]]).T
    new_ranges = ranges[new_order, :]
    # Empty rows contribute no data, so skip them when building the index
    idx = [np.arange(l, r, dtype=ragged_offset.dtype) for l, r in new_ranges if l != r]
    select = [] if len(idx) == 0 else np.concatenate(idx)
    return ragged_arr[select], np.insert(np.cumsum(np.diff(new_ranges, axis=1)), 0, 0)


def change_metadata(new_md_dict, table):
    # new_md_dict maps row IDs to their new metadata
    if table.metadata_schema.schema is not None:
        # Validate and encode only the new rows, without mutating the caller's dict
        encoded = {
            k: table.metadata_schema.validate_and_encode_row(v)
            for k, v in new_md_dict.items()
        }
    else:
        encoded = dict(new_md_dict)
    data = [table.metadata]
    # Add a list of new byte arrays, then concatenate
    data += [np.array(bytearray(v), dtype=table.metadata.dtype) for v in encoded.values()]
    tmp_offset = np.cumsum([len(d) for d in data], dtype=table.metadata_offset.dtype)[1:]
    tmp_offset = np.concatenate((table.metadata_offset, tmp_offset))
    tmp_md = np.concatenate(data)
    # The new rows sit after the existing ones; point each changed row at its new copy
    new_row_ids = np.arange(len(encoded)) + table.num_rows
    idx = np.arange(table.num_rows)
    idx[list(encoded.keys())] = new_row_ids
    d = table.asdict()
    d["metadata"], d["metadata_offset"] = change_ragged_order(tmp_md, tmp_offset, idx)
    table.set_columns(**d)
```

Test:
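The append-then-reorder idea can be seen without tskit or NumPy in a minimal sketch that uses plain `bytes` and lists in place of the `metadata` and `metadata_offset` arrays. The helper names here (`split_ragged`, `join_ragged`, `replace_rows`) are hypothetical and not part of tskit; the new values are assumed to be already encoded, so the existing rows are never decoded or re-validated.

```python
def split_ragged(data, offsets):
    """Return the per-row byte strings stored in a ragged column."""
    return [data[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


def join_ragged(rows):
    """Pack per-row byte strings into a (data, offsets) ragged column."""
    offsets = [0]
    for row in rows:
        offsets.append(offsets[-1] + len(row))
    return b"".join(rows), offsets


def replace_rows(data, offsets, new_rows):
    """Replace several rows at once; new_rows maps row ID -> encoded bytes."""
    old = split_ragged(data, offsets)
    num_rows = len(old)
    # Step 1: append the new rows after the existing ones
    appended = old + list(new_rows.values())
    # Step 2: build an order that points each changed row at its appended copy
    order = list(range(num_rows))
    for new_pos, row_id in enumerate(new_rows, start=num_rows):
        order[row_id] = new_pos
    # Step 3: reorder, which also drops the replaced originals
    return join_ragged([appended[i] for i in order])


data, offsets = join_ragged([b'{"name":"Bob"}', b"", b'{"name":"Al"}'])
new_data, new_offsets = replace_rows(
    data, offsets, {0: b'{"name":"Robert"}', 1: b"{}"}
)
print(split_ragged(new_data, new_offsets))
# [b'{"name":"Robert"}', b'{}', b'{"name":"Al"}']
```

Row 2 is carried over byte for byte, and the empty original row 1 is handled like any other row.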
-
Note that you can change an individual row's metadata by simple assignment. I had forgotten this! https://tskit.dev/tutorials/tables_and_editing.html#minor-edits

```python
tables.individuals[1] = tables.individuals[1].replace(metadata={"name": "Robert"})
```