Script to find json mappings #7767
Merged
@@ -0,0 +1,79 @@
from pathlib import Path
import argparse
import sys
import os

# This script searches a binaryData folder for json mapping files.
# We had hoped to convert all json files to hdf5, but this got deprioritized
# because it turned out that each oversegment group gets mapped to one of the
# ids of the group. Agglomerate files, on the other hand, are optimized for
# continuous agglomerate ids. This means that to preserve the ids of some json
# mappings, huge and hugely inefficient agglomerate files would be needed.


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("binary_data_dir", type=Path, help="WEBKNOSSOS binary data dir")
    parser.add_argument(
        "--plain",
        "-p",
        action="store_true",
        help="Print only the file names, not info output",
    )
    args = parser.parse_args()
    binary_data_dir = args.binary_data_dir

    if not args.plain:
        print(f"Scanning {binary_data_dir} for json mapping files...\n", file=sys.stderr)

    seen = []

    for orga_dir in [
        item for item in binary_data_dir.iterdir() if item.exists() and item.is_dir() and not item.name.startswith(".")
    ]:
        for dataset_dir in orga_dir.iterdir():
            try:
                if dataset_dir.exists() and dataset_dir.is_dir():
                    for layer_dir in [
                        item
                        for item in dataset_dir.iterdir()
                        if item.exists() and item.is_dir()
                    ]:
                        mappings_dir = layer_dir.joinpath("mappings")
                        if mappings_dir.exists():
                            for mapping_file in [
                                item
                                for item in mappings_dir.iterdir()
                                if item.name.lower().endswith(".json")
                            ]:
                                realpath = mapping_file.resolve()
                                if realpath not in seen:
                                    seen.append(realpath)
                                    size = os.stat(realpath).st_size
                                    print(
                                        f"{format_bytes(size)} {mapping_file}"
                                    )
            except Exception as e:
                if not args.plain:
                    print(
                        f"Exception while scanning dataset dir at {dataset_dir}: {e}",
                        file=sys.stderr,
                    )

    if not args.plain:
        print(
            f"\nDone scanning {binary_data_dir}, listed {len(seen)} json mappings.",
            file=sys.stderr,
        )


def format_bytes(num):
    for unit in ("", "K", "M", "G", "T"):
        if abs(num) < 1000.0:
            return f"{int(num)}{unit}"
        num /= 1000.0
    return f"{int(num)}P"


if __name__ == "__main__":
    main()
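The traversal above records `mapping_file.resolve()` in `seen`, so a mapping that is reachable through several symlinked paths is listed only once. A minimal sketch of that deduplication step in isolation follows. The helper names `unique_json_files` and `demo` are hypothetical and not part of the PR, and a set is used here instead of the script's list purely for illustration.

```python
import tempfile
from pathlib import Path


def unique_json_files(mappings_dir):
    """Return the *.json entries of mappings_dir, one per resolved target.

    Hypothetical helper: mirrors the script's `resolve()`-based check,
    but keeps the resolved paths in a set instead of a list.
    """
    seen = set()
    unique = []
    for item in sorted(mappings_dir.iterdir()):
        if not item.name.lower().endswith(".json"):
            continue
        realpath = item.resolve()  # follows symlinks to the real file
        if realpath not in seen:
            seen.add(realpath)
            unique.append(item)
    return unique


def demo():
    """Build a throwaway mappings dir with one file and one symlink to it."""
    with tempfile.TemporaryDirectory() as tmp:
        mappings_dir = Path(tmp) / "mappings"
        mappings_dir.mkdir()
        original = mappings_dir / "a.json"
        original.write_text("{}")
        (mappings_dir / "b.json").symlink_to(original)
        return [p.name for p in unique_json_files(mappings_dir)]
```

Calling `demo()` on a platform that permits creating symlinks should report only the first path seen for the shared target, which is the behavior the script relies on when it counts `len(seen)`.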
What's the idea behind the `P` here? 🙈 Edit: Maybe petabyte? 🤔
Indeed, if this line is reached, we are at petabytes :D The function doesn't go higher than that.
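To make the exchange above concrete, here is `format_bytes` copied from the diff together with a few sample calls. The sample values and the `samples` name are chosen purely for illustration. The function divides by 1000 (decimal units, not 1024) and truncates with `int`, so anything that falls through the `T` step is reported in `P` with no further scaling.

```python
def format_bytes(num):
    # Try each unit in turn; stop at the first one where the value is below 1000.
    for unit in ("", "K", "M", "G", "T"):
        if abs(num) < 1000.0:
            return f"{int(num)}{unit}"
        num /= 1000.0
    # Fall-through: the value was at least 1000T, so report it in petabytes.
    return f"{int(num)}P"


# Illustrative inputs only (not taken from the PR).
samples = {
    999: format_bytes(999),              # "999"
    1000: format_bytes(1000),            # "1K"
    1_500_000: format_bytes(1_500_000),  # "1M" (truncated, not rounded)
    10**15: format_bytes(10**15),        # "1P"
    10**18: format_bytes(10**18),        # "1000P" (no unit above P)
}

for size, label in samples.items():
    print(size, label)
```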