Update bulk image conversion script to support multiple formats #226
Update bulk image conversion script to support multiple formats #226Anushlinux wants to merge 2 commits intoUdayraj123:devfrom
Conversation
Udayraj123
left a comment
There was a problem hiding this comment.
@Anushlinux you have understood and handled a lot of code flows(in local scripts) to OMRChecker. Great work on your first PR! 🎉
I've finished my review and I think we will need some discussion to make it even more extensible. Let's discuss async in the discord channel once you go through my comments.
|
|
||
| # Usual pre-processing commands for speedups (useful during re-runs) | ||
| # python3 scripts/local/convert_images.py -i inputs/ --replace [--filter-ext png,jpg] --output-ext jpg | ||
| # python3 scripts/local/resize_images.py -i inputs/ -o outputs --max-width=1500 |
There was a problem hiding this comment.
Let's keep all these comments as a guide for the contributors to figure out what needs to be done here
| pages = convert_from_path(input_path) | ||
| for i, page in enumerate(pages): | ||
| output_path = os.path.join(output_dir, f"page_{i + 1}.jpg") | ||
| page.save(output_path, 'JPEG') |
There was a problem hiding this comment.
let's also include file name here
| pages = convert_from_path(input_path) | |
| for i, page in enumerate(pages): | |
| output_path = os.path.join(output_dir, f"page_{i + 1}.jpg") | |
| page.save(output_path, 'JPEG') | |
| pages = convert_from_path(input_path) | |
| input_path = Path(PurePosixPath(input_path).as_posix()) | |
| file_name = input_path.name | |
| for i, page in enumerate(pages): | |
| output_path = os.path.join(output_dir, f"{file_name}_page_{i + 1}.jpg") | |
| print(f"Saving page: {output_path}") | |
| page.save(output_path, 'JPEG') |
There was a problem hiding this comment.
Also convert_image and convert_pdf_to_jpg should come from image_utils instead of bulk_ops_common.py.
I am thinking of keeping this file only for operations that are common across all sorts of bulk scripts (bulk resize, convert, watermark, etc)
| """ | ||
| pages = convert_from_path(input_path) | ||
| for i, page in enumerate(pages): | ||
| output_path = os.path.join(output_dir, f"page_{i + 1}.jpg") |
There was a problem hiding this comment.
I have a feature extension idea here, see if you can implement -
Let's add a support to extract only the first page based on a flag (in that case the image's output_dir shouldn't be created and the single image should be output directly with same filename without the page_ prefix.
| os.makedirs(output_dir, exist_ok=True) | ||
| extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf'] | ||
|
|
||
| filepaths = walk_and_extract_files(input_dir, extensions) |
There was a problem hiding this comment.
snake case
| filepaths = walk_and_extract_files(input_dir, extensions) | |
| file_paths = walk_and_extract_files(input_dir, extensions) |
| Bulk converts images and PDFs to the specified format. | ||
| """ | ||
| os.makedirs(output_dir, exist_ok=True) | ||
| extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf'] |
There was a problem hiding this comment.
Let's keep pdf handling in a separate script (check other comments)
| extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf'] | |
| extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif'] |
| for input_path in filepaths: | ||
| relative_path = os.path.relpath(os.path.dirname(input_path), input_dir) | ||
| output_subdir = os.path.join(output_dir, relative_path) if not in_place else os.path.dirname(input_path) | ||
| os.makedirs(output_subdir, exist_ok=True) | ||
|
|
There was a problem hiding this comment.
The in_place flag handling can be a common bulk ops utility. I propose we define a function bulk_apply_on_files(func, input_dir, file_paths, in_place)
- The first arg
funccan beconvert_imageorconvert_pdf_to_image - The bulk_apply_on_files applies this passed function with the same arguments
- if in_place is
Trueand the conversion was successful, the input file gets removed (pdf or image) and the output files are added in the same directory - if in_place is
Falseand the conversion was successful, the output files are added in a relative directory output_dir =f"{input_dir}/outputs/" - If the conversion was unsuccessful, no file changes happen
- if in_place is
Let's discuss more about this in the discord group if needed.
|
|
||
| converted_count = convert_images_in_tree(args) | ||
| trigger_size = args["trigger_size"] | ||
| converted_count = convert_images(args.filenames, args.format.upper()) |
There was a problem hiding this comment.
Here we can call the function bulk_apply_on_files(convert_image, input_dir, file_paths, in_place) or something similar based on our discussion (check other comments below)
|
will be closing for inactivity. Feel free to reopen or if someone else wants to continue this task. |
I have updated the conversion code and implemented my function to the convert_image_hook as you required
Pls review the code
(ps: sorry for deleting the previous pr i did it bimistake)