Skip to content

Update bulk image conversion script to support multiple formats #226

Closed
Anushlinux wants to merge 2 commits intoUdayraj123:devfrom
Anushlinux:Anushlinux-dev-convertor_hook
Closed

Update bulk image conversion script to support multiple formats #226
Anushlinux wants to merge 2 commits intoUdayraj123:devfrom
Anushlinux:Anushlinux-dev-convertor_hook

Conversation

@Anushlinux
Copy link
Copy Markdown

@Anushlinux Anushlinux commented Oct 3, 2024

I have updated the conversion code and implemented my function to the convert_image_hook as you required

Pls review the code
(ps: sorry for deleting the previous pr i did it bimistake)

Comment thread scripts/hooks/convert_images_hook.py Outdated
Comment thread scripts/hooks/convert_images_hook.py Outdated
@Anushlinux Anushlinux requested a review from Udayraj123 October 5, 2024 21:07
Copy link
Copy Markdown
Owner

@Udayraj123 Udayraj123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Anushlinux you have understood and handled a lot of code flows(in local scripts) to OMRChecker. Great work on your first PR! 🎉
I've finished my review and I think we will need some discussion to make it even more extensible. Let's discuss async in the discord channel once you go through my comments.

Comment thread scripts/local/utils/bulk_ops_common.py

# Usual pre-processing commands for speedups (useful during re-runs)
# python3 scripts/local/convert_images.py -i inputs/ --replace [--filter-ext png,jpg] --output-ext jpg
# python3 scripts/local/resize_images.py -i inputs/ -o outputs --max-width=1500
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep all these comments as a guide for the contributors to figure out what needs to be done here

Comment thread scripts/local/utils/bulk_ops_common.py
Comment thread scripts/local/utils/bulk_ops_common.py
Comment on lines +85 to +88
pages = convert_from_path(input_path)
for i, page in enumerate(pages):
output_path = os.path.join(output_dir, f"page_{i + 1}.jpg")
page.save(output_path, 'JPEG')
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also include file name here

Suggested change
pages = convert_from_path(input_path)
for i, page in enumerate(pages):
output_path = os.path.join(output_dir, f"page_{i + 1}.jpg")
page.save(output_path, 'JPEG')
pages = convert_from_path(input_path)
input_path = Path(PurePosixPath(input_path).as_posix())
file_name = input_path.name
for i, page in enumerate(pages):
output_path = os.path.join(output_dir, f"{file_name}_page_{i + 1}.jpg")
print(f"Saving page: {output_path}")
page.save(output_path, 'JPEG')

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also convert_image and convert_pdf_to_jpg should come from image_utils instead of bulk_ops_common.py.
I am thinking of keeping this file only for operations that are common across all sorts of bulk scripts (bulk resize, convert, watermark, etc)

"""
pages = convert_from_path(input_path)
for i, page in enumerate(pages):
output_path = os.path.join(output_dir, f"page_{i + 1}.jpg")
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feature extension idea here, see if you can implement -
Let's add a support to extract only the first page based on a flag (in that case the image's output_dir shouldn't be created and the single image should be output directly with same filename without the page_ prefix.

os.makedirs(output_dir, exist_ok=True)
extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf']

filepaths = walk_and_extract_files(input_dir, extensions)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snake case

Suggested change
filepaths = walk_and_extract_files(input_dir, extensions)
file_paths = walk_and_extract_files(input_dir, extensions)

Bulk converts images and PDFs to the specified format.
"""
os.makedirs(output_dir, exist_ok=True)
extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf']
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep pdf handling in a separate script (check other comments)

Suggested change
extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf']
extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif']

Comment on lines +100 to +104
for input_path in filepaths:
relative_path = os.path.relpath(os.path.dirname(input_path), input_dir)
output_subdir = os.path.join(output_dir, relative_path) if not in_place else os.path.dirname(input_path)
os.makedirs(output_subdir, exist_ok=True)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The in_place flag handling can be a common bulk ops utility. I propose we define a function bulk_apply_on_files(func, input_dir, file_paths, in_place)

  • The first arg func can be convert_image or convert_pdf_to_image
  • The bulk_apply_on_files applies this passed function with the same arguments
    • if in_place is True and the conversion was successful, the input file gets removed (pdf or image) and the output files are added in the same directory
    • if in_place is False and the conversion was successful, the output files are added in a relative directory output_dir = f"{input_dir}/outputs/"
    • If the conversion was unsuccessful, no file changes happen

Let's discuss more about this in the discord group if needed.


converted_count = convert_images_in_tree(args)
trigger_size = args["trigger_size"]
converted_count = convert_images(args.filenames, args.format.upper())
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can call the function bulk_apply_on_files(convert_image, input_dir, file_paths, in_place) or something similar based on our discussion (check other comments below)

@Udayraj123
Copy link
Copy Markdown
Owner

will be closing for inactivity. Feel free to reopen or if someone else wants to continue this task.

@Udayraj123 Udayraj123 closed this May 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants