Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support reading from stdin, writing to stdout in HEIC2PNG #5

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

smontanaro
Copy link

Here's a first cut at my proposed changes. I don't do Windows (Mac & Linux–so far only tested on Mac), so you'll have to take care of seeing if things still work there. I will check to make sure it works on Linux.

I wouldn't assume I have this all correct. We should have a bit of back-and-forth in the review/comment section of the PR to make sure I haven't unwittingly broken something. At any rate, the current tests do pass.

@NatLee NatLee self-requested a review February 6, 2025 18:49
@NatLee NatLee added the enhancement New feature or request label Feb 6, 2025
eprint('Please report this issue with the full traceback.')
eprint('-> https://github.com/NatLee/HEIC2PNG/issues')

def eprint(*args, file=sys.stderr, **kwds):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider moving this function to execute before the function of cli.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you're suggesting, move the definition of eprint elsewhere, or the various checks for problems.

Copy link
Owner

@NatLee NatLee Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a problem; however, I recommend positioning the eprint function before the CLI function.


parser = argparse.ArgumentParser(description="Convert HEIC images to PNG.")
parser.add_argument("-i", "--input_path", required=True, help="Path to the input HEIC image.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify the rationale behind removing the required flag?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the input comes from sys.stdin, there's not going to be an explicit input file, so requiring one doesn't make a ton of sense.

I'm not going to hold it up as best practice, but I have another package on GitHub, csvprogs which goes the other way. I had always supported using stdin and stdout for IO, only supporting filled in a hit-or-miss fashion. It has grown organically though, so different tools were organized differently. In a fit of virtual housecleaning, I tackled this inconsistency. I was able to come up with a relatively clean way to support both styles of consuming input and generating output (see the openio function and its usage).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. I believe it is necessary to add a check within the CLI function to verify whether input_path is None. Without this validation, the program might fail to locate the file and become unresponsive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how Unix-style filters work. This isn't quite how heic2png is organized, but many Unix/Linux commands will read from input files or stdin. Effectively, you have a CLI which looks like this:

heic2png [ options ] [ infile [ outfile ] ]

Both the input file and output file are optional (though if you want to specify an output file you must also specify an input file, hence the nesting brackets). Ignoring the fact that your current CLI explicitly calls for -i and/or -o, I could run a more traditional Unix command like this:

heic2png ... ~/misc/myimage.heic ~/misc/myimage.png

or

heic2png ... ~/misc/myimage.heic | pngtopnm | pnmtoxwd > ~/misc/myimage.xwd

or even

heic2png ... < ~/misc/myimage.heic | pngtopnm | pnmtoxwd > ~/misc/myimage.xwd

If you look at the openio function in my csvprogs package, you'll see that's what it sorts out. If the input file has a fileno attribute, it assumes it's an open file, like sys.stdin:

>>> sys.stdin.fileno
<built-in method fileno of _io.TextIOWrapper object at 0x1008e5700>

The same goes for the output file. If not, it assumes it's a file which needs to be opened.

Clearly, if the user runs

heic2png

without any files or I/O redirection, it will appear to hang. It's not really hung though, just waiting for input from stdin. Sending in a KeyboardInterrupt solves that problem, after which hopefully the user will think to run it with the -h option. (I will tweak the help messages to make it clear what happens in the absence of the -i or -o options.)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your explanation. For this revision, I have only tested it in a non-Unix environment like Windows. I will test it later on the Mac.

open(outputfile, "rb") as outp):
# I don't really know how to compare images. If we assume that PIL
# does its thing properly, and PNG files will always be larger than
# their corresponding HEIC files, hopefully that's good enough.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be sufficient for comparing these two different images.

# TODO: Not sure this is correct, there's probably a better mapping
# between image file extensions and file formats. For example, see the
# output of `python -m PIL`.
self.image.save(output_path, format=extension.replace(".", "").upper())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could refer to this link to retrieve a list of supported formats with PIL: https://stackoverflow.com/a/71114152

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it in Python 3.12, but I'm unsure if this function works across all versions of PIL.

>>> from PIL import Image
>>> exts = Image.registered_extensions()
>>> supported_extensions = {ex for ex, f in exts.items() if f in Image.OPEN}
>>> supported_extensions
{'.wmf', '.icns', '.gif', '.grib', '.pnm', '.ftu', '.jpe', '.rgb', '.dib', '.tif', '.jfif', '.emf', '.jpc', '.pxr', '.ppm', '.dds', '.fit', '.icb', '.cur', '.pcx', '.bmp', '.h5', '.pfm', '.qoi', '.dcx', '.apng', '.iim', '.pbm', '.png', '.jpx', '.bw', '.psd', '.pcd', '.sgi', '.vda', '.im', '.mpeg', '.bufr', '.mpg', '.rgba', '.xpm', '.fli', '.fits', '.ps', '.flc', '.gbr', '.msp', '.j2c', '.jpg', '.tga', '.ico', '.xbm', '.pgm', '.j2k', '.jpf', '.jpeg', '.vst', '.hdf', '.ras', '.tiff', '.ftc', '.jp2', '.eps', '.webp', '.blp'}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if there was a mapping from file extensions to formats. I think ultimately, that's what really matters.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can select a few from within and create a map like this:

class HEIC2PNG:
  FORMAT_MAPPING: Dict[str, str] = {
      '.png': 'PNG',
      '.jpg': 'JPEG',
      '.jpeg': 'JPEG',
      '.webp': 'WEBP',
      '.bmp': 'BMP',
      '.gif': 'GIF',
      '.tiff': 'TIFF',
  }
  def __init__(...):
    ...

We can make adjustments at this point.

self.image.save(output_path, format=extension.replace(".", "").upper())

This line can be adjusted using the following approach: the variable image_format is assigned based on the file extension mapping (self.FORMAT_MAPPING[ext]), and the image is subsequently saved to the specified output_path in the determined format.

The snippet appears as follows:

...
ext = extension.lower()
if ext not in self.FORMAT_MAPPING:
    supported_formats = ', '.join(self.FORMAT_MAPPING.keys())
    raise ValueError(f"Unsupported format: {extension}. Supported formats are: {supported_formats}")

image_format = self.FORMAT_MAPPING[ext]
self.image.save(output_path, format=image_format)
...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a method I came up with on short notice. It may not be the best approach, but feel free to use it as a reference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud here, but Python has a standard mimetypes module. The IANA has an official MIME type list as well. Maybe we should see if we can leverage these existing tools.

>>> import mimetypes
>>> mimetypes.guess_type("myfile.heic")
('image/heic', None)

Getting from image/heic to whatever PIL wants (assuming it barfs on the official MIME type name) shouldn't be difficult.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PIL.Image has a MIME dictionary which the various image plugins are supposed to populate. For example:

>>> from PIL import Image
>>> from pillow_heif import register_heif_opener
>>> 
>>> register_heif_opener()
>>> Image.MIME["HEIF"]
'image/heif'

Unfortunately the HEIF opener doesn't register image/heic. The IANA description is a bit confusing to me.

Still, this avenue might be worth pursuing.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting all formats might be overly meticulous. Could creating a map be a practical solution?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I was just pointing out that given PIL's image promiscuity, restricting your tool to a single input format might be limiting.

I came to this as a decades long user of netpbm. It doesn't support HEIC as far as I can tell, so before rolling my own, I went looking and found yours.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. My original intention was simply to convert iPhone photos into PNG files. 😄
Switching to using map should work without any issues, in my opinion.


# TODO: Skip this step if the output format isn't PNG or pngquant can't
# be found?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the other format specify parameters like quality or compression ratio? If not, we can skip this check.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some formats (like JPEG) support image quality. Generally speaking, this is only applicable to lossy formats. I wasn't aware that PNG could be lossy.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if my wording wasn’t clear. PNG is a lossless format.
However, when we convert HEIC to PNG, we use tools like pngquant.
In this case, the quality I’m referring to is the parameter for this tool, not quality levels like in JPEG.
Tools such as pngquant utilize color reduction and lossy conversion techniques, which may compromise image quality—somewhat contradicting PNG's renowned lossless nature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not take pngquant usage out of this tool altogether? My Unix-oriented mind would say the proper way to quantize the conversion output would be

heic2png < somefile.heic | pngquant ... > somefile.png

Again, those pesky pipes (and the desire to read from stdin and write to stdout) rear their ugly heads. FWIW, netpbm has pnmquant, which seems to perform the same function, albeit with a different input format.

@smontanaro
Copy link
Author

Not directly related to this PR, I put together a simple HEIC-to-PPM script which treats command-line files the way I typically expect things to work in a Unix pipeline filter application. (You'll have to rename it to switch "txt" to "py".) It uses the openio() function I mentioned from my csvprogs repo. It also uses libmagic through the python-magic front-end. I have no idea if libmagic is available on Windows. That should be a more robust way to determine the underlying input file type than just the file extensions.
heictopnm.txt

@NatLee
Copy link
Owner

NatLee commented Feb 13, 2025

... I have no idea if libmagic is available on Windows. That should be a more robust way to determine the underlying input file type than just the file extensions. heictopnm.txt

I tried installing python-magic on Windows but failed. According to the official instructions, use pip install python-magic-bin for installation on Windows.

Collecting python-magic-bin
  Downloading python_magic_bin-0.4.14-py2.py3-none-win_amd64.whl.metadata (710 bytes)
Downloading python_magic_bin-0.4.14-py2.py3-none-win_amd64.whl (409 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 409.3/409.3 kB 1.4 MB/s eta 0:00:00
Installing collected packages: python-magic-bin
Successfully installed python-magic-bin-0.4.14

The issue is that macOS and Ubuntu require additional installations (via apt or brew), which might cause inconvenience for users.

I wonder if there's a way to strike a balance.

@NatLee
Copy link
Owner

NatLee commented Feb 13, 2025

Your question just reminded me—I'm exploring alternative ways to further reduce the size of converted PNG files. While using pngquant, I encountered an issue: a UnicodeDecodeError indicating an invalid continuation byte in the scripts/pngquant file.
This might need to be addressed as a separate issue.

@smontanaro
Copy link
Author

Your question just reminded me—I'm exploring alternative ways to further reduce the size of converted PNG files. While using pngquant, I encountered an issue: a UnicodeDecodeError indicating an invalid continuation byte in the scripts/pngquant file. This might need to be addressed as a separate issue.

Sounds like at least a test case to capture this problem would be a good first step.

@smontanaro
Copy link
Author

The issue is that macOS and Ubuntu require additional installations (via apt or brew), which might cause inconvenience for users.

I wonder if there's a way to strike a balance.

I suspect that's a known issue for people who care about python-magic. I realized pretty quickly that it wasn't on my Mac, so brew install libmagic fixed it. I was mildly surprised the python-magic installation didn't sniff around and warn me the underlying library was missing. It's already installed on my very old Ubuntu laptop, perhaps as a side-effect of installing some other package. (I don't recall explicitly installing it, but my brain ain't what it used to be, so perhaps I did.)

I'd submit a bug report/feature request to the python-magic folks asking for a post-install check for libmagic.

@NatLee
Copy link
Owner

NatLee commented Feb 26, 2025

I apologize for the delayed response as work has been quite busy recently.
I suggest letting this PR go through the CI/CD process first, and if everything looks good, we can proceed with this version. I’ll make adjustments to any additional engineering aspects as needed.

@NatLee
Copy link
Owner

NatLee commented Feb 26, 2025

Oh no, I think we should remove Python 3.7 from the action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should be able to read from stdin, write to stdout, messages must not go to stdout
2 participants