error / warning on images with an embedded thumbnail #125

Closed
eroux opened this issue Jun 3, 2021 · 10 comments

@eroux

eroux commented Jun 3, 2021

Some of our JPG images embed thumbnails, for instance https://iiif.bdrc.io/bdr:I1NLM2232_001::I1NLM2232_0010001.jpg/full/full/0/default.jpg (this is a byte-for-byte copy of the S3 image, with no processing by the IIIF server). You can extract the thumbnail from it with the following commands (on Linux at least):

wget https://iiif.bdrc.io/bdr:I1NLM2232_001::I1NLM2232_0010001.jpg/full/full/0/default.jpg -O I1NLM2232_0010001.jpg 
exiftool -b -ThumbnailImage I1NLM2232_0010001.jpg > thumbnail.jpg

Here it is attached for reference:

[Image: thumbnail]

It's about 13 kB, which means that 13 kB near the beginning of the file are taken up by the thumbnail but aren't useful for our purposes (unless I'm missing something?). I think the asset manager should at least issue a warning when a JPG contains an embedded thumbnail.
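As an illustration of what such a warning could be based on, here is a minimal, standard-library-only Java sketch. It is not the asset manager's or audittool's actual code; the class name `ThumbnailCheck` and the warning text are hypothetical. It assumes the usual EXIF layout: the thumbnail is described by tags 0x0201 (ThumbnailOffset) and 0x0202 (ThumbnailLength) in IFD1, the directory that IFD0's "next IFD" pointer links to, and these offsets are counted from the TIFF header, which starts 10 bytes after the APP1 marker. It only inspects the first Exif APP1 segment and does no further validation, so a malformed file may throw an exception.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ThumbnailCheck {
    /** Returns {absolute offset, length} of the EXIF IFD1 thumbnail, or null if there is none. */
    static long[] findExifThumbnail(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return null;                                             // not a JPEG (no SOI marker)
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) break;             // SOS/EOI: no more metadata
            int segLen = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int data = pos + 4;
            if (marker == 0xE1 && data + 6 <= jpeg.length
                    && new String(jpeg, data, 6, StandardCharsets.ISO_8859_1).equals("Exif\0\0")) {
                return parseTiff(jpeg, data + 6);                    // TIFF header = APP1 marker + 10
            }
            pos += 2 + segLen;                                       // skip the whole segment
        }
        return null;
    }

    private static long[] parseTiff(byte[] jpeg, int tiff) {
        ByteBuffer buf = ByteBuffer.wrap(jpeg);
        buf.order(jpeg[tiff] == 'I' ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN); // "II" or "MM"
        // All IFD offsets below are relative to the start of the TIFF header.
        int ifd0 = tiff + buf.getInt(tiff + 4);
        int ifd0Entries = buf.getShort(ifd0) & 0xFFFF;
        int ifd1Rel = buf.getInt(ifd0 + 2 + 12 * ifd0Entries);      // IFD0's "next IFD" link
        if (ifd1Rel == 0) return null;                               // no IFD1, so no thumbnail
        int ifd1 = tiff + ifd1Rel;
        int entries = buf.getShort(ifd1) & 0xFFFF;
        long offset = -1, length = -1;
        for (int i = 0; i < entries; i++) {
            int entry = ifd1 + 2 + 12 * i;                           // each IFD entry is 12 bytes
            int tag = buf.getShort(entry) & 0xFFFF;
            // Both tags are type LONG per the EXIF spec, so the value sits in the 4-byte value field.
            if (tag == 0x0201) offset = Integer.toUnsignedLong(buf.getInt(entry + 8));
            if (tag == 0x0202) length = Integer.toUnsignedLong(buf.getInt(entry + 8));
        }
        return (offset >= 0 && length > 0) ? new long[] {tiff + offset, length} : null;
    }

    public static void main(String[] args) throws Exception {
        for (String name : args) {
            long[] thumb = findExifThumbnail(Files.readAllBytes(Path.of(name)));
            if (thumb != null) {
                System.err.printf("WARNING: %s embeds a %d-byte EXIF thumbnail at byte %d%n",
                        name, thumb[1], thumb[0]);
            }
        }
    }
}
```

With Java 11 or later it can be run directly as `java ThumbnailCheck.java *.jpg`; it prints one warning per file that carries an IFD1 thumbnail and stays silent otherwise.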

@jimk-bdrc
Collaborator

It would actually be really helpful to you to have this information extracted and preserved somehow. I've been resisting having audit tool actually do anything, but there's certainly a case for building a processing suite that extracts the thumbnail and ICC profile from each object, so that the IIIF server doesn't have to.

The problem with errors and warnings is that, in general, nobody sees them, and someone then has to post-process them (i.e. do what your shell script does).

If we had a workflow that read errors and warnings as part of a toolchain and decided what to do with them, that workflow could pick up the notifications and extract and persist the ICC profile, the thumbnail, or whatever else IIIF needs.

@jimk-bdrc
Collaborator

@TBRC-Travis Another question is why NLM processing embeds thumbnails when (possibly) nobody else does. This gets back to the discussion Karma and I were having, and will continue to have next week.

@eroux
Author

eroux commented Jun 3, 2021

Oh, actually I just ignore the embedded thumbnails, so ideally they wouldn't be there at all. If at some point I create thumbnails, they will be separate objects in separate files, not embedded. So my request is really to issue a warning when there's an embedded thumbnail and to ask the user to remove it.

@TBRC-Travis

@eroux Agreed. In the case of NLM, embedding thumbnails was not intentional. It's probably an artifact of processing with Adobe Lightroom at NLM, which likely adds some of these extra bits under the hood. I can adjust the NLM process to stop generating the thumbnails.

@eroux
Copy link
Author

eroux commented Jun 3, 2021

Ah, great, thanks! Note that this could be part of a processing script:

$ ls -al I1NLM2232_0010001.jpg 
-rw-r--r-- 1 eroux eroux 473078 juin   3 16:53 'I1NLM2232_0010001.jpg'
$ exiftool -ifd1:all= I1NLM2232_0010001.jpg
    1 image files updated
$ ls -al I1NLM2232_0010001.jpg 
-rw-r--r-- 1 eroux eroux 459512 juin   3 16:55 'I1NLM2232_0010001.jpg'

Note the size reduction: 473078 - 459512 = 13566 bytes, roughly the size of the ~13 kB thumbnail. Also, this doesn't re-encode the JPG, so it's fine (or at least I don't see why it would re-encode it).
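The "no re-encoding" point can be checked rather than assumed. A minimal Java sketch, assuming a well-formed JPEG in which every marker before the first SOS carries a 2-byte length (the class `ImageDataCompare` is hypothetical): collect every segment except the APPn and COM metadata segments, plus everything from the first SOS marker to the end of the file, and compare that between the original and the stripped file. If `exiftool -ifd1:all=` only touched metadata, the two should be byte-identical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ImageDataCompare {
    /**
     * Returns every segment that is not APPn (0xFFE0-0xFFEF) or COM (0xFFFE), followed by
     * everything from the first SOS marker (0xFFDA) to the end of the file.
     */
    static byte[] imageData(byte[] jpeg) {
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("not a JPEG (missing SOI)");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("expected a marker at byte " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA) {                          // SOS: compressed image data follows
                out.write(jpeg, pos, jpeg.length - pos);
                return out.toByteArray();
            }
            int segLen = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            boolean metadata = (marker >= 0xE0 && marker <= 0xEF) || marker == 0xFE;
            if (!metadata) {
                out.write(jpeg, pos, 2 + segLen);          // keep DQT, DHT, SOFn, DRI, ...
            }
            // Skipping whole segments means the EXIF thumbnail inside APP1, which has its
            // own SOS marker, is never mistaken for the main image.
            pos += 2 + segLen;
        }
        throw new IllegalArgumentException("no SOS marker found");
    }

    public static void main(String[] args) throws Exception {
        byte[] before = imageData(Files.readAllBytes(Path.of(args[0])));
        byte[] after = imageData(Files.readAllBytes(Path.of(args[1])));
        System.out.println(Arrays.equals(before, after)
                ? "image data identical: only metadata changed"
                : "image data differs: the file was re-encoded or otherwise modified");
    }
}
```

By default exiftool keeps the untouched file as `I1NLM2232_0010001.jpg_original`, so the check could be run as `java ImageDataCompare.java I1NLM2232_0010001.jpg_original I1NLM2232_0010001.jpg`. Because all APPn segments are excluded, this compares the coded image only; a change to, say, an ICC profile in APP2 would not be reported.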

@jimk-bdrc jimk-bdrc self-assigned this Jun 3, 2021
@jimk-bdrc
Collaborator

Why we need a platform

@jimk-bdrc
Collaborator

@eroux Would it help if this info were encoded into dimensions.json? (It won't help with existing images, but might help in the future.) v_m_b could extract the 0x0201 and 0x0202 fields (ThumbnailOffset and ThumbnailLength, see https://www.exiftool.org/TagNames/EXIF.html) from a number of directories and derive where the embedded thumbnail is (without creating the node in dimensions.json if there is none).

It might just help to have the warning in a place where IIIFPRES can use it, as well as where the creator sees it.

@eroux
Author

eroux commented Feb 11, 2022

No, I don't think it would be useful; my intention was to make sure that audittool complains if there's an embedded thumbnail.

@jimk-bdrc
Collaborator

From metadata-extractor issue #262:

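import com.drew.imaging.ImageMetadataReader;
import com.drew.metadata.Metadata;
import com.drew.metadata.exif.ExifThumbnailDirectory;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.util.logging.Logger;

public class EmbeddedThumbnail {
    private static final Logger logger = Logger.getLogger(EmbeddedThumbnail.class.getName());
    // Returns the embedded EXIF thumbnail bytes, or null if the image has none.
    static byte[] readEmbeddedThumbnail(File file) throws Exception {
        Metadata metadata = ImageMetadataReader.readMetadata(file);
        // Tags 0x0201/0x0202 (ThumbnailOffset/ThumbnailLength) belong to IFD1, which
        // metadata-extractor exposes as ExifThumbnailDirectory, not ExifSubIFDDirectory.
        ExifThumbnailDirectory directory = metadata.getFirstDirectoryOfType(ExifThumbnailDirectory.class);
        if (directory == null || !directory.containsTag(0x0201) || !directory.containsTag(0x0202)) {
            return null;
        }
        int offset = directory.getInt(0x0201);
        int length = directory.getInt(0x0202);
        logger.info("Embedded jpeg offset: " + offset);
        logger.info("Embedded jpeg length: " + length);

        // Assumes the offset counts from the start of the file (verify for your metadata-extractor version).
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.skipNBytes(offset);      // Java 12+; unlike skip(), never skips short silently
            byte[] jpegData = new byte[length];
            in.readFully(jpegData);     // unlike read(), never returns a partially filled buffer
            return jpegData;
        }
    }
}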

@jimk-bdrc
Collaborator

Closed in PR #161

3 participants