
Possible Anime4K model for fast real-time upscaling? #1

Closed
Tama47 opened this issue Mar 6, 2023 · 33 comments

Comments

@Tama47

Tama47 commented Mar 6, 2023

Most models are extremely slow, especially for videos, so it would be great to have Bloc97's Anime4K model. Also, what happened to Anime4KMetal? Are there any plans to release an official, functional app? It was such a promising app.

@imxieyi
Owner

imxieyi commented Mar 6, 2023

As you already mentioned, Anime4K is for real-time upscaling. If you want to produce an upscaled video file anyway, the Real-ESRGAN anime video models are probably the better choice.

I don't have any plans to create a fully-featured video player based on Anime4KMetal, since creating a video player is not trivial. I hoped that someone would pick up the library and integrate it into existing players, but it looks like that never happened.

@Tama47
Author

Tama47 commented Mar 6, 2023

The problem with Real-ESRGAN and other models is that they are much slower than real-time. A 12-second 1080p video clip does not need to take 3 minutes to upscale to 4K. It should be done in, well, 12 seconds, for example, with Anime4K. In reality, it would probably be even faster than that since there is no need for real-time video playback. Additionally, I prefer how upscaled Anime4K videos look through mpv, and I wish there was an easy way to save the output. So I hope you will consider making a model for it, especially since you have already worked on Anime4KMetal. I would gladly pay for the in-app purchase for the custom model.

Also, I wish I could create a great video player for iOS with Anime4KMetal integration. But since I'm here asking you to create an Anime4K model, clearly I'm not experienced enough to create anything. I do hope someone picks it up, though. OutPlayer is promising since they use mpv and have their own GPU processing integration, but I haven't been able to contact them.

@imxieyi
Owner

imxieyi commented Mar 6, 2023

I don't have any technical insights into mpv. But I guess they support GLSL shaders instead of Metal.

IIRC I have seen somewhere that the newer GAN-based Anime4K shaders are converted from ML models. So in theory it should be possible to port them to Core ML, which would run even faster as it utilizes the ANE. But we need to figure out where to find the original models first.

@imxieyi
Owner

imxieyi commented Mar 6, 2023

The conversion code is under the /tensorflow folder in the Anime4K repo. It looks like they are not providing the original models. In theory we could reconstruct the model from the GLSL shaders, but I can already feel a pain in the butt even before looking into it.

@Tama47
Author

Tama47 commented Mar 6, 2023

Can't you just use the Anime4KMetal library that you've already converted for the model? Or is that not how it works?

@imxieyi
Owner

imxieyi commented Mar 6, 2023

Anime4KMetal is not a converted version of Anime4K. It translates glsl to Metal on the fly.

Usually you would chain multiple glsl shaders instead of using a single one (see instructions in Anime4K repo), which is not supported yet. Even if we ignore this, there are still several non-trivial things needed to make it work in waifu2x-ios app:

  1. Fix all shaders, since many of them don't work properly.
  2. Fix subpixel alignment. I don't know if this is a problem on mpv as well, but output frames from Anime4KMetal have a slight offset compared to input frames.
  3. Support processing video frames without tiling; otherwise performance will be reduced (probably by a lot). Since you mentioned that real-time speed is important, I consider this a requirement instead of "nice to have".
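[Editorial note: for readers unfamiliar with tiled processing, a minimal sketch of how a frame gets split into overlapping tiles. The tile size and overlap below are illustrative assumptions, not the actual values used by the waifu2x-ios app.]

```python
def tile_rects(width, height, tile=512, overlap=32):
    """Compute (x, y, w, h) rectangles covering an image with overlapping tiles.

    tile/overlap are hypothetical illustrative values, not the app's settings.
    """
    step = tile - overlap
    rects = []
    for y in range(0, max(height - overlap, 1), step):
        for x in range(0, max(width - overlap, 1), step):
            w = min(tile, width - x)
            h = min(tile, height - y)
            rects.append((x, y, w, h))
    return rects

# A 1080p frame needs many model passes with tiling; processing the
# whole frame at once needs one pass, hence the throughput concern.
print(len(tile_rects(1920, 1080)))  # 12 tiles per frame with these settings
```

Each tile is upscaled independently and the overlapping borders are blended, which is why a per-frame tiling loop multiplies the work for video.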

As you can see, it will be a lot of effort to implement properly. I could definitely throw out a half-baked version with less effort, but I don't think that would be a good idea.

Do you have any justification for why real-time speed is important for non-real-time processing? You can add a bunch of videos to the app, hit the start button, and just forget about it until it's finished.

@imxieyi
Owner

imxieyi commented Mar 6, 2023

Just added the sudo_UltraCompact_2x_1.121.175_G model to this converter. From my testing it's about 50% faster than the Real-ESRGAN anime video model (6fps vs 4fps for 1080p->4K on my 48-core M1 Ultra). I know it's still not even close to real-time, but the results definitely look better than Anime4K.

@Tama47
Author

Tama47 commented Mar 7, 2023

It translates glsl to Metal on the fly.

Ah I see. Sorry I don't really know how any of these work.

Do you have any justification why real-time speed is important for non-real-time processing? You can add a bunch of videos to the app, hit start button and just forget about it until it's finished.

Like I said before, a 12-second video clip shouldn't need to take 3 minutes to upscale to 4K. This problem only gets worse when I want to upscale a full episode (I use both the iOS and Mac versions). It would take hours just to finish one, which is less than ideal... I can upscale on the fly with my Mac, but on an iOS device that's just not possible. So my only option is to upscale beforehand, or hopefully wait for someone to integrate your Anime4KMetal library into some video player.

Just added sudo_UltraCompact_2x_1.121.175_G. the results look definitely better than Anime4K.

Thanks, but I don't care as much about quality as speed, so Anime4K has been amazing so far. I guess I'll wait and see if you ever get around to creating an Anime4K model. For now, I will keep using the app to upscale images.

@imxieyi
Owner

imxieyi commented Mar 7, 2023

If you want to use it on iOS devices, one thing to keep in mind is that they are highly constrained by thermal limitations. To achieve real-time performance without thermal throttling heavily you will likely have to use very tiny shaders. You can already test this by installing Anime4KMetal.

@Tama47
Author

Tama47 commented Mar 21, 2023

I had assumed that the new chips, such as the M2 in the iPad Pro (which is the same as the one in my MacBook), would have no problem achieving real-time performance. Anyway, thank you for the feedback.

@Tama47 Tama47 closed this as not planned Mar 21, 2023
@Tama47
Author

Tama47 commented Mar 24, 2023

After attempting to upscale more videos, I was still disappointed by the processing time, even with an M2 chip. It's perplexing to me that a 7-second clip would take 480 seconds, and a 2-second clip 135 seconds, to upscale. I still wish for a faster model, such as Anime4K, to be made available, so I have decided to reopen the issue for now.

Screenshot 2023-03-24 at 4 44 44 AM

Regarding some of the issues you mentioned:

Usually you would chain multiple glsl shaders instead of using a single one
Fix all shaders since many of them don't work properly.

I feel that simply getting the restore shader to work would make the most noticeable difference, as it is the main shader that makes the image look sharper. Although, it would be nice to have the other shaders as well.

Fix subpixel alignment. I don't know if this is a problem on mpv as well. But output frames from Anime4KMetal has a slight offset comparing to input frames.

I haven't experienced this issue with mpv, but I understand what you mean. However, the subpixel misalignment seems minimal when upscaling by 2x. Also, I noticed that the new app update allows forcing no scaling, which I assume could be used to improve a video's clarity without upscaling it?

Support processing video frames without tiling, without which the performance will be reduced (probably by a lot). Since you mentioned that real-time speed is important, I consider this a requirement instead of "nice to have".

This would probably be the biggest hurdle, but I would still prefer a model that could reduce processing time by just half compared to the current model. Although, preferably closer to real time would be better.

@Tama47 Tama47 reopened this Mar 24, 2023
@imxieyi
Owner

imxieyi commented Mar 24, 2023

This doesn't look right. Which model are you using? If it's the Real-ESRGAN 2x anime video model, then it should get at least 2fps on an M2 Pro, which means the processing time should be roughly 12x instead of 60x as in your case. Could you try running sudo powermetrics|grep Power in Terminal while processing to check whether ANE Power stays very low (like 0mW)?

@Tama47
Author

Tama47 commented Mar 24, 2023

Which model are you using?

I've tried multiple models that are available, but the results are not significantly different. Real-ESRGAN seems to be the fastest, but it was still quite slow.

Could you try running sudo powermetrics|grep Power in Terminal while processing to check if ANE Power stays very low (like 0mW)?

Screenshot 2023-03-24 at 5 49 56 AM

The ANE Power stays pretty high. Also, I'm using a regular M2, not an M2 Pro.

@imxieyi
Owner

imxieyi commented Mar 24, 2023

Are your videos 60fps instead of 24fps? I just tried the Anime - 2x (video) variant of Real-ESRGAN on my M1 MBA to upscale 1080p -> 2160p and it was about 1-2fps, which means it's roughly 20x real-time, still way less than 60x. You can check the frame rate using something like mediainfo (brew install media-info).

@Tama47
Author

Tama47 commented Mar 24, 2023

No, they are 24fps. I get the sources directly from Crunchyroll. I appreciate that you are trying to help, but the main issue is that I would prefer a faster model. 2fps or 20x is still way less than desirable for video, especially if they are longer than just a few seconds, say a full 24-minute episode. So, I hope that you will bring a faster model to the app. For now, I would like the issue to remain open. However, if you have no plans to create a real-time Anime4K model, you can close this issue as not planned, and I will not reopen it again.
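[Editorial note: to put the 20x figure above in perspective, a quick back-of-the-envelope estimate, assuming a 24-minute episode and the ~20x-real-time speed discussed in this thread.]

```python
# Rough processing-time estimate for non-real-time upscaling.
episode_minutes = 24        # full episode length mentioned in the thread
realtime_multiplier = 20    # ~1-2fps on a 24fps source, i.e. ~20x slower

processing_hours = episode_minutes * realtime_multiplier / 60
print(f"~{processing_hours:.0f} hours per episode")  # prints "~8 hours per episode"
```

At ~8 hours per episode, the preference for a faster (even if lower-quality) model is easy to understand.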

@imxieyi
Owner

imxieyi commented Mar 24, 2023

Which shaders have you tested in Anime4KMetal that achieve "real time" performance for you? I can take a look at whether it's possible to convert the GLSL shaders into a Core ML model when I need something to do. Well, that would be too much pain. I guess I'll try to build a minimum-effort Metal version in the app instead.

@Tama47
Author

Tama47 commented Mar 25, 2023

Which shaders you have tested in Anime4KMetal that work for you to achieve "real time" performance

I have noticed that anything larger than the small shaders outputs less than 24 fps on my M2, and only the small shaders (Anime4K_Upscale_Denoise_CNN_x2_S.glsl and Anime4K_Restore_CNN_S.glsl) have real-time performance. I'm not sure why this is the case, as it seems slower than mpv, which can still achieve real-time performance when chaining multiple Medium shaders. I'm wondering if the v4.0.1 shaders might be faster.

However, since this is a post-processing task, it doesn't have to be in real-time. So, something larger than S, like M or L shaders, would still be plenty fast and a big improvement over the current available models.

@imxieyi
Owner

imxieyi commented Mar 25, 2023

as it seems slower than mpv

Anime4KMetal is not well optimized. I spent some time implementing multiple optimizations which roughly doubled the performance. On my M1 Ultra the Upscale+Denoise VL variant can now achieve real-time performance, versus the M variant before these optimizations. There is still room for improvement, as the GPU is not utilized with 100% efficiency (e.g. the GPU L1 cache miss rate is 99.99%).

Can you test the latest code in Anime4KMetal and see if it's still not comparable to mpv?

@Tama47
Author

Tama47 commented Mar 25, 2023

On my M1 Ultra the Upscale+Denoise VL variant can achieve real-time performance instead of the M variant before these optimizations.

Is this with Anime4KMetal? Or mpv? That’s still impressive, since I haven’t been able to achieve real-time performance with anything greater than the M variant with mpv on my base M2.

Can you test the latest code in Anime4KMetal and see if it's still not comparible to mpv?

Do you mean the latest shaders? How do I do that? Although, I highly doubt the code would be much faster because, as you said, the GPU is not 100% utilized.

@imxieyi
Owner

imxieyi commented Mar 25, 2023

Is this with Anime4KMetal?

It's Anime4KMetal.

Do you mean the latest shaders?

I mean latest version of Anime4KMetal. You can run git pull in Terminal after cd into Anime4KMetal.

I highly doubt the code would be much faster because, as you said, the GPU is not 100% utilized.

You are right. The Metal code is not faster at all. These optimizations only reduce CPU overhead.

Upon further investigation, it looks like the low GPU utilization issue happens mostly on large shaders. For the 48-core-GPU M1 Ultra, the sweet spot for Upscale+Denoise is the VL variant. The UL variant is only 2x the size, but the performance drops 20x. The reason appears to be register spill:
image

The issue here is that the GLSL shaders in Anime4K heavily use mat4 to store convolution weights:
image

With large shaders those weights cannot fit into GPU registers, so they spill into main memory, which is super slow. I highly doubt mpv suffers from the same problem.
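[Editorial note: a rough way to see why larger variants spill is that baked-in convolution weights grow with the square of the channel count. The channel counts below are hypothetical illustrative sizes, not the actual Anime4K shader dimensions.]

```python
# Illustrative estimate of constant-weight storage for a 3x3 convolution
# whose weights are baked into the shader as mat4 literals.
def weight_bytes(in_ch, out_ch, kernel=3, bytes_per_float=4):
    # kernel*kernel taps, each an in_ch x out_ch weight matrix of floats
    return kernel * kernel * in_ch * out_ch * bytes_per_float

for in_ch, out_ch in [(8, 8), (16, 16), (32, 32)]:  # hypothetical layer sizes
    kb = weight_bytes(in_ch, out_ch) / 1024
    print(f"{in_ch}->{out_ch} channels: {kb:.2f} KiB of weights")
# Doubling the channel count quadruples the weight storage; once it
# exceeds the per-thread register budget, the compiler spills to memory.
```

This is only a sketch of the scaling behavior; the actual register budget and spill threshold depend on the GPU and compiler.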

Interestingly enough, this problem seems to only exist on Apple Silicon Macs. IIRC on my old hackintosh with 5700XT even the largest GAN shader Anime4K_Restore_GAN_UUL.glsl could achieve real-time performance.

@Tama47
Author

Tama47 commented Mar 25, 2023

I mean latest version of Anime4KMetal

Oh, I see, you recently updated it. I'll check it out.

Interestingly enough, this problem seems to only exist on Apple Silicon Macs.

I suppose the shaders are just not optimized to run on SoC?

Anyway, do you think you’ll be able to build one for maybe the smaller shaders first just for testing with the app?

@imxieyi
Owner

imxieyi commented Mar 25, 2023

I suppose the shaders are just not optimized to run on SoC?

I guess so. There is definitely room for improvement, but it would be very difficult without access to the original models.

Anyway, do you think you’ll be able to build one for maybe the smaller shaders first just for testing with the app?

I'm considering this. But I don't think it's a good idea to make it generally available, since the severe performance drop on slightly larger shaders will confuse most users, and I can expect to receive tons of negative reviews for the app. I think I'll start with a TestFlight version. I'll update this thread once it's ready for testing.

@Tama47
Author

Tama47 commented Mar 25, 2023

without access to original models.

What do you mean by original models? The shaders? Are those not on GitHub, in the previous versions?

I don't think it's a good idea making it generally available

Can’t these just be made as custom models? That would already make it less accessible to most. We could use the custom models for testing.

But if you decide to go with TestFlight, just let me know and I’ll join right away!

@imxieyi
Owner

imxieyi commented Mar 25, 2023

What do you mean original models? The shaders? Are those not on GitHub, the previous versions?

All CNN/GAN-based Anime4K shaders are converted from TensorFlow models. You can find the conversion script here. The author didn't provide the original models before conversion. If the original TensorFlow models were available, we could easily convert them into Core ML models, which are already supported by the custom models feature.

Can’t these just be made as custom models?

As I said, I'll implement a minimum-effort version. Supporting this as custom models would require very significant effort, as it must be robust and support any (including non-open-source) GLSL shaders. Therefore I decided to ship verified shaders as part of the app so that it won't be broken easily.

@Tama47
Author

Tama47 commented Mar 25, 2023

As I said I'll implement a minimum-effort version. Supporting this as custom models will require very significant efforts

I see. I understand now. Thanks!

If the original TensorFlow models are available, we could easily convert them into Core ML models which is already supported by custom models feature.

So if we have the original TensorFlow models, it will be easy to convert them into custom models to use with the app? Is that what you meant?

@imxieyi
Owner

imxieyi commented Mar 25, 2023

So if we have the original TensorFlow models, it will be easy to convert them into custom models to use with the app?

Exactly. Converting TensorFlow models to Core ML is as simple as PyTorch, which is already implemented in this repo.
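[Editorial note: a minimal sketch of what such a conversion could look like with coremltools, assuming a TensorFlow SavedModel directory. The paths, input shape, and function name are placeholders, not actual Anime4K values or code from this repo.]

```python
def convert_saved_model(saved_model_dir, output_path="anime4k.mlmodel"):
    """Convert a TensorFlow SavedModel to a Core ML model.

    Requires `pip install coremltools tensorflow`. The input shape below
    (1080p NHWC image) is an assumption for illustration only.
    """
    import coremltools as ct  # imported lazily so the sketch loads without it

    mlmodel = ct.convert(
        saved_model_dir,                                   # TF SavedModel dir
        inputs=[ct.ImageType(shape=(1, 1080, 1920, 3))],   # assumed input
        compute_units=ct.ComputeUnit.ALL,                  # ANE + GPU + CPU
    )
    mlmodel.save(output_path)
    return output_path
```

The resulting .mlmodel could then be loaded through the app's existing custom models feature, as described above.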

@imxieyi
Owner

imxieyi commented Mar 26, 2023

As I mentioned in imxieyi/Anime4KMetal#1 I added support for chaining multiple shaders. Aspect ratio after scaling has also been fixed.

I'll hold off on adding support for processing videos with Anime4K in the waifu2x app, since I haven't been able to fix the subpixel alignment. Output images are slightly zoomed out compared to input images. I tested mpv and it didn't have this issue. If this doesn't get fixed, the image will have ugly tiling artifacts after tiled processing in waifu2x.

The problem resides in the Metal sampler. Even in a simple shader that only copies input images to output images, the subpixels are not aligned properly. I struggled a lot trying to fix it but finally gave up. It looks like the only way to fix it is to write a custom sampler, which would be a huge pain to do.

Probably the right way ahead is to add more codec support to Anime4KMetal and make it a more feature-complete video player for real-time processing.

@Tama47
Author

Tama47 commented Mar 26, 2023

I added support for chaining multiple shaders. Aspect ratio after scaling has also been fixed.

Thank you for this! The aspect ratio fix was definitely the biggest improvement for me, as I use a MacBook and an iPad, which don't have a 16:9 aspect ratio. The shader preset finally allows me to use Fast A Mode, which is weird, as I couldn't even use a single M shader before without dropping frames.

I'll hold up adding support for processing videos with Anime4K in waifu2x app.
Looks like the only way to fix is to write a custom sampler which will be a huge pain to do.

That's unfortunate, but thank you for your efforts! I really appreciate it. So would having the original TensorFlow models help with the subpixel alignment issues and allow you to write the custom sampler?

Probably the right way ahead is to add more codec support to Anime4KMetal and make it a more feature-complete video player for real-time processing.

This is really exciting! On my Mac I have access to mpv, so it hasn't been a problem, but on an iPad or iPhone I'd really love to be able to use Anime4K as well.

On a side note, I thought you had dropped the project since it hadn't been updated in 2 years until now, and you said that you didn't plan to create a fully-featured video player? Regardless, I am happy that you are working on Anime4KMetal again.

Also, one last thing: is there a way to hide this while watching videos?
Screenshot 2023-03-26 at 2 52 05 AM

@imxieyi
Owner

imxieyi commented Mar 26, 2023

So would having the original TensorFlow models help with the subpixel alignment issues and allows you to write the custom sampler?

If we have the original models then we can skip Metal and use Core ML instead, which (in theory) should be way faster than Metal version.

and you said that you didn't plan to create a fully-featured video player?

Yes. I'll not invest too much time into video player. If I can't find any existing open-source video player framework that allows loading Metal shaders then I'll give up.

is there a way to hide this while watching videos?

I just uploaded v0.0.3, which allows you to disable this banner. If you prefer to build locally, make sure you set Build Configuration to Release (by default Debug) so that the Metal overlay will also be hidden.

@Tama47
Author

Tama47 commented Mar 26, 2023

If we have the original models then we can skip Metal and use Core ML instead, which (in theory) should be way faster than Metal version.

Okay I see, I’ll try asking Bloc about it, regarding the original models.

Yes. I'll not invest too much time into video player. If I can't find any existing open-source video player framework that allows loading Metal shaders then I'll give up.

I hope that you’ll be able to find a framework for it! Luckily, most of the videos from YouTube and Crunchyroll are in MP4, so codecs haven't been a big issue for me.

I just uploaded v0.0.3 version

Thanks!

Also slightly off-topic, but I’ve been trying to implement an Anime4K Player using a port of WebGL. It only supports MP4/WebM, as far as I’ve tested. Theoretically, it should be able to work on any device, but I’ve only been able to get it working on Desktop browsers, not on iOS Safari. I wonder if it would be better to port WebGL via Metal instead. But then again, I know nothing about Metal, so probably not.
2E20AF93-3B08-4ECC-8625-7E557852A911
Notes: It lacks many features and is purely experimental. I didn’t expect it to work or anything, just a fun experiment.

@imxieyi
Owner

imxieyi commented Mar 26, 2023

Luckily, most of the videos from YouTube and Crunchyroll are in MP4, so codecs hasn’t been a big issue for me.

Good to know! Most streaming services use regular mp4 with h264/hevc codec so it should not be an issue for Anime4KMetal.

I’ve only been able to get it working on Desktop browsers, not on iOS Safari.

I guess it's VRAM limitation. Anime4K shaders consume a ton of VRAM. I won't be surprised if iOS Safari only allows using very little VRAM.

@Tama47
Author

Tama47 commented Mar 27, 2023

we can skip Metal and use Core ML instead

Hey, what would be a good contact info for you? Bloc said he is willing to cooperate.

@imxieyi
Owner

imxieyi commented Mar 27, 2023

That's good to know! Will he release the original models and code publicly? If so, I can just download the models from wherever he uploads them. If I get to convert them, I'll release them publicly anyway. And reversing Core ML to TensorFlow is not hard.

@Tama47 Tama47 closed this as completed May 8, 2023