Sluggishness in Apple Silicon #124

Open
meneguzzi opened this issue May 24, 2023 · 8 comments

@meneguzzi
Contributor

I have tried to use ffmpeg to stabilise a video I took with my cellphone. I had done this before on an Intel Mac, where the passes were slow but not sluggish. I recently tried the same on my Apple Silicon Mac (a fairly beefy spec: Apple M1 Max, 32GB RAM), and processing was extremely slow (the best so far was speed=0.0339x). I checked that all the software involved (ffmpeg and libvidstab) is the Homebrew build compiled for Apple Silicon, so this is definitely not an emulation problem.

ffmpeg -i clip.mp4 -threads 8 -vf vidstabdetect -f null -
ffmpeg -i clip.mp4 -threads 8 -vf vidstabtransform clip-stabilized.mp4

My recollection of what to expect may be wrong, but is this speed normal?

@georgmartius
Owner

Hmm, I guess the M1 does not have the SSE extensions, so the optimized machine code is not used. If this is the reason, the slow speed is expected.
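
To make the point above concrete, here is a minimal sketch of how an SSE2-optimized kernel is typically gated at compile time. It is not taken from the vid.stab sources: the function name `sad_u8` is hypothetical, and the sum-of-absolute-differences kernel is only assumed to be representative of the block comparisons a stabilizer performs. On an arm64 build `__SSE2__` is not defined, so code written only for SSE2 falls through to the scalar loop unless a NEON branch like the one shown is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* x86 SSE2 intrinsics */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* Arm NEON intrinsics */
#endif

/* Hypothetical helper: sum of absolute differences over n bytes. */
uint32_t sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    uint32_t sum = 0;
    size_t i = 0;

#if defined(__SSE2__)
    /* x86 path: 16 bytes per iteration. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_sad_epu8 leaves one partial sum in each 64-bit half. */
        __m128i s = _mm_sad_epu8(va, vb);
        sum += (uint32_t)_mm_cvtsi128_si32(s)
             + (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(s, 8));
    }
#elif defined(__ARM_NEON) && defined(__aarch64__)
    /* AArch64 path: absolute difference, then widening horizontal add. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        sum += vaddlvq_u8(vabdq_u8(va, vb));
    }
#endif

    /* Scalar loop: handles the tail, or everything when no SIMD path exists. */
    for (; i < n; ++i)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    return sum;
}
```

A caller would invoke `sad_u8(block_a, block_b, length)` once per candidate block position, which is why losing the vector path can dominate the run time.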

@meneguzzi
Contributor Author

I'm trying to understand the issue here: do you have bits of straight asm code in your codebase? Or would it be possible to tweak the Makefile to use specific compiler options to mitigate this problem on Apple Silicon (or other non-Mac Arm machines)?
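
One quick way to see what a given build targets is to check the macros the compiler predefines. The sketch below is an illustration only: the helper names `simd_backend` and `simd_backend_is` are hypothetical, and it assumes the optimized path is selected by compile-time macros rather than by hand-written assembly. Since SSE intrinsics are specific to x86, an arm64 build is expected to report `neon` here, never `sse2`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name the SIMD family this translation unit targets,
 * based on macros predefined by GCC and Clang. */
const char *simd_backend(void)
{
#if defined(__SSE2__)
    return "sse2";      /* x86 or x86-64 with SSE2 enabled */
#elif defined(__ARM_NEON)
    return "neon";      /* Arm with NEON, e.g. Apple Silicon */
#else
    return "scalar";    /* no known SIMD extension detected */
#endif
}

/* Hypothetical helper: returns 1 if the build targets the named backend. */
int simd_backend_is(const char *name)
{
    assert(name != NULL);
    return strcmp(simd_backend(), name) == 0;
}
```

Printing `simd_backend()` from a small test program built with the same flags as the library shows which branch of the code is actually being compiled.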

@meneguzzi
Contributor Author

As a follow-up (so I can investigate this further): is there anything I can read about how you use those instructions for the speedup?

@meneguzzi
Contributor Author

As an update, I found that there are some solutions to this problem of the missing SSE API:

The first one seems to be the easiest to use. If I get some time, I aim to try compiling this locally and open a pull request.

@juanctecdam

> As an update, I found that there are some solutions to this problem of the missing SSE API:
>
> The first one seems to be the easiest to use. If I get some time, I aim to try compiling this locally and open a pull request.

@meneguzzi Did you manage to get this working?

@meneguzzi
Contributor Author

Hi, I spent an hour or so trying to make sse2neon work, and it does compile, generating a dynamic library. I have not had time to test it with ffmpeg yet; I will try to do so ASAP.
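
For readers unfamiliar with this approach, the following is a minimal sketch of the usual translation-header pattern, not the actual change pushed for this issue. It assumes a header named `sse2neon.h` is on the include path when building for Arm; the macro `HAVE_SSE2_API` and the function `avg_rows_u8` are hypothetical names chosen for illustration. The same `_mm_*` calls compile natively on x86 and are mapped onto NEON on Arm, with a scalar fallback when neither header is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#  include <emmintrin.h>          /* native SSE2 on x86 */
#  define HAVE_SSE2_API 1
#elif defined(__ARM_NEON) && defined(__has_include)
#  if __has_include("sse2neon.h")
#    include "sse2neon.h"         /* maps _mm_* intrinsics onto NEON */
#    define HAVE_SSE2_API 1
#  endif
#endif

/* Hypothetical helper: rounded average of two pixel rows. */
void avg_rows_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;

#ifdef HAVE_SSE2_API
    /* Same source on both architectures: 16 pixels per iteration. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_avg_epu8(va, vb));
    }
#endif

    /* Scalar tail (or full fallback), rounding up like _mm_avg_epu8. */
    for (; i < n; ++i)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
```

The appeal of this design is that the existing SSE2 code stays untouched; the cost is that the translated code may not be as fast as NEON code written by hand.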

@meneguzzi
Contributor Author

Done, I pushed the changes I made as a workaround. I have not had time to test compiling ffmpeg against them, so if somebody (or @juanctecdam) has time to do it, I'd appreciate it.

@rgaufman

rgaufman commented Jun 17, 2024

I've compiled ffmpeg from source:

ffmpeg version N-115657-g17c3cc5bb6 Copyright (c) 2000-2024 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.3.9.4)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/HEAD-17c3cc5 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox --enable-neon
  libavutil      59. 21.100 / 59. 21.100
  libavcodec     61.  7.100 / 61.  7.100
  libavformat    61.  3.104 / 61.  3.104
  libavdevice    61.  2.100 / 61.  2.100
  libavfilter    10.  2.102 / 10.  2.102
  libswscale      8.  2.100 /  8.  2.100
  libswresample   5.  2.100 /  5.  2.100
  libpostproc    58.  2.100 / 58.  2.100

But running:

ffmpeg -hwaccel videotoolbox -i 20240616_C3799.MP4 -vf vidstabdetect -f null -

Is extremely slow:

Output #0, null, to 'pipe:':
  Metadata:
    major_brand     : XAVC
    minor_version   : 17506303
    compatible_brands: XAVCmp42nrasiso6
    encoder         : Lavf61.3.104
  Stream #0:0(und): Video: wrapped_avframe, yuv444p(pc, progressive), 2160x3840 [SAR 1:1 DAR 9:16], q=2-31, 200 kb/s, 50 fps, 50 tbn (default)
      Metadata:
        creation_time   : 2024-06-16T19:04:31.000000Z
        handler_name    : Video Media Handler
        vendor_id       : [0][0][0][0]
        encoder         : Lavc61.7.100 wrapped_avframe
  Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
      Metadata:
        creation_time   : 2024-06-16T19:04:31.000000Z
        handler_name    : Sound Media Handler
        vendor_id       : [0][0][0][0]
        encoder         : Lavc61.7.100 pcm_s16le
frame=   23 fps=0.9 q=-0.0 size=N/A time=00:00:00.46 bitrate=N/A speed=0.0182x

That is 0.9 fps on an M3 Max, which makes it impractical to run. I appreciate it's not an apples-to-apples comparison, but this clip takes about 13 seconds to stabilize in Final Cut versus over an hour using this.

Is there anything I am missing that is causing performance to be so poor?
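
The numbers in the log are consistent with each other, which a short calculation shows. This is a sketch under the assumption of a constant-frame-rate clip, where the reported speed is roughly the processed frame rate divided by the source frame rate; the function names are hypothetical. With 0.9 fps processed against a 50 fps source, the factor is 0.018, close to the reported speed=0.0182x.

```c
#include <assert.h>

/* Speed factor for a constant-frame-rate clip: processed frames per
 * second divided by the source frame rate. */
double speed_factor(double processed_fps, double source_fps)
{
    assert(processed_fps > 0.0 && source_fps > 0.0);
    return processed_fps / source_fps;
}

/* Estimated wall-clock seconds for one pass over a clip at a given speed. */
double wall_clock_seconds(double clip_seconds, double speed)
{
    assert(clip_seconds >= 0.0 && speed > 0.0);
    return clip_seconds / speed;
}
```

The clip length is not stated in the thread, so as a purely hypothetical example: `wall_clock_seconds(70.0, 0.0182)` gives about 3846 seconds, a little over an hour for a single pass, which matches the order of magnitude reported above.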
