Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance oddities on OS X 10.15.5 #62

Open
sludin opened this issue Jun 19, 2021 · 9 comments
Open

Performance oddities on OS X 10.15.5 #62

sludin opened this issue Jun 19, 2021 · 9 comments

Comments

@sludin
Copy link

sludin commented Jun 19, 2021

I stated with the rust version and then switched to the C version to debug this issue. In short, I cannot reliably get any frame rates faster than 30 FPS ( ~33 ms per frame ). I dug into the code and the only explicit timing code I see is in mfb_wait_sync in MacMiniFB.m. I figured before I started looking up OSX API calls (not as familiar with them as I want to be) I would see if this is a known issue of some sort.

This is with the latest version of the rust and C APIs using Metal.

In short, if I set a framerate in the C version or set the limit_update_rate in the Rust version, once I go below 30 FPS the majority of the frames still take ~33ms. Sample output from the code below fro fps set to 60:

 33.24
 33.27
 33.40
 33.28
 33.37
 33.22
 33.30
 33.26
 33.52
 33.42
 33.27
 33.29
 33.49
 33.05
 33.44
 33.42
 49.86
 33.45
 49.46
 33.67
 33.27
 33.36
 33.32
 33.32
 33.48

And fps set to 10 (somehow I think the code in drawInMTKView in OSXViewDelegate.m comes into play because of the ~.96 number but I do not know how it gets there:

 95.95
 96.02
 95.97
 96.12
 96.04
 95.97
 96.06
 96.14
 95.98
 96.05
 95.85

My barebones code is below. This is drastically reduced from the original to try and isolate the issue.:

#include <MiniFB.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

static uint32_t  g_width  = 400;
static uint32_t  g_height = 400;
static uint32_t *g_buffer = 0x0;

int main( int argc, char *argv[] ) {
  struct mfb_window *window = mfb_open_ex("Foo display", 800, 600, WF_RESIZABLE);
  if (!window)
    return 0;

  g_buffer = (uint32_t *) malloc(g_width * g_height * 4);

  mfb_set_target_fps(60);

  struct mfb_timer *t = mfb_timer_create();

  long count = 1;
  double total = 0;

  do {
    int state;

    double start = mfb_timer_now( t );
      
    state = mfb_update_ex(window, g_buffer, g_width, g_height);

    if (state < 0) {
      window = NULL;
      break;
    }
  
    double end = mfb_timer_delta( t );
    total += end;

    // Print out the time for the last cycle and the average time (average seems
    // to suffer from high times for early frames
    printf( "%6.2f\n", end * 1000.0 );
    
    count++;
  } while(mfb_wait_sync(window));

  /* Cleanup code conveniently omitted (i.e. I didn't write it */
}
@emoon
Copy link
Owner

emoon commented Jun 19, 2021

Notice that the Rust version and C version is quite different implementations. It would be better to to report it there. Also it's possible to set the wait time to None on the Rust version and do the timing that you want yourself.

@sludin
Copy link
Author

sludin commented Jun 19, 2021

I did notice that and did set the wait time to None in the rust version. The behaviors were identical. I can include the rust code if desired but it is the virtual equivalent of the above C code. My thought was that it is not something with the minifb code (either version) but rather because of the very similar behavior of effectively limiting to 30 FPS it may something in the OSX layer. For what it is worth I am running Catalina on a 2017 MPB, Intel HD Graphics 630 1536 MB, and 16 GB RAM.

Above I said "drawInMTKView in OSXViewDelegate.m comes into play because of the ~.96 number" which is false. The .96 number (which is likely not relevant) is in mfb_wait_sync in McMiniFB.m. My mistake.

@sludin
Copy link
Author

sludin commented Jun 19, 2021

I experimented a bit more and found something interesting. When the FPS is set to below 60 it works as expected. When it is 60 is goes funky and all of a sudden the stack trace under mfb_open_ex starts to take ~33ms (30 FPS - all numbers are approximate). I put printfs in the above code as follows:

  do {
    int state;

    // Measure the time spent in mfb_wait_sync
    printf( "%6.2f ", mfb_timer_delta( t ) * 1000.0 );
      
    state = mfb_update_ex(window, g_buffer, g_width, g_height);

    // Measure the time spent in mfb_update_ex
    printf( "%6.2f ", mfb_timer_delta( t ) * 1000.0 );
    
    if (state < 0) {
      window = NULL;
      break;
    }
  
    // measure anything else though there should be nothing bu t the if statement
    printf( "%6.2f\n", mfb_timer_delta( t ) * 1000.0  );
    
  } while(mfb_wait_sync(window));

The output of this code for a few frame rates ( set by calling mfb_set_target_fps ):


10:

Notice most time is 'in' mfb_wait_sync as would be expected. Fields are:

mfb_wait_sync mfb_update_ex leftover

 95.59   0.40   0.00
 95.60   0.43   0.00
 95.56   0.26   0.00
 95.73   0.39   0.00
 95.63   0.40   0.00
 95.61   0.43   0.00

30:

(As above - Also you can see that the 0.96 limit is being enforced in mfb_wait_sync as expected)

 31.66   0.34   0.00
 31.67   0.38   0.00
 31.61   0.42   0.00
 31.58   0.35   0.00
 31.64   0.32   0.00

50:

 16.73   0.57   0.01
 16.88   0.64   0.00
 16.81   0.52   0.00
 16.94   0.53   0.00
 16.92   0.42   0.00
 17.03   0.53   0.00

59:

 15.59   0.46   0.00
 15.81   0.63   0.00
 15.65   0.50   0.00
 15.77   0.42   0.00
 15.85   0.52   0.00

60:

  0.01  33.30   0.00
  0.01  33.33   0.00
  0.01  33.32   0.00
  0.01  33.32   0.00
  0.01  33.36   0.00

But once in a while you get:

 16.60   0.31   0.00
 15.99   0.37   0.00
 16.27   0.37   0.00
 16.51   0.43   0.00

FWIW, you get identical numbers for any setting over 60.


I profiled the code using FPS 60 and 30. There are a few differences that jump out which will not be a surprise from the above numbers:

  • The 30 FPS code spent most of its time in mfb_wait_sync
  • The 30 FPS code was a bit less efficient (use double the CPU) probably because of the tight loop that is hit when you are between 0.80 and 0.96 of the g_time_for_frame
  • The 60 FPS code spend most of its time in mfb_update_ex
  • Both rates have the same stack trace, but for some reason the [OSXViewDelegate drawInMTKView] call for 60 FPS is taking 10 times longer.
  • Ultimately, the call to amdBronzeMtlTextureStdWriteTileVecDisp2 (amongst a few others near it) takes much longer when the FPS is 60 or greater.

I will look and see if there are any Metal oddities around update rates. This is so stark and obvious that I can't be the first to hit it. It is possible that It is a setting somewhere on my machine. I remember that certain trackpad settings in the past would limit frame rates for example.

@sludin
Copy link
Author

sludin commented Jun 19, 2021

Compiling libminifb.a with USE_METAL_API=OFF removed the FPS limitation. I can sustain 120 FPS (didn't try faster) on the no-op example above.

I read that MTKView limits frame rates to 60 FPS. I wonder if something in there is being tickled, at least in my configuration of software/hardware. I do know know enough about the API to say.

Disabling metal unfortunately does not solve my problem on the Rust side which is what I prefer to use ( memory safe languages == good ). Obviously not a question for this project but the Rust version does not have the preprocessor conditionals and I would need to port the 15 places where it is used in the C version. Add to that that disabling Metal is probably not the "right thing" to do and I circle back to trying to figure out where I am going wrong with Metal.

@emoon
Copy link
Owner

emoon commented Jun 20, 2021

Yeah so if you disable Metal you won't be able to use updated_with_buffer as Metal is used to draw stuff on the screen. I haven't seen this behovior myself tho so I'm not really sure what is going on.

@sludin
Copy link
Author

sludin commented Jun 20, 2021

I'll try and get my hands on a different Mac or two and see if I can reproduce it or not. Definitely unique behavior.

@emoon
Copy link
Owner

emoon commented Jun 20, 2021

Great! Thanks for investigating :)

@sludin
Copy link
Author

sludin commented Jun 20, 2021

What better to do on a Sunday morning than to continue banging my head against this wall.

I made some progress, though I do not have a full explanation as of yet. I got it all to build under xcode (I never use it so it was a bit foreign to me) in order to easily get the GPU data. What I saw there conformed with the documentation: When I set FPS to 60 I got 60 FPS. When I set it to 59, I got 30 FPS. Of course, at the application level, each loop was taking ~16.6 with 59 FPS and ~33.3 ms with 60 FPS. Obviously the MTKView is working independently from the application level. Why it is causing the call under drawInMTKView to block at 60 and not at 30, I do not know yet.

Thinking that maybe there is some rounding error when converting from FPS -> g_time_for_frame -> view.preferredFramesPerSecond I hard coded the last to 30. And it worked. By worked I mean I got 30 FPS in the GPU debugger and the application was humming along at 60 'loops' per second. Leaving the mfb_set_target_fps() at 60, I changed the hard coded preferredFramesPerSecond to 10 and I got 10 FPS in the GPU debugger and still 60 loops per second in the application (as expected). When preferredFramesPerSecond is 60 (or higher), I get the blocking behavior where each loop takes 33.3ms.

preferredFramesPerSecond mfb_set_target_fps loop time expected time MTKView FPS (xcode)
60 60 33.3 16.67 60
30 60 *15.9 16.67 30
10 60 *15.9 16.67 10
30 90 *10.5 11.11 30
60 90 33.3 11.11 60

(*) The difference between measured and expected is due to the 0.96 limit in mfb_wait_sync

So solution? No. Workaround, yes. Hardcoding (for now) preferredFramesPerSecond and using a different number for mfb_set_target_fps gets me where I need to be for the moment. It is more important to me to have the application level tick along at a regular than it is to have a 60 FPS frame rate. Could I have just saved all this work and set the rate to 30 FPS to start with? Sure! But where would be the fun in that?

The mystery remaining is to explain the behavior when preferredFramesPerSecond is set to 60. Why does it 'block' for 33.3ms 95% or so of the time? Something about the triple buffering? The fact that it is sometimes 16.67 (1 frame) and most of the time 33.3 (2 frames) is suspicious. Living on the edge I removed the semaphores protecting the ring buffer or buffer and no effect. May a Metal wizard someday stumble upon this and slap me aside the head and say, "HERE'S your problem!"

@emoon
Copy link
Owner

emoon commented Jun 20, 2021

So I currently use tripple buffering (as recommended by the documentation to not get any GPU stalls) What you could try to is to set this https://github.com/emoon/rust_minifb/blob/master/src/native/macosx/OSXWindowFrameView.h#L6
to 2 instead and see if it helps.

I don't think it will/shouldn't unless something is broken when the Metal setup (the Metal code I use there is the first one I ever wrote for Metal so there may be issues)

yamt added a commit to yamt/wasm4 that referenced this issue Aug 14, 2024
yamt added a commit to yamt/wasm4 that referenced this issue Aug 15, 2024
yamt added a commit to yamt/wasm4 that referenced this issue Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants