[rendering] animation frame callback handling when iframes are involved #10521

Open
emilio opened this issue Jul 25, 2024 · 23 comments

Comments

@emilio
Contributor

emilio commented Jul 25, 2024

What is the issue with the HTML Standard?

I noticed that all browsers do something different right now with regards to requestAnimationFrame when iframes are involved.

  • Safari seems to match the spec in my (limited) testing.
  • Chrome seems to do something like the spec, but skips docs which don't have frame callbacks.
  • Gecko does something that depends on which document requested the callback first etc. I recently reworked Gecko and ended up somewhat accidentally matching Chrome.

Here's a trivial-ish test-case:

<!doctype html>
<iframe width=10 height=10 id="child"></iframe>
<p>
  <button onclick="runTest(true)">Run test (child has callback)</button>
  <button onclick="runTest(false)">Run test (child has no callback)</button>
</p>
<pre id="log"></pre>
<script>
  let parentWin = window; // Just out of extra caution
  let childWin;
  onload = () => {
    childWin = child.contentWindow;
  }
  function log(t, kind) {
    document.getElementById("log").appendChild(document.createTextNode(`[${t}] ${kind}\n`));
  }
  function runTest(withChildCb) {
    log(parentWin.performance.timeOrigin + parentWin.performance.now(), "start");
    parentWin.requestAnimationFrame(t => {
      log(t + parentWin.performance.timeOrigin, "parent");
      parentWin.requestAnimationFrame(t => {
        log(t + parentWin.performance.timeOrigin, "parent-from-parent");
      });
      childWin.requestAnimationFrame(t => {
        log(t + childWin.performance.timeOrigin, "child-from-parent");
      });
    });
    if (withChildCb) {
      childWin.requestAnimationFrame(t => {
        log(t + childWin.performance.timeOrigin, "child");
        parentWin.requestAnimationFrame(t => {
          log(t + parentWin.performance.timeOrigin, "parent-from-child");
        });
      });
    }
  }
</script>

The interesting case is Run test (child has no callback).

In that case, Safari does what the spec says, running:

  • parent
  • child-from-parent (in the same tick)
  • parent-from-parent (next tick).

Current Firefox, Chrome, and Firefox Nightly do:

  • parent
  • parent-from-parent (next tick)
  • child-from-parent (same tick)

If you test the Run test (child has callback) test, then Safari, Chrome and Firefox Nightly agree with the spec:

  • parent
  • child (same tick)
  • child-from-parent (same tick)
  • parent-from-parent (next tick)
  • parent-from-child (same tick)

Firefox release does:

  • parent
  • child (same tick)
  • parent-from-parent (next tick)
  • parent-from-child (same tick)
  • child-from-parent (same tick)

Which is kinda odd, but makes some amount of sense. It collects all callbacks at once, and fires them. Any callback scheduled from an existing one gets fired on the next rendering update.
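
As a rough illustration of that "collect everything, then fire" model, here is a minimal sketch in plain JavaScript. It is a toy simulation, not Gecko's code: the `Doc` class, `updateRenderingCollectAll` and `runCollectAllScenario` are hypothetical names, and each "tick" stands in for one rendering update.

```javascript
// Toy model only: a "document" is just a map of pending rAF callbacks.
class Doc {
  constructor(name) {
    this.name = name;
    this.callbacks = new Map();
    this.nextHandle = 1;
  }
  requestAnimationFrame(callback) {
    const handle = this.nextHandle++;
    this.callbacks.set(handle, callback);
    return handle;
  }
}

// "Collect all, then fire": snapshot every pending callback across all
// documents before invoking any of them. Anything scheduled while the batch
// runs stays in the maps until the next update.
function updateRenderingCollectAll(docs) {
  const batch = [];
  for (const doc of docs) {
    batch.push(...doc.callbacks.values());
    doc.callbacks.clear();
  }
  for (const callback of batch) callback();
}

// Mirrors the "child has callback" variant of the test case above.
function runCollectAllScenario() {
  const parent = new Doc("parent");
  const child = new Doc("child");
  const log = [];
  let tick = 0;
  const record = (kind) => log.push(`${tick}:${kind}`);

  parent.requestAnimationFrame(() => {
    record("parent");
    parent.requestAnimationFrame(() => record("parent-from-parent"));
    child.requestAnimationFrame(() => record("child-from-parent"));
  });
  child.requestAnimationFrame(() => {
    record("child");
    parent.requestAnimationFrame(() => record("parent-from-child"));
  });

  for (tick = 1; tick <= 2; tick++) updateRenderingCollectAll([parent, child]);
  return log;
}

console.log(runCollectAllScenario());
```

Under these assumptions the log reads parent and child in tick 1, then parent-from-parent, parent-from-child and child-from-parent in tick 2, which is the order listed for Firefox release above.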

To be clear, I think the spec is somewhat sane, but I have questions:

  • Is Chromium's behavior intentional, or an implementation accident due to this not being well tested?
  • Is Safari's implementation intentional? (It looks like it.)
  • Do we want to consider something like release Firefox's behavior (collect all callbacks, then fire them)? It's kinda nice that any nested rAF runs in a different tick, regardless of the relationship between the caller and the window...
  • Another somewhat related issue is that in the spec, docs is frozen for the whole rendering update. That's not the case in either Gecko or WebKit at least (not sure about Chrome).
    • E.g., if you add an <iframe> during rAF, that document would run the IntersectionObserver steps.

After finding this I kinda wanna change Gecko to match Safari and the spec and add some good tests for this, or to a somewhat simpler version of Firefox Release's behavior (the one that prevents nested rAF from firing on the same update)...

cc @whatwg/rendering @chrishtr @rniwa @annevk @zcorpan @aosmond @mfreed7 @szager-chromium

@emilio emilio added topic: rendering agenda+ To be discussed at a triage meeting labels Jul 25, 2024
@emilio emilio changed the title from "[rendering] animation frame callback handling" to "[rendering] animation frame callback handling when iframes are involved" Jul 25, 2024
@esprehn

esprehn commented Jul 25, 2024

This change was made in Chrome relatively recently (early 2023) apparently because they thought it followed the spec better:

https://source.chromium.org/chromium/chromium/src/+/766274f6af98374883d2c30ca2dc0fc116f407ad

Collection of all documents is here:
https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/page/page_animator.cc;l=102;drc=4268052f5025da8b928c9e59a04493b396acaad3

It used to walk the document tree like WebKit.

The spec seems to say to collect all the docs at the beginning and each step says "for each doc of docs". What makes you think the behavior of Safari is what the spec is saying?

@emilio
Contributor Author

emilio commented Jul 25, 2024

@esprehn I was commenting on the behaviour of the rAF callbacks, not on the "liveness" of the document list. Safari matches the spec afaict because a child rAF scheduled from a parent rAF runs in the same rendering update, regardless of whether the child has existing callbacks scheduled at the beginning of the rendering steps.

@rniwa

rniwa commented Jul 25, 2024

Safari's behavior here is intentional to follow the spec.

@esprehn

esprehn commented Jul 25, 2024

Oh I see, yeah it looks like Chrome did this on purpose as a spec-violating optimization:

https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/page/page_animator.cc;l=153;drc=4268052f5025da8b928c9e59a04493b396acaad3

I think they should probably revert that optimization.

@emilio
Contributor Author

emilio commented Jul 25, 2024

The spec violation is, afaict, this line, because it assumes that new tasks won't be scheduled: https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/page/page_animator.cc;l=145;drc=4268052f5025da8b928c9e59a04493b396acaad3

And yeah it seems nobody really properly implements the "collect all docs upfront" bit... @rniwa do you know if that was a requirement or just an oversight? (Given you spent deliberate effort on trying to match the spec)

@rniwa

rniwa commented Jul 25, 2024

And yeah it seems nobody really properly implements the "collect all docs upfront" bit... @rniwa do you know if that was a requirement or just an oversight? (Given you spent deliberate effort on trying to match the spec)

It's more of an oversight, although if no browser implements it that way, maybe we should update the spec to match whatever browsers are doing today.

@esprehn

esprehn commented Jul 25, 2024

@rniwa that was the part of the spec I was saying Safari does not implement. It walks the tree in every step:
https://github.com/WebKit/WebKit/blob/64f40e1806635d78dfe9a758585b5598d8793034/Source/WebCore/page/Page.cpp#L3997

@emilio
Contributor Author

emilio commented Jul 25, 2024

It seems blink does something close enough with regards to that, so maybe the spec is web compatible?

@esprehn

esprehn commented Jul 25, 2024

@emilio yeah you're right, I re-read the Chrome code and edited my comment. My bad for the confusion there.

@smaug----

smaug---- commented Jul 25, 2024

@emilio Why does the test have log(t + childWin.performance.timeOrigin, "parent-from-parent"); ? Shouldn't it use parentWin

@Kaiido
Member

Kaiido commented Jul 26, 2024

Isn't this test hitting the step 4. Unnecessary rendering which would remove the iframe's doc if nothing has to be rendered there?

It seems that the results do vary if there is an already running rAF loop in the iframe (thus avoiding that unnecessary rendering step), or if the "main" task is run from a timer task instead of in the click event (which I assume would mark the frame as potentially needing an update). Not only do some callbacks no longer fire in the same tick, but even the whole execution order can be affected.

A few new tests.

Note that these tests all use the main document's timeline.currentTime as the source of truth to know which rendering frame we are in; I might have missed something with the timings in the first test.
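
For reference, one way such [same]/[next] labels could be derived is sketched below. This is an assumption about the approach, not the code of the linked tests: `makeFrameLabeler` and `getFrameTime` are hypothetical names. In a browser the getter would be `() => document.timeline.currentTime`; a fake clock is used here so the sketch runs anywhere.

```javascript
// Returns a labeler that tags a log entry "next" when the frame clock has
// advanced since the previous entry, and "same" otherwise.
function makeFrameLabeler(getFrameTime) {
  let lastFrameTime = null;
  return function label(kind) {
    const frameTime = getFrameTime();
    const tag = frameTime === lastFrameTime ? "same" : "next";
    lastFrameTime = frameTime;
    return `[${tag}] ${kind} [${frameTime}]`;
  };
}

// Fake frame clock standing in for document.timeline.currentTime
// (hypothetical values, one read per log entry).
const fakeTimes = [100, 116, 116, 133];
let readIndex = 0;
const label = makeFrameLabeler(() => fakeTimes[readIndex++]);

const lines = ["start", "parent", "child", "parent-from-parent"].map((kind) => label(kind));
console.log(lines.join("\n"));
```

The first entry is always tagged "next" because there is no earlier frame time to compare against, which is consistent with the "start" rows in the tables below.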

With callbacks

no rAF loop - no timer - with callback

Chrome | Firefox (Release) | Firefox (Nightly) | Safari
[next] start [2059.5] | [next] start [3247.42] | [next] start [2226.9] | [next] start [3220]
[next] parent [2076.2] | [next] parent [3282.98] | [next] parent [2239.9] | [next] parent [3221]
[same] child [2076.2] | [same] child [3282.98] | [same] child [2239.9] | [same] child [3221]
[same] child-from-parent [2076.2] | [next] parent-from-parent [3297.42] | [same] child-from-parent [2239.9] | [same] child-from-parent [3221]
[next] parent-from-parent [2092.8] | [same] parent-from-child [3297.42] | [next] parent-from-parent [2243.84] | [next] parent-from-parent [3250]
[same] parent-from-child [2092.8] | [same] child-from-parent [3297.42] | [same] parent-from-child [2243.84] | [same] parent-from-child [3250]

no rAF loop - timer - with callback

Chrome | Firefox (Release) | Firefox (Nightly) | Safari
[next] start [319738.586] | [next] start [3146.64] | [next] start [2317.98] | [next] start [4285]
[same] parent [319738.586] | [next] parent [3502.64] | [next] parent [4436.58] | [next] parent [4286]
[same] child [319738.586] | [same] child [3502.64] | [same] child [4436.58] | [same] child [4286]
[same] child-from-parent [319738.586] | [next] parent-from-parent [3513.3] | [same] child-from-parent [4436.58] | [same] child-from-parent [4286]
[next] parent-from-parent [319755.3] | [same] parent-from-child [3513.3] | [next] parent-from-parent [4450.48] | [next] parent-from-parent [4313]
[same] parent-from-child [319755.3] | [same] child-from-parent [3513.3] | [same] parent-from-child [4450.48] | [same] parent-from-child [4313]

rAF loop - timer - with callback

Chrome | Firefox (Release) | Firefox (Nightly) | Safari
[next] start [12471.5] | [next] start [4072.38] | [next] start [3871.52] | [next] start [3071]
[next] parent [12487.8] | [next] child [4089.06] | [next] parent [3888.16] | [next] parent [3081]
[same] child [12487.8] | [same] parent [4089.06] | [same] child [3888.16] | [next] parent-from-parent [3098]
[same] child-from-parent [12487.8] | [next] child-from-parent [4105.72] | [same] child-from-parent [3888.16] | [same] child [3098]
[next] parent-from-parent [12504.8] | [same] parent-from-child [4105.72] | [next] parent-from-parent [3904.82] | [same] child-from-parent [3098]
[same] parent-from-child [12504.8] | [same] parent-from-parent [4105.72] | [same] parent-from-child [3904.82] | [next] parent-from-child [3114]

rAF loop - no timer - with callback

Chrome | Firefox (Release) | Firefox (Nightly) | Safari
[next] start [1525.6] | [next] start [2453.96] | [next] start [1982.08] | [next] start [1655]
[next] parent [1542] | [same] child [2453.96] | [next] parent [1998.42] | [next] parent [1670]
[same] child [1542] | [same] parent [2453.96] | [same] child [1998.42] | [same] child [1670]
[same] child-from-parent [1542] | [next] child-from-parent [2470.62] | [same] child-from-parent [1998.42] | [same] child-from-parent [1670]
[next] parent-from-parent [1558.9] | [same] parent-from-child [2470.62] | [next] parent-from-parent [2015.4] | [next] parent-from-parent [1687]
[same] parent-from-child [1558.9] | [same] parent-from-parent [2470.62] | [same] parent-from-child [2015.4] | [same] parent-from-child [1687]

No callback

no rAF loop - no timer - no callback

Chrome | Firefox (Release) | Firefox (Nightly) | Safari
[next] start [2026.798] | [next] start [121446.66] | [next] start [16043.14] | [next] start [57572]
[next] parent [2043.464] | [same] parent [121446.66] | [next] parent [16056.82] | [next] parent [57573]
[next] parent-from-parent [2060.1] | [next] parent-from-parent [121463.32] | [next] parent-from-parent [16060.42] | [same] child-from-parent [57573]
[same] child-from-parent [2060.1] | [same] child-from-parent [121463.32] | [same] child-from-parent [16060.42] | [next] parent-from-parent [57583]

no rAF loop - timer - no callback

Chrome | Firefox (Release) | Firefox (Nightly) | Safari
[next] start [371021.52] | [next] start [27246.48] | [next] start [23284.22] | [next] start [19185]
[next] parent [371038.186] | [next] parent [29239.24] | [next] parent [24551.96] | [next] parent [19186]
[next] parent-from-parent [371054.9] | [next] parent-from-parent [29239.5] | [next] parent-from-parent [24568.02] | [same] child-from-parent [19186]
[same] child-from-parent [371054.9] | [same] child-from-parent [29239.5] | [same] child-from-parent [24568.02] | [next] parent-from-parent [19196]

rAF loop - timer - no callback

Chrome | Firefox (Release) | Firefox (Nightly) | Safari
[next] start [40338.1] | [next] start [82921.86] | [next] start [15404.24] | [next] start [13267]
[next] parent [40354.3] | [next] parent [82938.54] | [next] parent [15421.42] | [next] parent [13281]
[same] child-from-parent [40354.3] | [next] child-from-parent [82955.2] | [same] child-from-parent [15421.42] | [same] child-from-parent [13281]
[next] parent-from-parent [40371.4] | [same] parent-from-parent [82955.2] | [next] parent-from-parent [15438.1] | [next] parent-from-parent [13298]

rAF loop - no timer - no callback

Chrome | Firefox (Release) | Firefox (Nightly) | Safari
[next] start [18124.9] | [next] start [18787.16] | [next] start [11698.62] | [next] start [9357]
[next] parent [18141.8] | [same] parent [18787.16] | [next] parent [11714.86] | [next] parent [9370]
[same] child-from-parent [18141.8] | [next] child-from-parent [18803.82] | [same] child-from-parent [11714.86] | [same] child-from-parent [9370]
[next] parent-from-parent [18158.8] | [same] parent-from-parent [18803.82] | [next] parent-from-parent [11732.04] | [next] parent-from-parent [9387]

@emilio
Contributor Author

emilio commented Jul 26, 2024

@emilio Why does the test have log(t + childWin.performance.timeOrigin, "parent-from-parent"); ? Shouldn't it use parentWin

Typo, yeah, fixed it up.

Isn't this test hitting the step 4. Unnecessary rendering which would remove the iframe's doc if nothing has to be rendered there?

My understanding of that step was always that it was meant for stuff like background tabs or invisible iframes, not iframes that are on screen and just don't have enqueued work at the beginning of the step... Of course that sentence is so vague that sure, it could hit it... See #10333 and co too.

@esprehn

esprehn commented Jul 26, 2024

I don't think what Chrome's doing qualifies for that step, because if your iframe has mutated content (e.g. appended new text, updated styles) Chrome will consider that document for updating rendering in step 22. The spec does not allow skipping rAF but running rendering like that. It's the same list of docs.

I also think what Chrome is doing is wrong because many APIs cause rAF to suddenly start happening in the same frame. For example, if you add a MediaQueryList then the child rAFs won't miss a frame anymore because of the heuristic.

bool ScriptedAnimationController::HasScheduledFrameTasks() const {
  return callback_collection_.HasFrameCallback() || !task_queue_.empty() ||
         !event_queue_.empty() || !media_query_list_listeners_.empty() ||
         GetWindow()->document()->HasAutofocusCandidates() ||
         !vfc_execution_queue_.empty();
}

Chrome is trying to detect in advance if any step of the process will have work, but I don't think that's actually possible because each loop may enqueue work for a child document in a later step's loop.
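
The failure mode described here can be reproduced in a small simulation. The sketch below is not Blink's code: `Doc`, `updateRenderingWithUpfrontFilter` and `runFilterScenario` are hypothetical names, and "has work" is reduced to "has a pending rAF callback", a deliberate simplification of `HasScheduledFrameTasks()`.

```javascript
// Toy model only: a "document" is just a map of pending rAF callbacks.
class Doc {
  constructor() {
    this.callbacks = new Map();
    this.nextHandle = 1;
  }
  requestAnimationFrame(callback) {
    const handle = this.nextHandle++;
    this.callbacks.set(handle, callback);
    return handle;
  }
}

// Per-document step: snapshot the handles when this document's turn comes,
// then run each callback that is still registered.
function runAnimationFrameCallbacks(doc) {
  for (const handle of [...doc.callbacks.keys()]) {
    const callback = doc.callbacks.get(handle);
    if (callback === undefined) continue;
    doc.callbacks.delete(handle);
    callback();
  }
}

// ASSUMPTION: the decision about which documents have work is made once,
// before any callback runs, so work enqueued for a skipped document during
// this update waits for the next one.
function updateRenderingWithUpfrontFilter(docs) {
  const docsWithWork = docs.filter((doc) => doc.callbacks.size > 0);
  for (const doc of docsWithWork) runAnimationFrameCallbacks(doc);
}

function runFilterScenario(withChildCallback) {
  const parent = new Doc();
  const child = new Doc();
  const log = [];
  let tick = 0;
  const record = (kind) => log.push(`${tick}:${kind}`);

  parent.requestAnimationFrame(() => {
    record("parent");
    parent.requestAnimationFrame(() => record("parent-from-parent"));
    child.requestAnimationFrame(() => record("child-from-parent"));
  });
  if (withChildCallback) {
    child.requestAnimationFrame(() => {
      record("child");
      parent.requestAnimationFrame(() => record("parent-from-child"));
    });
  }

  for (tick = 1; tick <= 2; tick++) updateRenderingWithUpfrontFilter([parent, child]);
  return log;
}

console.log(runFilterScenario(false));
console.log(runFilterScenario(true));
```

In this model, child-from-parent lands in tick 2 when the child starts with no callback and in tick 1 when it starts with one. That is the inconsistency the original test case exposes.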

@szager-chromium

Back from vacation!

I think the key issue here is the document filtering part of the spec, as mentioned above. I'll quote the spec text here, since resolving this issue probably involves a careful parsing:

Unnecessary rendering: Remove from docs any Document object doc for which all of the following are true:

  • the user agent believes that updating the rendering of doc's node navigable would have no visible effect; and
  • doc's map of animation frame callbacks is empty.

I think a careful reading indicates that the current behavior of chromium and Firefox Nightly is correct. The filtering is done prior to running any rAF callbacks, and for the failing test the above conditions hold true for the iframe: the UA believes that updating the rendering will have no visible effect and the doc's map of animation frame callbacks is empty. So it's correct to postpone the child-from-parent callback until the next rendering update.

I have no opinion as to whether that's optimal, I think reasonable people can disagree on that point. I'm open to modifying the spec, but with the current spec I don't think there's anything to do here in chromium.

@emilio
Contributor Author

emilio commented Aug 7, 2024

I'm not convinced I agree with that read of the spec.

the user agent believes that updating the rendering of doc's node navigable would have no visible effect;

That's exceptionally vague, but given updating the rendering of a document can queue rendering updates on all same-origin documents connected to it, it doesn't really seem applicable to that case?

To me that bit of the spec has always been meant to allow throttling of background tabs / offscreen frames, but it's so vague that other interpretations are surely possible...

@szager-chromium

After a careful re-reading, I see your points @esprehn and @emilio. I think the false positives from HasScheduledFrameTasks() are probably fixable, but determining "no visible effect" in the general case would probably be very difficult, and as @esprehn points out, chromium doesn't actually skip step 22 as it ought to.

Now I am wondering why the "unnecessary rendering" clause exists at all. What situation does it address that is not covered by the preceding "rendering opportunity" clause? Could we just remove the "unnecessary rendering" language altogether?

@esprehn

esprehn commented Aug 9, 2024

Yeah the big issue here is that it violates the requirement that requestAnimationFrame will always run before a browser applies a DOM mutation to the screen. Worse, it's inconsistent, since random things (from a web developer's perspective) will make rAF stop missing frames.

I spent many years convincing developers to use rAF instead of Promises to batch work. Even top framework developers were convinced they had to use Promises to "not miss a frame" because they thought sometimes the browser would paint but not run rAF. That wasn't true until this change landed though.

@szager-chromium would it be possible to revert that change? I worry about backsliding into a world where frameworks stop using rAF because it's unreliable.

@Kaiido
Member

Kaiido commented Aug 10, 2024

the big issue here is that it violates the requirement that requestAnimationFrame will always run before a browser applies a DOM mutation to the screen

I don't think there is such a "requirement" though. rAF will run before the paint only when it's been called before the animation callbacks have been fired. If it's called between animation callbacks and paint, e.g. in an rAF callback or in a ResizeObserver's callback, the paint will occur before. Here we are in a rAF callback, the question is whether rAF callbacks from other documents in the same navigable should be treated as the same pool of callbacks or not.

Here is a small test exposing the issue in a more visual way. It sets both the parent's and the iframe's background to red in the parent's rAF callback, then schedules the iframe's next rAF callback to set its background to yellow, and at the same time schedules the parent's next callback to set everything back to blue.

  • In Chrome and FF Nightly we get one paint entirely red, then the parent is blue and the iframe yellow in the same paint.
  • In Safari, one paint where parent is red and iframe yellow, then all blue in the next paint.
  • In Firefox release, one paint entirely red, then all blue in the next paint (IMO this one is the most problematic).

So yes, in Chrome, if you call iframe.requestAnimationFrame() from within parent's rAF callback, your callback will be scheduled after the changes you made in parent's rAF callback. But once again, we are in a rAF callback already, so this makes some sense and one should probably not expect the new callback to fire before in such a case.

Also note that every browser does respect firing the iframe's ResizeObserver's callback between the parent's rAF callback and the paint, so you can still hook there if needed.

And from a web-dev's point of view, I don't think this "requirement" would be better enforced by walking down the tree at every step since an iframe lower in the DOM could call the rAF of an upper frame ending up in the same situation, unless you make rAF possibly reentrant from within an update the rendering task, which would be an even worse breakage of expectations IMO.

@szager-chromium

@Kaiido I don't agree with your interpretation. The relevant spec text is:

14. For each doc of docs, run the animation frame callbacks for doc, passing in the relative high resolution time given frameTimestamp and doc's relevant global object as the timestamp.

...

To run the animation frame callbacks for a target object target with a timestamp now:
   1. Let callbacks be target's map of animation frame callbacks.
   2. Let callbackHandles be the result of getting the keys of callbacks.
   3. For each handle in callbackHandles, if handle exists in callbacks:
      1. Let callback be callbacks[handle].
      2. Remove callbacks[handle].
      3. Invoke callback with « now » and "report".

Each doc effectively snapshots its set of rAF callbacks to run at the beginning of processing for that document. When the parent doc runs its rAF callbacks, the child doc's set is still dynamic. If a rAF callback in the parent doc schedules rAF in the child doc, it should run in the child doc during the same rendering update. If a rAF callback schedules another rAF callback for the same document (parent or child), then it should run during the next rendering update.
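
The per-document snapshot in the steps quoted above can be sketched in plain JavaScript. This is an illustration only: the `Doc` class and `runSnapshotScenario` are hypothetical, documents are visited parent first, and `now` is an update counter rather than a real timestamp. A cancelled callback is included to show what the "if handle exists" check is for.

```javascript
// Toy model only: a "document" is just a map of pending rAF callbacks.
class Doc {
  constructor() {
    this.callbacks = new Map();
    this.nextHandle = 1;
  }
  requestAnimationFrame(callback) {
    const handle = this.nextHandle++;
    this.callbacks.set(handle, callback);
    return handle;
  }
  cancelAnimationFrame(handle) {
    this.callbacks.delete(handle);
  }
}

// Follows the quoted steps: the keys are snapshotted when this document's
// turn comes, so callbacks added to the same document while it runs are not
// in the snapshot, and callbacks removed in the meantime are skipped.
function runAnimationFrameCallbacks(doc, now) {
  const callbacks = doc.callbacks;
  const callbackHandles = [...callbacks.keys()];
  for (const handle of callbackHandles) {
    if (!callbacks.has(handle)) continue;
    const callback = callbacks.get(handle);
    callbacks.delete(handle);
    callback(now);
  }
}

// No up-front filtering: every document gets its turn, in list order.
function updateRendering(docs, now) {
  for (const doc of docs) runAnimationFrameCallbacks(doc, now);
}

function runSnapshotScenario() {
  const parent = new Doc();
  const child = new Doc();
  const log = [];
  let doomed;

  parent.requestAnimationFrame((now) => {
    log.push(`${now}:parent`);
    parent.cancelAnimationFrame(doomed); // already snapshotted, but skipped
    parent.requestAnimationFrame((t) => log.push(`${t}:parent-from-parent`));
    child.requestAnimationFrame((t) => log.push(`${t}:child-from-parent`));
  });
  doomed = parent.requestAnimationFrame((now) => log.push(`${now}:cancelled`));

  updateRendering([parent, child], 1);
  updateRendering([parent, child], 2);
  return log;
}

console.log(runSnapshotScenario());
```

In this model child-from-parent runs in update 1, because the child's snapshot is taken after the parent has finished, while parent-from-parent waits for update 2.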

However, I think that's all a bit beside the point. @emilio's test case demonstrates that when a rAF callback in the parent doc schedules a rAF callback in the child doc, Chrome behaves differently depending on whether the child doc already had a registered rAF callback prior to starting the rendering update. That's a Chrome bug.

@esprehn I'm not sure if you mean revert the spec language about "unnecessary rendering" or revert the recent chromium change that modified the behavior. I don't think we should revert the chromium change; it actually brings chromium's behavior more in line with the event loop spec, the above-mentioned bug notwithstanding. If we could agree to remove the "unnecessary rendering" clause from the spec, then it would be a very simple change in chromium to fix that bug by never skipping rAF processing for a doc with a rendering opportunity.

@past past removed the agenda+ To be discussed at a triage meeting label Aug 13, 2024
@noamr
Contributor

noamr commented Aug 14, 2024

Something that's not tested here, and that came up while we were discussing this, is that the rAF callbacks are not actually called in tree order in Blink & WebKit, but rather in frame insertion order, which is what's used for finding a window by name.

See https://jsfiddle.net/6xsdc3qr/2/ to reproduce.
See also whatwg/dom#1270 (comment)

@Kaiido
Member

Kaiido commented Aug 15, 2024

Sorry this is a bit off topic, still it might warrant some clarifications:

@Kaiido I don't agree with your interpretation. The relevant spec text is:

Yes, I didn't mean to say Chrome's behavior is per the spec, just that it somehow "makes sense"; sorry for the confusion. My comment was a direct response to the previous one claiming that web-devs would stop using rAF because it's now "unreliable". My point was that when advocating for rAF as a batch checkpoint, it should be made clear that a rAF call made once the doc's rAF callbacks have started running is scheduled for after the next paint, and that a web-dev should probably not expect calling rAF of document A from document B's rAF callback to fire in the current animation frame. Even if the specs do require some kind of order between these docs, it turns out that no browser actually follows this order, and even the specced order might be complex to handle for a web-dev. So the point was that this discrepancy doesn't make rAF much more broken, even if I'm obviously for fixing all the interop issues around here.

@szager-chromium

@Kaiido -- you make a good point about predictability of the platform. Here's the spec text that addresses the ordering issue:

Let docs be all fully active Document objects whose relevant agent's event loop is eventLoop, sorted arbitrarily except that the following conditions must be met:

   - Any Document B whose container document is A must be listed after A in the list.

   - If there are two documents A and B that both have the same non-null container document C, then the order of A and B in the list must match the shadow-including tree order of their respective navigable containers in C's node tree.

Are the docs actually sorted arbitrarily? Or are the two sorting conditions sufficient to fully specify a deterministic order? If so, then I would propose that in addition to removing the "unnecessary rendering" clause we ought also to remove "sorted arbitrarily". That would at least provide a deterministic ordering that is testable via WPT.
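
Whether the two conditions pin down a single order can be checked by brute force on a small example. The sketch below assumes a hypothetical tree in which document A contains B and C (in that tree order) and B contains D. All names are illustrative, and the two checks are a plain reading of the quoted conditions.

```javascript
// Hypothetical document tree: A contains B and C; B contains D.
const containerOf = { A: null, B: "A", C: "A", D: "B" };
// Children of each container, in tree order of their navigable containers.
const childrenInTreeOrder = { A: ["B", "C"], B: ["D"] };

function isValidOrder(order) {
  const position = new Map(order.map((doc, index) => [doc, index]));
  // Condition 1: a document must be listed after its container document.
  for (const [doc, container] of Object.entries(containerOf)) {
    if (container !== null && position.get(doc) < position.get(container)) return false;
  }
  // Condition 2: siblings must keep the tree order of their containers.
  for (const siblings of Object.values(childrenInTreeOrder)) {
    for (let i = 1; i < siblings.length; i++) {
      if (position.get(siblings[i - 1]) > position.get(siblings[i])) return false;
    }
  }
  return true;
}

// All orderings of a small list (fine for four documents).
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)]).map((rest) => [item, ...rest])
  );
}

const validOrders = permutations(["A", "B", "C", "D"])
  .filter(isValidOrder)
  .map((order) => order.join(""));

console.log(validOrders);
```

Under these assumptions two orders pass, ABCD and ABDC, which suggests the two conditions alone leave the relative position of C and D open for a tree shaped like this one.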

@szager-chromium

Not sure if any of the participants here are planning to attend TPAC next month. I'm not planning to attend, but I could be convinced to change my plans if there are other people going who are interested in having a breakout session on this topic. I have a strong interest in bullet-proofing the event loop and rendering update spec.


8 participants