
Handle cases where signals are missed. #12

Open
stephenworsley wants to merge 1 commit into master

Conversation

stephenworsley
Contributor

As described here: https://docs.python.org/3/library/signal.html

A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.

Because of this, it is possible for many signals to be received while a log entry is written only once. This pull request attempts to fix that by estimating how many signals should have been received since the last one was handled. To make that estimate more accurate, the way signals are jittered has also been changed.
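Roughly, the estimate works like this (a minimal sketch, not the actual Kapture code; the jitter bounds and the `log_frame` helper are illustrative):

```python
import math
import time

# Illustrative jitter bounds: each signal is scheduled between 0.5 s and 1.0 s
# after the previous one, so n consecutive intervals always span between
# 0.5 * n and 1.0 * n seconds.
JITTER_MIN = 0.5
JITTER_MAX = 1.0

_last_handled = None


def log_frame(frame):
    """Stand-in for Kapture's real frame logging."""


def handle_signal(signum, frame):
    """Log the current frame, compensating for signals missed while C code ran."""
    global _last_handled
    now = time.monotonic()
    if _last_handled is None:
        missed = 0
    else:
        gap = now - _last_handled
        # At least ceil(gap / JITTER_MAX) intervals must have elapsed, so any
        # beyond the first correspond to signals that were sent but not handled.
        missed = max(math.ceil(gap / JITTER_MAX) - 1, 0)
    _last_handled = now
    for _ in range(missed + 1):
        log_frame(frame)
```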

@bjlittle
Member

@stephenworsley It's not really easy to determine whether this is an improvement or not. Without testing can you offer any evidence that this is a good PR?

@bjlittle bjlittle self-assigned this Oct 30, 2019
@bjlittle bjlittle added this to the v0.6.0 milestone Oct 30, 2019
@stephenworsley
Contributor Author

@bjlittle This came up because I was testing Kapture with a file that had a long-running call during which nothing was logged for about 20 seconds. On the resulting graph, that call took up only a small amount of space, which seems to me a tad misleading. With my fix, the call appeared on the graph as appropriately wide.

@rhattersley
Member

To avoid having to change the sampling pattern, have you considered adding the time-since-last-signal to the _Frame that gets logged? The visualisation could then be tweaked to weight the widths accordingly?
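Something like this, perhaps (the field and function names are illustrative, not Kapture's actual `_Frame`):

```python
from dataclasses import dataclass
import time


@dataclass
class _Frame:
    name: str   # whatever the sample already records
    dt: float   # seconds since the previous handled signal (the new field)


_last_handled = None


def record(name):
    """Build a sample that carries the time elapsed since the previous one."""
    global _last_handled
    now = time.monotonic()
    dt = 0.0 if _last_handled is None else now - _last_handled
    _last_handled = now
    return _Frame(name=name, dt=dt)


def width(frame, nominal_interval=0.75):
    """Scale a sample's width in the visualisation by the time it represents."""
    return max(frame.dt / nominal_interval, 1.0)
```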

@stephenworsley
Contributor Author

@rhattersley I'm not sure that would get around the need to change the sampling pattern. If signals are being handled immediately, I don't think there should be any difference in the weight given to one handled ~0 seconds after the last handled signal and one handled ~2 seconds after it. I only want to add weight when the time since the last handled signal is large enough that another signal must have been sent but was ignored.

The thought behind changing the sampling pattern is that, with my pattern, the time between two consecutive signals is always less than the time between, say, the j-th signal and the (j+2)-th signal. With the current pattern, the first of those gaps can be as large as 2 while the second can be as small as 1. I suppose it would still be possible to approximate the number of missed signals, but with a tendency to underestimate by 1 (perhaps around half of the time). Whether the weighting happened before or after logging, I feel the issue is still tied to the sampling pattern.
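To make the overlap point concrete (illustrative numbers only):

```python
import math

# Old pattern: each interval is drawn from [0, 2].  One interval can last up
# to 2 s while two intervals can total as little as ~0 s, so a gap of, say,
# 1.8 s is consistent with one signal or with several -- the gap alone cannot
# tell you whether anything was missed.
#
# New pattern: each interval is drawn from [0.5, 1].  One interval never
# exceeds 1 s, and n intervals always total at least 0.5 * n seconds, so any
# gap longer than 1 s proves at least one signal was missed, and
# ceil(gap / 1.0) is a sound lower bound on how many intervals elapsed.


def min_intervals_elapsed(gap, jitter_max=1.0):
    """Lower bound on the number of sampling intervals that fit in `gap`."""
    return max(math.ceil(gap / jitter_max), 1)


print(min_intervals_elapsed(0.8))   # 1 -> nothing missed
print(min_intervals_elapsed(3.2))   # 4 -> at least 3 signals went unhandled
```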

@rhattersley
Member

If signals are being handled immediately, I don't think there should be any difference in the weight given to one handled ~0 seconds after the last handled signal and one handled ~2 seconds after it.

Completely agree - hence the original implementation. 👍

And I understand why inferring the number of lost signals from the previous and current signal times requires that the possible signal intervals can't overlap. My only concern is that changing the sampling interval to [0.5, 1] might exchange one sampling bias for another. The original jitter pattern is deliberately designed to avoid bias in exchange for increased noise, so longer runs converge to an accurate answer.

On balance I suspect the new trade-off in this PR is worth it.

@rhattersley
Member

For future reference, a couple of (totally not thought through!) ideas that probably aren't worth exploring right now:

  1. With a bit of fiddling it might be possible to augment the signalling to make use of the value argument to sigqueue to supply a sequential counter.
  2. Could we move the timing of the signals from the main process to the wrap_user.py process, e.g. by using signal.setitimer? This would let the wrap_user.py process maintain its own concept of how many sampling intervals have passed (a rough sketch follows below).
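
For idea 2, the sketch would be something along these lines (illustrative only; the 0.75 s interval is arbitrary and per-tick jitter is ignored here):

```python
import signal

_intervals_seen = 0


def _on_tick(signum, frame):
    """Count sampling intervals inside the profiled process itself."""
    global _intervals_seen
    _intervals_seen += 1
    # ... log `frame` here, much as the existing handler does ...


# ITIMER_REAL makes the kernel deliver SIGALRM to this process every 0.75 s,
# so wrap_user.py keeps its own notion of elapsed intervals.  Note that timer
# expirations are not queued, so ticks coalesced while C code runs would
# still have to be inferred from timing.
signal.signal(signal.SIGALRM, _on_tick)
signal.setitimer(signal.ITIMER_REAL, 0.75, 0.75)
```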

@CLAassistant

CLAassistant commented Oct 26, 2023

CLA assistant check
All committers have signed the CLA.

@trexfeathers trexfeathers removed this from the v0.6.0 milestone Feb 21, 2024
@bjlittle bjlittle removed their assignment Apr 9, 2024