Wait for the goroutines to complete in the MultiService runner #65

mna · 2023-01-17T19:50:55Z

Hello,

This is not technically an error, but the MultiService implementation currently returns as soon as the first service has run and the parallel goroutines for the other ones have started. The problem is that if this happens e.g. in a web server handling an MDM request, the request might return before all services have executed, and subsequent requests might not yet see the results of those services. That can be a surprising (and hard to identify/debug) behaviour.

In our nanomdm fork at Fleet, we merged a PR to ensure the MultiService does not return until all services have been executed. It maintains the same behaviour and optimization, in that it returns the results of the first service registered and runs all others in parallel, but it waits for the completion instead of leaving the goroutines running unattended. You can see it here: https://github.com/fleetdm/nanomdm/pull/4/files

If that's something you'd be interested in having upstream, I'd be happy to contribute the PR.

Thanks,
Martin

The text was updated successfully, but these errors were encountered:

korylprince · 2023-01-17T20:28:42Z

The current behavior is obviously faster, and I would imagine wouldn't run into any issues in some scenarios, but your use case is obviously valid as well. I wonder if you couldn't add a default false bool to MultiService that allows turning on this behavior:

type MultiService struct {
	logger  log.Logger
	svcs    []service.CheckinAndCommandService
	ctx     context.Context
	WaitAll bool
}

func (ms *MultiService) runOthers(ctx context.Context, r errorRunner) {
	var wg *sync.WaitGroup
	if ms.WaitAll {
		wg = new(sync.WaitGroup)
	}
	for i, svc := range ms.svcs[1:] {
		go func(n int, s service.CheckinAndCommandService) {
			if ms.WaitAll {
				defer wg.Done()
			}
			err := r(s)
			if err != nil {
				ctxlog.Logger(ctx, ms.logger).Info(
					"sub_service", n,
					"err", err,
				)
			}
		}(i+1, svc)
	}
	if ms.WaitAll {
		wg.Wait()
	}
}

jessepeterson · 2023-01-17T21:44:14Z

tl;dr: I'm not in principal against this, especially with a switch to turn it on and off as @korylprince suggested. But:

The problem is that if this happens e.g. in a web server handling an MDM request, the request might return before all services have executed, and subsequent requests might not yet see the results of those services.

NanoMDM itself doesn't have any dependencies on ordering of services other than it's own service. Can you give an example of a service or request that needs this sort of waiting? I'd love to learn more about what specifically is not working for you here.

There is only one such example I can think of and that's the TokenUpdate tally (which tracks how many TokenUpdates an enrollment has seen). We send that out on the web hook so it is looked up immediately after the service has stored it. Unfortunately because we bifurcate the context in the multi-service we can't store it on the request context, either (sadly) — but that'd be the ideal place to put it in this example.

Anyway interested to hear more!

jessepeterson · 2023-01-17T21:49:35Z

Also: an alternative could be an entirely different MultiService that waited for all services. If you had a waiting multi server you could actually re-use the context which has some benefits!

mna · 2023-01-18T13:45:12Z

Can you give an example of a service or request that needs this sort of waiting? I'd love to learn more about what specifically is not working for you here.

Sure, in our case we need to store some information in our tables in addition to the standard service, so e.g. on Authenticate, it runs the standard authenticate service, and then a service that inserts the device's information in other application-specific tables. There are a number of ways we could do this, of course, but using the multi-service was interesting since we receive the decoded parameters as arguments.

NanoMDM itself doesn't have any dependencies on ordering of services other than it's own service.

Note that it's still the case for the waiting scenario - there is no ordering of the rest of the services, they run in parallel, but they all complete before the call returns.

That being said, I'm not against the Wait option idea, but I wonder what's the use-case for not waiting? Whatever other services that are passed to the multi service may not complete for an arbitrary amount of time, if the caller passes multiple services, surely it needs to be able to see the results of those at some point after the call. Not waiting can cause a number of issues that I already hinted on, but more specifically:

For short-lived commands (maybe an unlikely scenario for this, but still), the command may exit before the other services complete (as they run in unattended goroutines)
For web servers (much more likely scenario), a graceful shutdown would not know about and wait for those other services to complete (it waits for connections to become idle, but since the requests calling multi-services may return before the other services complete, the shutdown will not wait for those)
If someone doesn't need to wait for some services, it is trivial for them to start a goroutine in the service's implementation and have it not wait. It is impossible to do the reverse (making the services wait if we want to) - without adding to the API (such as the Wait field mentioned above), that is.

jonathanweibel · 2024-01-08T11:52:03Z

Hello,

Our team is currently facing a use case where we would like to wait for all services. We host NanoMDM on a serverless cloud service to cut costs. The CPU allocation is only guaranteed during the HTTP Request/Response lifecycle. So, if my understanding of the current MultiService implementation is correct, only the first service is guaranteed to run on our hosting solution.

Is there any chance for the Wait option feature to be merged?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wait for the goroutines to complete in the MultiService runner #65

Wait for the goroutines to complete in the MultiService runner #65

mna commented Jan 17, 2023

korylprince commented Jan 17, 2023

jessepeterson commented Jan 17, 2023

jessepeterson commented Jan 17, 2023

mna commented Jan 18, 2023

jonathanweibel commented Jan 8, 2024

Wait for the goroutines to complete in the MultiService runner #65

Wait for the goroutines to complete in the MultiService runner #65

Comments

mna commented Jan 17, 2023

korylprince commented Jan 17, 2023

jessepeterson commented Jan 17, 2023

jessepeterson commented Jan 17, 2023

mna commented Jan 18, 2023

jonathanweibel commented Jan 8, 2024