[rush] granular build event data #4741
Summary
This is a follow-up from this Zulip thread.
Problem statement: Debugging Rush in remote environments (like CI Docker images) is difficult.
I propose a new realtime system for event generation and collection within Rush. It will be used alongside telemetry and logs to provide a complete view of Rush lifecycle data and events. This new system will allow core libraries and plugins alike to publish events to an event bus; the same event bus would be used by both the Heft and Rush libraries. On top of the event bus, a set of plugins can read events and publish them to a secondary location.
I'd also like this design to serve as a foundation for a realtime build-tracking web UI. While I won't touch on it in this design, a realtime event-based system could also provide webhook-like functionality, where events trigger workflows such as starting an Argo workflow, sending a message, or updating a status check on GitHub.
Example events:
- `operation.status.changed`
- `data.executionGraph`
- `data.buildCache`
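To make one of these concrete, here is a hypothetical payload for an `operation.status.changed` event. The field names are my own for illustration, not an existing Rush schema:

```typescript
// Hypothetical payload for an `operation.status.changed` event.
// Field names are illustrative, not an existing Rush API.
interface IOperationStatusChanged {
  kind: 'operation.status.changed';
  operationName: string; // e.g. "my-project (build)"
  fromStatus: string;    // state before the transition
  toStatus: string;      // state after the transition
  timestamp: number;     // epoch milliseconds when the change occurred
}

const example: IOperationStatusChanged = {
  kind: 'operation.status.changed',
  operationName: 'my-project (build)',
  fromStatus: 'QUEUED',
  toStatus: 'EXECUTING',
  timestamp: Date.now()
};
```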
Existing solutions
Telemetry
Telemetry should continue to be the place to put key information about Rush developers' everyday activities. That information can be fed into a data pipeline or reporting system and used to track pain points, long-running jobs, frequent failures, etc. My view of telemetry is that its strength is in its aggregated format: timings are aggregated into important spans like operations, and success and failure are reported at that same level. Overloading this format to allow additional event types, with specific data beyond machine and Rush context metadata, detracts from the value proposition of Rush telemetry.
Logging
Logs should continue to be the place to output important debug information and helpful logs. When debugging, having a full view of the process, its preconditions, inputs, and outputs is important. However, the current logging experience is always off or always on: either you get no logs, 50% of the logs with `--verbose`, another 25% with `--debug`, and 100% with both `--verbose` and `--debug`. Looking into a specific precondition like build cache id construction would require sifting through the other logs for cobuilds, Heft task execution, etc. While those logs are important, when I want to dive into a specific task (especially on a remote machine) they're difficult to parse through. Adding more information would make them even harder to parse through.
Scope
In Scope
Out of Scope
Proposal
This proposal has 4 parts:
API
Events
Based on the k8s/Backstage model, this design is intended to be extensible and able to handle multiple schema variations keyed by different `kind` and `spec.type` values. Adopters can define their own events and schemas and handle them separately.
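To sketch what such an extensible envelope could look like: the interface below is an assumption for illustration, loosely modeled on the k8s/Backstage style, not a committed Rush API.

```typescript
// Sketch of an extensible event envelope modeled loosely on the
// k8s/Backstage style described above. All names are illustrative.
interface IBuildEvent<TSpec = unknown> {
  kind: string; // dot-separated event kind, e.g. "data.buildCache"
  metadata: {
    timestamp: number; // when the event was produced
    source: string;    // which library or plugin emitted it
  };
  spec: { type: string } & TSpec; // adopter-defined subtype and payload
}

// Adopters define their own spec payloads and dispatch on kind/spec.type:
type BuildCacheEvent = IBuildEvent<{ cacheId: string; hit: boolean }>;

const cacheRead: BuildCacheEvent = {
  kind: 'data.buildCache',
  metadata: { timestamp: Date.now(), source: 'build-cache-plugin' },
  spec: { type: 'cache-read', cacheId: 'abc123', hit: true }
};
```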
event.kind
For naming, I propose a loose standard of using dot-separated parts to denote namespaces. For example, in `build.event.status.changed`, `build` is the umbrella topic that this event sits in, `event` is a bit more specific, and the final 2 parts denote field changes: `status` is something we care about and `changed` lets us know that this event should have a from state and a to state.
I propose a special `data.*` namespace that will hold all events that are used to share state/data. This can be useful for uploading a build plan or cluster map. It's not quite an event in the traditional sense.
Event Bus
In order to publish events, we'll use a simple event bus design. It should ideally be agnostic to event schema/type and just pass events to its subscribers. There should be one EventBus for Rush and one for Heft. This simplifies the model a little and makes the consumer model easier as well. I see the primary use case being forwarding these events to either a reporting system or a web UI, which can decide which events it cares about. For local consumers that want to be more judicious, each consumer of the event bus can listen to only the specific events that they care about, using the filters provided for kind and type.
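A minimal sketch of such a bus, assuming a simple prefix filter on event kind (the class and method names here are illustrative, not a committed API):

```typescript
// Minimal, schema-agnostic event bus sketch with optional kind filtering.
// This is an illustrative design, not the actual Rush implementation.
type Listener = (event: { kind: string }) => void;

class EventBus {
  private readonly _listeners: Array<{ filter?: string; listener: Listener }> = [];

  // Subscribe to all events, or only those whose kind starts with `filter`.
  public subscribe(listener: Listener, filter?: string): void {
    this._listeners.push({ filter, listener });
  }

  // Publish passes the event to every subscriber whose filter matches.
  public publish(event: { kind: string }): void {
    for (const { filter, listener } of this._listeners) {
      if (!filter || event.kind.startsWith(filter)) {
        listener(event);
      }
    }
  }
}

const bus = new EventBus();
const seen: string[] = [];
bus.subscribe((e) => seen.push(e.kind), 'operation.');
bus.publish({ kind: 'operation.status.changed' }); // matches the filter
bus.publish({ kind: 'data.buildCache' });          // filtered out
```

The bus itself stays agnostic to payload schemas; filtering on the `kind` prefix is what lets judicious local consumers opt into only the namespaces they care about.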
Usage
Similar to build cache extensions, we would also provide an interface to use this event bus.
Rush Plugins
Heft Plugins
And then the plugins would consume this interface.
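As a rough sketch of what plugin-side usage could look like: the `IEventBusAccessor` shape below is an assumption for illustration, patterned after how Rush exposes other extension points, not an existing Rush or Heft API.

```typescript
// Hypothetical plugin consuming the event bus. The IEventBusAccessor
// shape is an assumption for illustration, not an existing Rush API.
interface IEventBusAccessor {
  subscribe(listener: (event: { kind: string }) => void, kindPrefix?: string): void;
  publish(event: { kind: string }): void;
}

class ExampleReportingPlugin {
  public readonly received: string[] = [];

  // A real plugin would receive the accessor through its session/apply hooks.
  public apply(eventBus: IEventBusAccessor): void {
    // Listen only to operation lifecycle events; ignore data.* payloads.
    eventBus.subscribe((event) => this.received.push(event.kind), 'operation.');
  }
}

// Tiny in-memory stand-in for the bus so this sketch is self-contained:
const subscribers: Array<{ prefix?: string; fn: (e: { kind: string }) => void }> = [];
const bus: IEventBusAccessor = {
  subscribe: (fn, kindPrefix) => {
    subscribers.push({ prefix: kindPrefix, fn });
  },
  publish: (event) => {
    for (const { prefix, fn } of subscribers) {
      if (!prefix || event.kind.startsWith(prefix)) {
        fn(event);
      }
    }
  }
};

const plugin = new ExampleReportingPlugin();
plugin.apply(bus);
bus.publish({ kind: 'operation.status.changed' });
bus.publish({ kind: 'data.buildCache' });
```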
Propagating Context
In the above Heft example, adding fields like cobuild runner ID and machine information is manual and may be prone to issues. To get around this, I propose a new `IEventFactory` that prepopulates these fields on events and a new `EventContext` class that propagates context values through the application. These would also be passed to Heft plugins through `HeftTaskSession`.
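A minimal sketch of how the factory and context could fit together — the class shapes and field names here are assumptions for illustration (only the `IEventFactory`/`EventContext` names come from the proposal):

```typescript
// Sketch of context propagation via an event factory. The shapes are
// illustrative assumptions, not a committed Rush API.
class EventContext {
  public constructor(private readonly _values: Record<string, string>) {}

  public get(key: string): string | undefined {
    return this._values[key];
  }

  // Derive a child context that inherits the parent's values.
  public with(extra: Record<string, string>): EventContext {
    return new EventContext({ ...this._values, ...extra });
  }
}

class EventFactory {
  public constructor(private readonly _context: EventContext) {}

  // Prepopulate shared context fields so emitters only supply the kind.
  public create(kind: string): { kind: string; cobuildRunnerId?: string; machine?: string } {
    return {
      kind,
      cobuildRunnerId: this._context.get('cobuildRunnerId'),
      machine: this._context.get('machine')
    };
  }
}

const root = new EventContext({ machine: 'ci-agent-01' });
const taskContext = root.with({ cobuildRunnerId: 'runner-7' });
const factory = new EventFactory(taskContext);
const event = factory.create('operation.status.changed');
```

The design choice here is that emitting code never touches machine or cobuild fields directly; it asks the factory for an event, and the factory fills in whatever the ambient context carries.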
Default Setup
Out of the box, I propose a single plugin, the `rush-http-event-consumer-plugin`, that will publish events to a given endpoint in config.
Standard questions
Please answer these questions to help us investigate your issue more quickly:
- `@microsoft/rush` globally installed version?
- `rushVersion` from rush.json?
- `useWorkspaces` from rush.json?
- Node.js version (`node -v`)?