Commit
Better errors, cached states, pending states, duration accounting, fewer spans (dagger#8442)

* strip down redundant error wrapping
  this does two things:
  * remove a couple of not really helpful error wrapping spots
  * specifically for modules, stop printing ExecErrors and GraphQL errors, so that we don't have redundant output with the trace visualization
  Signed-off-by: Alex Suraci <[email protected]>

* tui: render error logs beneath final print
  fixes dagger#7007
  rather than printing log output in the tree view, which is annoying because it can't be copy-pasted, print error logs below it.
  this involves a mildly tedious process of hopping from unlazying spans (sync) to its unlazied effects (exec foo) in order to resurface it wherever it was installed in the tree (Go.gotestsum > Container.withExec). same logic as with Cloud.
  Signed-off-by: Alex Suraci <[email protected]>

* dagql: tiny refactor
  don't need Err() anymore, overly precise
  Signed-off-by: Alex Suraci <[email protected]>

* set service span errors
  Signed-off-by: Alex Suraci <[email protected]>

* don't fail service spans if service was stopped
  Signed-off-by: Alex Suraci <[email protected]>

* mark 'sync' spans internal
  these are always redundant with the effects they force to run
  Signed-off-by: Alex Suraci <[email protected]>

* only record new LLB effects for a given call
  Signed-off-by: Alex Suraci <[email protected]>

* set canceled attr if ctx canceled
  Signed-off-by: Alex Suraci <[email protected]>

* wip: feat: add cached/pending labels
  Signed-off-by: Justin Chadwell <[email protected]>
  Signed-off-by: Alex Suraci <[email protected]>

* refactor: support setting trace context in ops
  currently only implemented for exec ops, redundantly separated from ExecutionMetadata, but soon this can be used by all ops for effect tracking purposes.
  the end goal is to install a Span Link from the Buildkit span to the causal span
  Signed-off-by: Alex Suraci <[email protected]>

* use span links for cause/effects
  Signed-off-by: Alex Suraci <[email protected]>

* enable span links for all buildkit effects
  now we can associate spans to one another in the UI in a much more sensible manner
  Signed-off-by: Alex Suraci <[email protected]>

* simplify llb <-> dagger tracking
  * instead of all effect IDs, include a singular LLB output digest
  * for caches, emit an attr containing all cached digests
  * refactor client Op tracking
  Signed-off-by: Alex Suraci <[email protected]>

* fixup: handle nil in Solve DefToDAG
  Signed-off-by: Alex Suraci <[email protected]>

* use span links for services
  Signed-off-by: Alex Suraci <[email protected]>

* handle nil PB output
  Signed-off-by: Alex Suraci <[email protected]>

* functional pending and cached state
  * back to an array of effect IDs, instead of single output
  * a span is pending iff it has effects and they're all pending
  * a span is cached iff it's cached itself or all of its effects are cached
  tricky bits:
  * have to exclude the "wrapper" op from the set of effects
  * have to emit an op's transitive input digests since Buildkit won't always send a span for every op
  Signed-off-by: Alex Suraci <[email protected]>

* add primitive debugging to TUI
  Signed-off-by: Alex Suraci <[email protected]>

* when zooming, focus zoomed span
  Signed-off-by: Alex Suraci <[email protected]>

* w opens a deep link to focused span
  Signed-off-by: Alex Suraci <[email protected]>

* don't hide "too fast" spans
  trying this on, not sure about it, but I like that you can see unabridged recipes, and it eliminates some randomness from the UI
  Signed-off-by: Alex Suraci <[email protected]>

* wip: tie service starts to original exec
  Signed-off-by: Alex Suraci <[email protected]>

* checkpoint
  ended up as a bit of a sprawl of work, TODO write message
  somewhat sizeable refactor to fix navigating out past the zoomed span
  Signed-off-by: Alex Suraci <[email protected]>

* refactor: OrderedSet, gradually initialize spans
  rather than awkwardly storing relationships to deal with spans arriving in unpredictable order, we'll gradually initialize spans and bootstrap relationships immediately
  Signed-off-by: Alex Suraci <[email protected]>

* wip: handling bass > integration case
  TODO: handle spans received out of order
  Signed-off-by: Alex Suraci <[email protected]>

* show fast spans again to avoid breaking chains
  TODO: only hide entire pipelines that are too fast?
  Signed-off-by: Alex Suraci <[email protected]>

* avoid double \n; each segment prints its own
  Signed-off-by: Alex Suraci <[email protected]>

* dont print primary logs after error
  TODO: this might hide CLI errors?
  Signed-off-by: Alex Suraci <[email protected]>

* dont show logs in tree view (again)
  not sure about this yet, but feels too verbose/redundant
  Signed-off-by: Alex Suraci <[email protected]>

* wip: everything else (TODO diff with rebase)
  Signed-off-by: Alex Suraci <[email protected]>

* teeny tiny refactor for consistency
  Signed-off-by: Alex Suraci <[email protected]>

* fix running/failed status
  * clarify FailedSpans is really just links
  * make IsRunningOrLinksRunning actually only links and not also children
  * support debugging failed state
  Signed-off-by: Alex Suraci <[email protected]>

* include runtime + codegen in module effects
  otherwise we can't tell that `withSource` or `asModule` failed when its runtime isn't able to compile
  Signed-off-by: Alex Suraci <[email protected]>

* ignore sync spans, reveal internal errors
  otherwise we never show what actually failed during e.g. module compilation
  if we just show internal errors, that means we also show sync, so now we'll finally promote it back to 'ignore' level (presuming everything else is working properly)
  Signed-off-by: Alex Suraci <[email protected]>

* clean up comments, more debugging
  Signed-off-by: Alex Suraci <[email protected]>

* add telemetry golden test suite
  Finally, we have coverage for the full Engine -> OTel -> Frontend pipeline, with examples and failure modes that are easy to understand.
  Usage:
    dagger call test telemetry
    dagger call test telemetry --update -o . # update examples
  TODO: why are the examples wrong?
  TODO: did this _really_ need to be a separate suite + check? felt more sane at the time, re-validate. originally the catch was --update but maybe we just need a `golden` entrypoint? oh, but that would mean a separate check anyway.
  Signed-off-by: Alex Suraci <[email protected]>

* add job
  Signed-off-by: Alex Suraci <[email protected]>

* fix errors not being hoisted in tests
  Signed-off-by: Alex Suraci <[email protected]>

* scrub redis pid
  Signed-off-by: Alex Suraci <[email protected]>

* wait 10 seconds after warming up
  otherwise the cache doesn't seem to be committed or something
  Signed-off-by: Alex Suraci <[email protected]>

* dont lint broken module
  Signed-off-by: Alex Suraci <[email protected]>

* switch to golden package that supports interfaces
  using the underlying *testing.T means we don't get the assertion logs sent to OTel :(
  gotest.tools is good people anyway
  Signed-off-by: Alex Suraci <[email protected]>

* slight tweak (no behavior diff) for service ctx
  Signed-off-by: Alex Suraci <[email protected]>

* fix plain output choking on outer spans
  Signed-off-by: Alex Suraci <[email protected]>

* golden tests: respect _EXPERIMENTAL_DAGGER_CLI_BIN
  Signed-off-by: Alex Suraci <[email protected]>

* bring back warning, add comment
  warning shouldn't happen now that the root cause is fixed, and we probably care if we see it again
  Signed-off-by: Alex Suraci <[email protected]>

* viztest lints
  Signed-off-by: Alex Suraci <[email protected]>

* add Error type and Function.ReturnError
  very basic MVP to support returning proper errors from functions
  Signed-off-by: Alex Suraci <[email protected]>

* regen viztest
  Signed-off-by: Alex Suraci <[email protected]>

* regen SDKs
  Signed-off-by: Alex Suraci <[email protected]>

* regen docs
  Signed-off-by: Alex Suraci <[email protected]>

* further reduce error spamminess
  * don't reveal failed spans if they are linked to other spans
  * hide the `error()` span by default
    * it's kind of interesting to show, but not worth the noise
    * in the future we might want to bring this back, if/when errors can be annotated with more data - but it should prob be in a different representation even then.
  Signed-off-by: Alex Suraci <[email protected]>

* dagui: update statuses in snapshots
  Signed-off-by: Alex Suraci <[email protected]>

* promote ReexportLogsFromPB to sdk/go/telemetry
  no additional dependency footprint, and LogsFromPB is a total trap, so let's just share it
  Signed-off-by: Alex Suraci <[email protected]>

* always emit failed spans so they can roll up
  Signed-off-by: Alex Suraci <[email protected]>

* mark .sync passthrough
  ideally nothing ends up beneath here, but Passthrough is safer than Ignore - in case anything does end up there, we don't want to hide it
  Signed-off-by: Alex Suraci <[email protected]>

* ensure spans are sorted chronologically
  Signed-off-by: Alex Suraci <[email protected]>

* fix Dockerfile tracing and logs
  Signed-off-by: Alex Suraci <[email protected]>

* fix indenting error messages
  Signed-off-by: Alex Suraci <[email protected]>

* remove noisy log
  Signed-off-by: Alex Suraci <[email protected]>

* hide internal blob spans, update golden tests
  Signed-off-by: Alex Suraci <[email protected]>

* hide pull spans
  these don't add a whole lot on their own, similar to exec vs. withExec. they also suffer from not always showing up due to Buildkit solver deduping.
  Signed-off-by: Alex Suraci <[email protected]>

* reveal linked spans, but mark execs passthrough
  we don't always want to hide linked spans: Dockerfiles for example are an example of a one-to-many cause => new effects relationship. withExec -> exec is one-to-one, so in that case we want to hide it, but we can just do that by setting Passthrough.
  at this point we should probably stop using hacky [internal] prefixes for names and just set attributes like we've done here.
  Signed-off-by: Alex Suraci <[email protected]>

* more accurate child counting, still wip
  Signed-off-by: Alex Suraci <[email protected]>

* bubble up errors to causal spans
  plus stabilize more golden test output
  Signed-off-by: Alex Suraci <[email protected]>

* clean up more buildkit spans
  * switch from [internal] prefix to metadata
  * fix frontend input ops not being recorded
  Signed-off-by: Alex Suraci <[email protected]>

* appease linter
  Signed-off-by: Alex Suraci <[email protected]>

* modules/go: fix lint/tidy spans never failing
  Signed-off-by: Alex Suraci <[email protected]>

* fix elixir test not bubbling errors
  Signed-off-by: Alex Suraci <[email protected]>

* fix elixir test assertion
  Signed-off-by: Alex Suraci <[email protected]>

* .dagger: appease linter
  Signed-off-by: Alex Suraci <[email protected]>

* telemetry tests: ignore source path for uploads
  Signed-off-by: Alex Suraci <[email protected]>

* bump sleep? :(
  Signed-off-by: Alex Suraci <[email protected]>

* split out cached exec tests
  better test hygiene, and so i can disable them until we figure out what's causing cache misses
  Signed-off-by: Alex Suraci <[email protected]>

* avoid cache flake within use-exec-service
  Signed-off-by: Alex Suraci <[email protected]>

* disable caching-specific tests
  Signed-off-by: Alex Suraci <[email protected]>

* finish keeping track of running activity
  Signed-off-by: Alex Suraci <[email protected]>

* appease linter
  Signed-off-by: Alex Suraci <[email protected]>

* .github: remove test-telemetry
  don't need a separate check for this, just the separate command so we can update the fixtures
  Signed-off-by: Alex Suraci <[email protected]>

* attempt cache tests, but skip if they fail
  this way we can at least run them locally
  Signed-off-by: Alex Suraci <[email protected]>

* Span.Errors: prioritize more familiar errors
  Signed-off-by: Alex Suraci <[email protected]>

* only skip primary output if tree has errors
  Signed-off-by: Alex Suraci <[email protected]>

* print trace link after error logs
  Signed-off-by: Alex Suraci <[email protected]>

* assert against stdout, update golden examples
  Signed-off-by: Alex Suraci <[email protected]>

* add python sdk test coverage (currently broken)
  Signed-off-by: Alex Suraci <[email protected]>

* custom span test coverage across Go/TS/Python
  Signed-off-by: Alex Suraci <[email protected]>

* return original error if nice error not available
  otherwise Python/TS get confusing errors
  Signed-off-by: Alex Suraci <[email protected]>

* .dagger: don't go test with -v
  this just generates an asinine amount of logs - doesn't make sense for CI.
  Signed-off-by: Alex Suraci <[email protected]>

* Return error in Python
  Signed-off-by: Helder Correia <[email protected]>

* update Python telemetry tests
  Signed-off-by: Alex Suraci <[email protected]>

* sdk/go: revert this: add panic handling
  wrote this, but throwing away since it'll actually make panics harder to diagnose until we add error stack support
  Signed-off-by: Alex Suraci <[email protected]>

* Revert "sdk/go: revert this: add panic handling"
  This reverts commit a93d3e5.
  Signed-off-by: Alex Suraci <[email protected]>

* use atomic.Bool
  Signed-off-by: Alex Suraci <[email protected]>

* retire PBOutput
  a brief wrong move in time
  Signed-off-by: Alex Suraci <[email protected]>

* stabilize Redis version
  also update include/exclude order for module upload, I think this is actually a behavior change
  Signed-off-by: Alex Suraci <[email protected]>

* encapsulate gRPC noise beneath 'starting session'
  Signed-off-by: Alex Suraci <[email protected]>

* bump midterm
  Signed-off-by: Alex Suraci <[email protected]>

* minimize span snapshots sent over
  * keep track of spans that actually changed
  * only send passthrough spans that have failed children
  * TODO: double passthrough?
  Signed-off-by: Alex Suraci <[email protected]>

* bump midterm
  Signed-off-by: Alex Suraci <[email protected]>

* send ancestry of spans on first subscription
  this accounts for a situation where a UI has navigated directly to a nested span, and we want to be able to fetch the target span plus any parent spans so we can display a breadcrumb.
  in some respects this makes SpanSnapshots obsolete, but we'll keep it around for now
  Signed-off-by: Alex Suraci <[email protected]>

* fix post-merge compile errors
  Signed-off-by: Alex Suraci <[email protected]>

* Service.start: evaluate mounts in parallel
  Signed-off-by: Alex Suraci <[email protected]>

* continue chaining from span output
  Signed-off-by: Alex Suraci <[email protected]>

* go: fix reporting module build failures
  * use telemetry.End so it doesn't hide underlying error
  * synchronously evaluate build - otherwise it's impossible for the span to error, and we're not measuring much at all
  Signed-off-by: Alex Suraci <[email protected]>

* dagui: insert spans using binary search
  obviously much faster, especially for db.Spans which contains the whole set of spans
  Signed-off-by: Alex Suraci <[email protected]>

* look through spans with Unset status to find errors
  to disable this, explicitly set Status=Ok at proper encapsulation boundaries
  Signed-off-by: Alex Suraci <[email protected]>

* version span snapshots so FE can detect changes
  Signed-off-by: Alex Suraci <[email protected]>

* fix future test
  Signed-off-by: Alex Suraci <[email protected]>

* appease linter
  Signed-off-by: Alex Suraci <[email protected]>

* print all error lines, not just the first
  Signed-off-by: Alex Suraci <[email protected]>

* simplify and shorten duration format
  don't bother showing fractional seconds beyond a minute. saves a bunch of valuable characters, and easier to skim.
  Signed-off-by: Alex Suraci <[email protected]>

* only collect exec stdout/stderr for legacy mode
  Signed-off-by: Alex Suraci <[email protected]>

* tidy up Python errors
  Signed-off-by: Alex Suraci <[email protected]>

* say what request was failed to serve
  Signed-off-by: Alex Suraci <[email protected]>

* better failure
  Signed-off-by: Alex Suraci <[email protected]>

* nix noisy log
  Signed-off-by: Alex Suraci <[email protected]>

* initialize nested client before subscribing
  Signed-off-by: Alex Suraci <[email protected]>

* bump bubbletea to reduce flicker
  Signed-off-by: Alex Suraci <[email protected]>

* regen
  Signed-off-by: Alex Suraci <[email protected]>

* fix deprecation
  Signed-off-by: Alex Suraci <[email protected]>

* update telemetry golden examples
  Signed-off-by: Alex Suraci <[email protected]>

---------

Signed-off-by: Alex Suraci <[email protected]>
Signed-off-by: Justin Chadwell <[email protected]>
Signed-off-by: Helder Correia <[email protected]>
Co-authored-by: Justin Chadwell <[email protected]>
Co-authored-by: Helder Correia <[email protected]>
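
Note on the "functional pending and cached state" entry: the two "iff" rules it states can be read as small predicates over a span and its effects. The following is a minimal sketch in Python, not Dagger's actual Go implementation. The `Span` and `Effect` classes and their fields are hypothetical names for illustration. Two points are assumptions rather than anything the commit message states: an effect counts as pending while no span has started for it, and a span with no effects is cached only if it is cached itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Effect:
    """Hypothetical record of one effect (e.g. an LLB op digest) tied to a span."""
    digest: str
    started: bool = False  # assumption: an effect is pending until it has started
    cached: bool = False


@dataclass
class Span:
    """Hypothetical stand-in for a trace span; not Dagger's real type."""
    name: str
    cached: bool = False
    effects: List[Effect] = field(default_factory=list)


def is_pending(span: Span) -> bool:
    # "a span is pending iff it has effects and they're all pending"
    return bool(span.effects) and all(not e.started for e in span.effects)


def is_cached(span: Span) -> bool:
    # "a span is cached iff it's cached itself or all of its effects are cached"
    # Assumption: the "all effects" branch applies only when effects exist,
    # so a span without effects is not treated as cached by default.
    if span.cached:
        return True
    return bool(span.effects) and all(e.cached for e in span.effects)
```

The `bool(span.effects)` guard matters because `all()` over an empty list is true, which would otherwise mark every effect-free span as both pending and cached.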
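
Note on the "dagui: insert spans using binary search" and "ensure spans are sorted chronologically" entries: the idea is to keep the span list ordered by start time and to locate each insertion point with a binary search instead of a linear scan. Below is a minimal sketch using Python's standard `bisect` module, not the Go code from the commit. The `(start_time, span_id)` tuple shape and the `insert_sorted` helper are hypothetical, and skipping exact duplicates is an assumption of this sketch.

```python
import bisect
from typing import List, Tuple

# Hypothetical key for a span: (start_time, span_id). Tuples compare
# element by element, so ties on start_time fall back to span_id.
SpanKey = Tuple[float, str]


def insert_sorted(spans: List[SpanKey], span: SpanKey) -> int:
    """Insert span into the already sorted list and return its index."""
    idx = bisect.bisect_left(spans, span)  # O(log n) search for the slot
    if idx < len(spans) and spans[idx] == span:
        return idx  # assumption: an identical entry is not inserted twice
    spans.insert(idx, span)
    return idx
```

The search is logarithmic, although inserting into a contiguous list still shifts the elements after the slot; the saving is in the number of comparisons, which adds up for a list that holds every span in a trace.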