feat: invocation spec #34

Closed
wants to merge 16 commits into from

Gozala
Collaborator

@Gozala Gozala commented Jan 12, 2023

📜 Preview
± Diff with ucan invocation spec

This is a fork of the UCAN Invocation spec authored by @expede that alters it as follows:

  1. Addresses "Addressing individual task results as opposed to invocations" (ucan-wg/invocation#6) by combining tasks and invocations.
    • There is no special syntax for batches. Pipelining task outputs into any task input gives you batching: an identity function such as data/identity can do basically what batches accomplish in the original spec
       {
         "v": "0.1.0",
         "aud": "did:web:ucan.run",
         "iss": "did:key:z6Mkqa4oY9Z5Pf5tUcjLHLUsDjKwMC95HGXdE1j22jkbhz6r",
         "with": "did:key:z6Mkqa4oY9Z5Pf5tUcjLHLUsDjKwMC95HGXdE1j22jkbhz6r",
         "do": "data/identity",
         "input": {
           "left": {
             "<-": {
               "v": "0.1.0",
               "aud": "did:web:ucan.run",
               "iss": "did:key:z6Mkqa4oY9Z5Pf5tUcjLHLUsDjKwMC95HGXdE1j22jkbhz6r",
               "with": "https://example.com/blog/posts",
               "do": "crud/create",
               "input": {
                 "headers": {
                   "content-type": "application/json"
                 },
                 "payload": {
                   "title": "How UCAN Tasks Changed My Life",
                   "body": "This is the story of how one spec changed everything...",
                   "topics": ["authz", "journal"],
                   "draft": true
                 }
               },
               "s": { "/": { "bytes": "sig...left" } }
             }
           },
           "right": {
             "<-": {
               "v": "0.1.0",
               "aud": "did:web:ucan.run",
               "iss": "did:key:z6Mkqa4oY9Z5Pf5tUcjLHLUsDjKwMC95HGXdE1j22jkbhz6r",
               "with": "mailto:[email protected]",
               "do": "msg/send",
               "input": {
                 "to": ["[email protected]", "[email protected]"],
                 "subject": "Coffee",
                 "body": "Hey you two, I'd love to get coffee sometime and talk about UCAN Tasks!"
               },
               "meta": {
                 "dev/tags": ["friends", "coffee"],
                 "dev/priority": "high"
               },
               "s": {
                 "/": {
                   "bytes": "sig...right"
                 }
               }
             }
           }
         },
         "s": {
           "/": {
             "bytes": "5vNn4--uTeGk_vayyPuNTYJ71Yr2nWkc6AkTv1QPWSgetpsu8SHegWoDakPVTdxkWb6nhVKAz6JdpgnjABppC7"
           }
         }
       }
    • There are no longer task/closure pointers, because you can simply refer to an invocation by CID or inline it
  2. Promises and pipelines have been changed to a simpler syntax that refers to one of the following:
    • The whole task result (regardless of success / failure)

       {"<-":{ "/": "bafyreifngx2r7ifssddx5ohdxsn2x5ukfhh6rxy6hswulkxbn6yw6f7jce" }}
    • A specific result branch (success / failure / ...)

      {"<-":[{ "/": "bafyreifngx2r7ifssddx5ohdxsn2x5ukfhh6rxy6hswulkxbn6yw6f7jce" }, "ok"]}
    • A specific value inside a specific branch

      {"<-":[{ "/": "bafyreifngx2r7ifssddx5ohdxsn2x5ukfhh6rxy6hswulkxbn6yw6f7jce" }, ["error", "message"]]}
    • If the wrong branch is pipelined into an invocation, then that invocation MUST fail

  3. The new pipeline syntax is simpler yet more powerful, because:
    • It no longer requires output to be a Result: as long as it's a kinded union, branching will work
    • It does recommend using Result, or an extension of it with added variants when binary ok / error is not enough
  4. Support for non-binary output has been introduced (by removing the Result requirement). This allows extensions that model long-running tasks that may transition through multiple states. That is out of scope here, but the point is that it could be implemented without violating the spec, and promise pipelining would still work.
  5. Proofs no longer require that aud be the Executor and allow it to be the iss; that way we don't have to first create a delegation and then an equivalent invocation. It does state, however, that unless aud is the Executor, it will not be able to redelegate such proofs.
    • I don't feel very strongly about this and can go back to match the UCAN Invocation spec given a compelling reason.
  6. Some field names were changed to match the naming in ucan-ipld
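
To make the three promise forms in item 2 concrete, here is a minimal TypeScript sketch of how an executor might resolve them against a finished task's output. It assumes outputs are kinded unions encoded as single-key objects (e.g. `{ ok: ... }` / `{ error: ... }`), models CID links as `{ "/": string }`, and uses a hypothetical in-memory `Outputs` map; the names `Await`, `Outputs` and `resolve` are illustrative, not from the spec.

```typescript
// A CID link, simplified to { "/": string } as in DAG-JSON.
type Link = { "/": string };

// The three promise forms: whole result, a branch, or a path inside a branch.
type Await =
  | { "<-": Link }                        // whole task result
  | { "<-": [Link, string | string[]] };  // "ok", or ["error", "message"]

// Hypothetical store of finished task outputs, keyed by task CID.
type Outputs = Map<string, Record<string, unknown>>;

function resolve(promise: Await, outputs: Outputs): unknown {
  const ref = promise["<-"];
  const [link, selector]: [Link, string | string[] | undefined] =
    Array.isArray(ref) ? ref : [ref, undefined];

  const output = outputs.get(link["/"]);
  if (output === undefined) {
    throw new Error(`no output for task ${link["/"]}`);
  }
  if (selector === undefined) return output; // whole result, any branch

  // A single key selects a branch; a path selects a value inside a branch.
  const path = typeof selector === "string" ? [selector] : selector;
  let value: unknown = output;
  for (const key of path) {
    if (typeof value !== "object" || value === null || !(key in value)) {
      // Wrong branch (or missing field): the dependent invocation MUST fail.
      throw new Error(`selector ${path.join(".")} did not match`);
    }
    value = (value as Record<string, unknown>)[key];
  }
  return value;
}
```

Because the first path segment is always the branch name, a task that completed with `error` simply fails any dependent that selected `ok`, which is the "wrong branch MUST fail" rule above.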

I hope this draft provides a better way to articulate the issues we had with the original spec and provides a side-by-side comparison. @expede I would love your feedback.

@expede

expede commented Jan 12, 2023

Thanks for sharing @Gozala 🙌

Pipelines

I like the visual pun of <- 😁

Expressivity

New pipeline syntax is simpler yet more powerful, because:
It no longer requires output to be Result as long as it's kinded union branching will work
It does recommend to use Result or extension with added variants when binary ok / error is not enough

We looked at this in the original spec, but it made things much more difficult for the scheduler to understand cases like partial failure. For my own understanding: do you have a use case where you gain something from not putting success cases on an ok branch as a payload?

Nonbinary Output

Support for non-binary output has been introduced (by removing Result requirement). That is to allow extensions that model long running tasks that may transition through multiple states. It is out of scope here, but point is it could be implemented without violating spec and promise pipelining would also work.

Can you say more about this? The UCAN Invocation spec also supports structured output.

Signatures

That's... a lot of signatures. You came up with a possible idea of externalizing the signatures to avoid this last year, did that idea not work out?

Is there intentionally no signature on the wrapper?

Explicit DAG

Why the added layer of DAG? Does this have e.g. transactional semantics?

We explored this idea in IPVM but ultimately decided against it. Our take (as well as that of most OCAP systems) was that the only semantically meaningful information is the dependencies between data.

@Gozala
Collaborator Author

Gozala commented Jan 12, 2023

We looked at this in the original spec, but it made things much more difficult for the scheduler to understand cases like partial failure. For my own understanding: do you have a use case where you gain something from not putting success cases on an ok branch as a payload?

Unless I'm misunderstanding what you are saying, I don't think anything would really change for the scheduler. Because you pipe a specific branch, if that branch does not exist, the dependent task fails because it made a wrong assumption.

If you mean, more broadly, that it's hard to tell whether something is a success or a failure without ok or error, I do recognize that, and perhaps the language could be made stricter so that Result is mandatory as opposed to recommended.

I'll elaborate on why I have not locked it to Result only in the next section.

Can you say more about this? The UCAN Invocation spec also supports structured output.

By non-binary output I meant extensions of the Result variant, like the example Status type described in the document. The intention behind making output open to extensions (not just Result) is that we'd like to extend the invocation spec for long-running tasks that step through multiple states before either succeeding or failing. The idea is that an intermediate step would provide something like a pending Status and some effects (invocations to perform) until terminating in either an ok or an error state.

Perhaps all of that could be modeled with Result instead; I'm open to reconsidering. However, it did seem that letting the pipeline pick an arbitrary branch was more flexible than locking it to ok / error logic.

That's... a lot of signatures.

I would argue there are fewer signatures, because here you are not required to delegate to the executor and then wrap that into an invocation.

You came up with a possible idea of externalizing the signatures to avoid this last year, did that idea not work out?

We could indeed reduce signatures by signing a set. That said, I decided it was a premature optimization and chose to defer it until we find a need for it.

Is there intentionally no signature on the wrapper?

Well, there is no wrapper! It's just a packet of invocations in CAR or JSON or whatever. If you want to batch them, you pipe invocations into the input of something like data/identity, and then that one does need to be signed. This way, how you deliver invocations to the machine is decoupled from what you're running. I fail to see the benefits of coupling tasks that don't depend on each other; on the contrary, I have concerns about unnecessary coupling, which has been the primary blocker for the UCAN Invocation spec as it is now.

@expede

expede commented Jan 12, 2023

(Just between meetings, I'll give your response a closer look in a bit. Just a quick possible clarification)

Well, there is no wrapper! It's just a packet of invocations in CAR or JSON or whatever.

Perhaps to clarify: there's no signature on this top level in your example.

{
   "aud": "did:web:ucan.run",
   "iss": "did:key:z6Mkqa4oY9Z5Pf5tUcjLHLUsDjKwMC95HGXdE1j22jkbhz6r",
   "with": "did:key:z6Mkqa4oY9Z5Pf5tUcjLHLUsDjKwMC95HGXdE1j22jkbhz6r",
   "do": "data/identity",

Without a signature, why are there iss and aud fields?

@Gozala
Collaborator Author

Gozala commented Jan 12, 2023

Perhaps to clarify: there's no signature on this top level in your example.

Oh I see, that was a mistake: it was meant to have one and I just overlooked it. The one in the spec does have it, though:
https://github.com/web3-storage/specs/blob/90e9b9a99d421f6a5fd6b192e164a3c9b164d09e/invocation.md?plain=1#L547-L629

I've updated that example to fix this.

invocation.md Outdated

with URI
do Ability
input In (implicit {})
Contributor

If we're shortening key names like s for the varsig, maybe this could be in? FWIW, I don't think that short name is claimed in the JWT claims registry that we borrowed from (or were inspired by) for e.g. nbf and iss: https://www.iana.org/assignments/jwt/jwt.xhtml#claims

Contributor

Also noticing that the receipt has out, not output, so in may fit better aesthetically, regardless of length.

Collaborator Author

I actually wanted to use in, but hesitated for the same reason that a field named with is really annoying in JS: you can't destructure it or have variables with that name.

I don't mind renaming it to in, though.

Collaborator Author

I have also considered use, because then you get "with Resource do Action, use Input".
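
The JS annoyance mentioned above comes from `with`, `do` and `in` being reserved words: they are fine as property names, but cannot be bare binding names, so shorthand destructuring like `const { with, do, in } = task` is a syntax error. A small TypeScript sketch of the usual workaround, renaming during destructuring; the `Task` interface and `describe` function are hypothetical and only mirror the field names under discussion.

```typescript
// Hypothetical task shape using the short field names discussed in this thread.
interface Task {
  with: string;                 // resource URI
  do: string;                   // ability
  in: Record<string, unknown>;  // input
}

// `const { with, do, in } = task` would not parse, because all three are
// reserved words; renaming each binding during destructuring works.
function describe(task: Task): string {
  const { with: resource, do: ability, in: input } = task;
  return `${ability} on ${resource} with ${Object.keys(input).length} input field(s)`;
}
```

Property access (`task.in`, `task.with`) also works fine; only bindings named after the keywords are ruled out.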

invocation.md Outdated

### 3.2.8 Nonce

If present, the OPTIONAL `nnc` field MAY include a random nonce expressed in ASCII. This field can ensure that multiple invocations are unique.
Contributor

If it's present but non-ASCII, does that make it an invalid invocation (before signature checks)?

Collaborator Author

I don't know what the motivation is here, as it was inherited from @expede's spec. If I were doing it from scratch, I'd just make nnc bytes.
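
One way to read the ASCII requirement raised above is as a structural check that runs before any signature verification. A minimal TypeScript sketch, assuming "ASCII" means every character code is in the range 0x00 to 0x7F (printable-only would be another plausible reading); `isValidNonce` and `makeNonce` are hypothetical helpers, not part of either spec.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical check: true if `nnc` is absent, or a string made only of ASCII.
function isValidNonce(nnc: unknown): boolean {
  if (nnc === undefined) return true; // the field is OPTIONAL
  if (typeof nnc !== "string") return false;
  for (let i = 0; i < nnc.length; i++) {
    if (nnc.charCodeAt(i) > 0x7f) return false;
  }
  return true;
}

// Hypothetical generator: hex-encoding random bytes always yields ASCII.
function makeNonce(byteLength = 16): string {
  return randomBytes(byteLength).toString("hex");
}
```

If nnc were bytes instead, as suggested above, the check would reduce to a type check and the hex encoding step would go away.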

@Gozala
Collaborator Author

Gozala commented Jan 12, 2023

For my own understanding: do you have a use case where you gain something from not putting success cases on an ok branch as a payload?

By non-binary output I meant extensions of the Result variant, like the example Status type described in the document. The intention behind making output open to extensions (not just Result) is that we'd like to extend the invocation spec for long-running tasks that step through multiple states before either succeeding or failing. The idea is that an intermediate step would provide something like a pending Status and some effects (invocations to perform) until terminating in either an ok or an error state.

Perhaps all of that could be modeled with Result instead; I'm open to reconsidering. However, it did seem that letting the pipeline pick an arbitrary branch was more flexible than locking it to ok / error logic.

Let me provide more context. There are some napkin notes here, which I was hesitant to share, and which in turn motivated me to write up this PR. There are even more napkin notes, but let me try to summarize the general idea here.

We need to model a system in which invoking a task may produce a result that is one of:

  1. Plain data
  2. An effect to be performed (either by a service or a client agent)
  3. An error
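
The three possible outcomes above can be modeled as a tagged union. A TypeScript sketch, assuming a hypothetical `Effect` shape that simply lists invocation CIDs someone still needs to perform; the variant names and `isFinished` helper are illustrative, not taken from the spec.

```typescript
// Hypothetical: an effect is a set of invocations (by CID) to be performed,
// either by the service or by the client agent.
interface Effect {
  performer: "service" | "client";
  invocations: string[];
}

type TaskOutput<T, E> =
  | { ok: T }           // 1. plain data
  | { effect: Effect }  // 2. an effect to be performed before the task can finish
  | { error: E };       // 3. an error

// Only `ok` and `error` are terminal; an effect means the task is blocked.
function isFinished<T, E>(out: TaskOutput<T, E>): boolean {
  return !("effect" in out);
}
```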

So what happens when we perform store/add with some CAR CID?

ℹ️ The flowchart below illustrates the finite-state machine (FSM) of a specific CAR in the user space.

flowchart LR
   Idle("⛔️ not found") -- store/add --> 
   Active("🚦 active") --> Storage{Enough storage?}
   Storage --> |Yes| S3{Is in S3 ?}
   Storage --> |No| OverCapacity("⛔️ exceed capacity")
   
   S3 --> |YES| Done("✅ stored")
   S3 --> |No| WaitS3Put("🚦 receiving")
   WaitS3Put -- store/put --> Done
   
   
   WaitS3Put -- store/expire --> Expired("⛔️ expired")
   

   Active -- store/receive --> S3
   Active -- store/exceed --> OverCapacity

Edit: I've updated the flowchart so it no longer misleads about cycles.

In other words, our task can be in one of the following states:

  • 💤 idle - error: not found
  • ⚙️ active - queued: pending
  • ⛔️ over capacity - error: over capacity
  • ⛔️ expired - error: put expired
  • 🚦 await put - blocked: awaiting S3 put
  • ✅ done - ok: car is stored
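
The flowchart and state list above can be sketched as a transition function. This is one possible reading of the diagram in TypeScript: it assumes the "Enough storage?" decision surfaces as the `store/receive` / `store/exceed` events, that the "Is in S3?" decision is a hypothetical `inS3` flag on `store/receive`, and that events which don't apply leave the state unchanged.

```typescript
type CarState =
  | "not-found"      // 💤 idle, error: not found
  | "active"         // ⚙️ queued: pending
  | "receiving"      // 🚦 blocked: awaiting S3 put
  | "stored"         // ✅ ok: CAR is stored
  | "over-capacity"  // ⛔️ error: over capacity
  | "expired";       // ⛔️ error: put expired

type CarEvent =
  | { type: "store/add" }
  | { type: "store/receive"; inS3: boolean } // hypothetical S3 lookup result
  | { type: "store/exceed" }
  | { type: "store/put" }
  | { type: "store/expire" };

function transition(state: CarState, event: CarEvent): CarState {
  switch (state) {
    case "not-found":
      return event.type === "store/add" ? "active" : state;
    case "active":
      if (event.type === "store/receive") return event.inS3 ? "stored" : "receiving";
      if (event.type === "store/exceed") return "over-capacity";
      return state;
    case "receiving":
      if (event.type === "store/put") return "stored";
      if (event.type === "store/expire") return "expired";
      return state;
    default:
      return state; // stored, over-capacity and expired are terminal
  }
}
```

Each call to `transition` corresponds to one receipt in the path from 💤 to ✅ (or ⛔️).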

How does this relate to this spec?

  1. We want receipts to capture the full path from 💤 -> ✅ (or ⛔️), which is why they are linked by origin.
  2. I was thinking we could:
    • Use a Result extension with variants to capture that the state is blocked 🚦
    • Extend the receipt to include the effects that need to run to unblock it
    • This is also why receipts are origin-linked
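
Linking receipts by origin means the full path for one CAR can be recovered by walking backwards from the latest receipt. A TypeScript sketch, assuming a hypothetical receipt shape with `ran`, `out` and an optional `origin` CID, and a plain `Map` standing in for the block store:

```typescript
// Hypothetical receipt shape; `origin` is the CID of the previous receipt.
interface Receipt {
  ran: string;                   // CID of the invocation that produced it
  out: Record<string, unknown>;  // task output
  origin?: string;
}

// Walk origin links from the newest receipt back to the first one and
// return them oldest-first, e.g. 💤 -> ⚙️ -> 🚦 -> ✅.
function history(latest: string, store: Map<string, Receipt>): Receipt[] {
  const path: Receipt[] = [];
  const seen = new Set<string>();
  let cid: string | undefined = latest;
  while (cid !== undefined) {
    if (seen.has(cid)) throw new Error(`origin cycle at ${cid}`);
    seen.add(cid);
    const receipt = store.get(cid);
    if (receipt === undefined) throw new Error(`missing receipt ${cid}`);
    path.push(receipt);
    cid = receipt.origin;
  }
  return path.reverse();
}
```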

We could, and in fact it may be better to, push the blocked state and effects down into receipt.out.ok, which is what you've suggested, @expede, isn't it?

I just don't know yet; in most cases I've found having a third variant besides ok / error to be useful. E.g. generators in JS can fail, succeed, or yield. Futures in Rust also have ready / pending (where ready often wraps a Result).

That is to say, not locking it down seemed to provide more flexibility in the future, but I also recognize how this ambiguity introduces complexity. In other words, I don't mind making Result a requirement.
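
The two encodings weighed in this comment can be compared side by side. A TypeScript sketch with hypothetical names: `Status` adds a third `pending` variant next to `ok` / `error`, while `Result` keeps only two variants and pushes the pending state (with its effects) down into `ok`. The conversion suggests nothing is lost either way; what changes is which layer has to understand `pending`.

```typescript
// Hypothetical effects a blocked task is waiting on (invocation CIDs).
type Effects = string[];

// Option A: open variant set, with a third `pending` branch beside ok / error.
type Status<T, E> =
  | { ok: T }
  | { error: E }
  | { pending: Effects };

// Option B: strict Result; the pending state lives inside `ok`.
type Progress<T> = { done: T } | { pending: Effects };
type Result<T, E> = { ok: Progress<T> } | { error: E };

function toResult<T, E>(status: Status<T, E>): Result<T, E> {
  if ("ok" in status) return { ok: { done: status.ok } };
  if ("error" in status) return { error: status.error };
  return { ok: { pending: status.pending } };
}
```

Under option B, a dependent task would presumably select `["ok", "done"]` rather than `"ok"`, since a pending task is also on the `ok` branch.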

@expede

expede commented Jan 13, 2023

@Gozala thanks for writing up this Issue and pinging me on it 🎉 It was clarifying, though maybe not in the way that either of us was hoping for. I would like to chat through some of this on our scheduled Friday call if you're still interested! At minimum I'd like to see how I can still be helpful to you even if IPVM and W3S probably need different specs.


I think my big takeaway from this is that we're trying to do very different things with this spec, which is where the tension is coming from. On the surface these look very similar, which is probably why it's so tempting to want to have both in one format. My comments below are under the assumption that we'd try to harmonize our specs and use cases. If we decide to just agree to disagree on this, then I'm also happy to give feedback for the DAG House use case without IPVM concerns. I like y'all and what you're doing, and want to help however I can, basically 💯

I'm ~99% sure that our proposals are equally expressive, but that they emphasize different things. To be clear: I think you can express everything from your current proposal in the UCAN Invocation, and vice versa, but each emphasizes different things, which makes each format awkward for the other.

↔️ Core Differences

Blub

I think we've fallen prey to the Blub Paradox:

[...] to explain this point I'm going to use a hypothetical language called Blub. Blub falls right in the middle of the abstractness continuum. It is not the most powerful language, but it is more powerful than Cobol or machine language.

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

That's not to say that our approaches are more or less powerful than each other: in fact I'm quite convinced that they're equally expressive; they're just tuned to make different things easy or hard.

Multiple-Task Configuration vs Collections

In IPVM, we have more than single tasks in a collection: we also have the ability to set resource limits on the entire transaction, configure retry handlers, and so on. This means that an IPVM workflow is more than a collection of tasks that happen to be carried together, and I think that needs more description than your use case does. I've sketched out a few versions of this using your proposed format here, but they're extremely awkward.

I think the big sticking point is paths to promises versus direct CIDs. (The current version of) IPVM needs to be able to express "many tasks together", and the W3S version doesn't. This means that the extra layer of wrapping in the current spec probably looks like pure overhead from your POV. For us, in the current system constraints, it's essential.

Actor Model

@QuinnWilton, @zeeshanlakhani, and I went over this PR and the comments, and I think it's finally clicked for us that when you say "routing", it's not "inspired" by the actor model... this is the actor model. It's most clear in this diagram with cycles:

The major entities here are the auds (actors) and messages. You can have forever loops and the like. Not to over-focus on this one point, but this is both good and bad news for finding something that works for both of us.

The good news is that there's a ton of material on how to do this, including with capabilities. In fact, I think we can simplify the invocation model for simpler use cases like yours by drawing more directly from Cap'n Proto and CapTP.

The bad news is that having cycles is basically antithetical to the safety properties and abstraction that we're aiming for with IPVM. In short: IPVM is trying to do to compute what content addressing does to data, which is basically a new way of working. It may or may not be interesting, but here are a few resources that may be helpful in understanding the IPVM worldview:

All of that said, the actor model is probably a really good fit for what you're doing at DAG House! It's way more "normal" than what we're trying to do with location-independent computation in IPVM! W3S is more focused on networking and efficiency and less on declarative determinism, transactions, and effects. I think there may be some opportunity here to streamline even this PR and call it ucan-rpc or similar in the ucan-wg, and base it more closely on CapTP. IIRC "ucan-rpc" was your original name for the solution space. That's not to say that I "own the invocation namespace" or something, but maybe we just back away from that name because there are clearly several approaches to how to invoke something with a UCAN. We can go explore these different directions, learn, and come back together later.

Also, just to put an explicit offer on the table: the Fission team has a ton of experience with the actor model. It's a running joke internally that we kept hiring Erlang and Elixir people and having them write Haskell (and now Rust) 😅 We're more than happy to have our brains picked on these topics if that's ever helpful!

🙉 Possible Misunderstandings

It's possibly a moot point given the above, but just to explore the space: I think there's a couple misunderstandings about what's possible in the current UCAN Invocation PR.

CAR Batching

I think we may have talked past each other about being able to wrap up multiple Invocations in a CAR file. If you have a bunch of transactionally separate tasks, you can absolutely make a bunch of UCAN Invocations with separate audiences and signatures, stick them in a CAR file, send that to a work-stealing queue and call it a day: CAR<invocationA, invocationB, invocationC>. Each can contain one or more transactions, and that's great! You can express the single-signature-per-task that I think you're looking for.

I would argue there are fewer signatures, because here you are not required to delegate to the executor and then wrap that into an invocation.

In light of the above, the W3S PR here will always require $O(n)$ more signatures for IPVM, which is a result of some of the core differences between our use cases and overall runtime approach. IPVM will try as much as possible not to break up tasks, both to improve latency characteristics and to make tracking transactions much simpler by limiting partial failure.

State Machines and Result

Reading the text and comments on this PR, something stuck out to Quinn, and I think that I agree. We've both trained people in Elixir, and this was something that a lot of people found surprising at first:

In the generic backbone, behaviours like GenServer have a handful of states that you build all kinds of things out of, including state machines, despite them not having that baked into the types. A huge percentage of the Erlang ecosystem is built on top of this low-level interface, adding abstraction at various layers, including state machines. It's important for the kernel and runtime itself to understand the status of the process. As the name implies, this is very "generic": the runtime doesn't understand anything about your application's semantics, but it does understand process success, failure, how to do retries, exits, etc.

@callback handle_cast(request :: term(), state :: term()) ::
            {:noreply, new_state}
            | {:noreply, new_state, timeout() | :hibernate | {:continue, continue_arg :: term()}}
            | {:stop, reason :: term(), new_state}
          when new_state: term()

In the same way, having a promise interface may be sufficient to signal to a scheduler or distributed runtime what to do with failure, while still leaving open the local APIs. In fact, the BEAM tends to use {:ok, result} and {:error, error} for a lot of messaging, which is roughly Result<T, E>.

These occur at different layers, though! Note how these tuples aren't messages GenServer sends back; they are the callback's return value, which talks to the runtime, telling it that all is okay! GenServer needs to understand only a handful of generic messages, and you can use little more than that primitive to build very diverse products like databases, P2P messaging, MQs, web servers, and so on. Likewise, the Result type being generic means that your application-specific logic can live at a different layer from automated system behaviour such as supervision restarts, tracing, etc. For Rust-style futures, Poll::Pending and Poll::Ready(val) would both be tagged as Ok, because the runtime doesn't need to restart or fail that process. You're communicating both to the other application process and to the runtime that everything is ok.
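
The layering argument can be made concrete with a scheduler that only looks at the outer `ok` / `error` tag, while the application-level payload (here a Rust-style `Poll`, modeled in TypeScript) stays opaque to it. This is a hypothetical sketch: the `Decision` names, the `schedule` function and the retry budget are assumptions, not part of either spec.

```typescript
// Application-level payload, analogous to Rust's Poll::Pending / Poll::Ready(val).
type Poll<T> = { pending: true } | { ready: T };

type Result<T, E> = { ok: T } | { error: E };

type Decision = "continue" | "retry" | "fail";

// The scheduler never inspects what is inside `ok`: both Pending and Ready
// mean "everything is ok" from the runtime's point of view.
function schedule<T, E>(
  result: Result<T, E>,
  attempts: number,
  maxAttempts = 3, // hypothetical retry budget
): Decision {
  if ("ok" in result) return "continue";
  return attempts < maxAttempts ? "retry" : "fail";
}
```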

😩 UCAN Invocation PR Weaknesses

i.e. flaws in the current UCAN Invocation proposal

The main thing that exactly no one likes in the Invocations proposal is that promise pointers require a path to the relevant task. This means that you often (but not always!) have to load more than one block to find the relevant task, which kind of sucks. We're very likely to remove the names, because we want to force DAGs by using hashes, but this does still require being relative to the parent wrapper for us.

💔 IPVM Deal Breakers

Number of Signatures

This version has $O(n)$ more signatures than the UCAN Invocation for IPVM use cases. DAG House can use the UCAN Invocation spec to get disjoint tasks with separate signatures, but IPVM can't roll up multiple tasks into a workflow using this PR without adding a linear number of signatures.

If we write an IPVM single task type with just the one signature, then we have no way of addressing promises using this proposal.

No Way To Express Globals or Atomicity

Unless we come up with a nested IPVM task type (see signature problems above), in this proposal we have no way to express global resource limits such as global timeouts or gas limits. The workaround that we could use is something like enqueuing a new job after each step and going back through the scheduler, which has multiple impacts on performance:

  1. Limits which tasks can be run in parallel
  2. Have to create (and often send) more invocations dynamically when today they're known statically
  3. No way to track what is a partial vs single failure (without putting that coordination on the programmer rather than supervision)
  4. Coordination costs (Amdahl's Law is harder to contend with)

🤝 Areas of Possible Alignment

Streamlined Single Tasks

I think that we pretty much agree about the vast majority of the fields. I think we may be able to streamline it further for your use case. Even if IPVM probably can't use that format, we can probably make calling out to such a system very straightforward.

Receipts

I think what goes into a receipt is mostly uncontroversial. There's some debate about where output effects should live and how we'd point at the specific task, but perhaps that can be generalized!

@expede

expede commented Jan 15, 2023

I threw together a quick chart in case it's helpful. TL;DR your proposal with group signatures etc ticks a lot of these boxes that we couldn't agree on before — thank you! 🙏 Group signatures in particular seem like a broadly useful feature, and may be worth writing as an extension or technical report on Varsig to give others the pattern (sign a sorted IPLD array + take the CID and embed in other items).
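
The group-signature pattern described above (sign a sorted array, take its CID, embed that in each item) can be sketched with Node's built-in `crypto` module. This is an approximation under loudly stated assumptions: a hex SHA-256 digest stands in for a real CID, `JSON.stringify` stands in for DAG-CBOR encoding, and Ed25519 is just one possible Varsig algorithm. The `sealGroup` and `verifyMember` helpers are hypothetical.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Stand-in for a CID: hex SHA-256 of the encoded bytes (not a real multihash).
const digest = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

interface Member { task: unknown; group: string } // `group` = CID of the signed array

function sealGroup(tasks: unknown[], privateKey: KeyObject) {
  // 1. Take the "CID" of each task and sort, so the array is canonical.
  const cids = tasks.map((t) => digest(JSON.stringify(t))).sort();
  // 2. Encode the sorted array, sign it once, and take its "CID".
  const encoded = JSON.stringify(cids);
  const signature = sign(null, Buffer.from(encoded), privateKey);
  const group = digest(encoded);
  // 3. Embed the group CID in every member.
  const members: Member[] = tasks.map((task) => ({ task, group }));
  return { cids, signature, group, members };
}

// Check one member against the group, using the single shared signature.
function verifyMember(
  member: Member,
  cids: string[],
  signature: Uint8Array,
  publicKey: KeyObject,
): boolean {
  const encoded = JSON.stringify(cids);
  return (
    digest(encoded) === member.group &&
    cids.includes(digest(JSON.stringify(member.task))) &&
    verify(null, Buffer.from(encoded), publicKey, signature)
  );
}

// Example: seal two tasks with one signature, then verify one of them.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const sealed = sealGroup([{ do: "store/add" }, { do: "upload/add" }], privateKey);
console.log(verifyMember(sealed.members[0], sealed.cids, sealed.signature, publicKey)); // true
```

Sorting is what makes the signed array independent of the order in which tasks were collected, so any holder of the group data can re-derive the same CID.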

Feature W3S IPVM
Invocation (AKA "Task")
Authed with UCAN
Promises
Receipts
Group Signature 🤷 Helpful for size but can live without
Per-Invocation Auth 🤷 Never actually use this layer, even if it's part of the spec, but doesn't hurt because we can just ignore it
Transactions and other group config 🙅 No use, just overhead
Per-Invocation audience 🙅 No use, just overhead, annoyingly opens up more ways to write bugs for this use case

The suggestion is that instead of us both including stuff that the other doesn't need or flagging a bunch of stuff as optional, we write a minimal spec that covers the stuff that we both need, and write further specs for the use-case specific stuff. We actually agree about most things! (I wasn't super granular about everything that we agree on about various parts of the fields in the Invocation etc but at this stage most of it overlaps I'm pretty sure!)

@Gozala
Collaborator Author

Gozala commented Jan 16, 2023

I'm proposing that we align on Auth, Receipts, Promises, and what you're calling here InvocationID, but that DAG House then subtypes Invocation :< InvocationID (as you proposed in this PR) and IPVM subtypes IpvmTransaction :< Auth.

That sounds good, but I think there are some nuances that do not align.

We can both use the CID of InvocationID inside Receipts and as pointers to promises.

I see now what you meant; however, I think it is a problem (for us, anyway): we need the receipt to point to the authorized invocation, not just its ID, so that receipts are fully verifiable without additional context. If we leave out authorization, they're not fully verifiable.

It does make it a bit harder for IPVM to look up tasks in a specific batch, but we can work around it.

I assume it's related to my point above; if not, could you please elaborate on what you mean?

Is there a reason why you are hesitant to include auth in invocations? That would address the problem for receipts and, I'm assuming, make things a bit easier for IPVM, but I might be mistaken here.

For what it's worth, in #34 (comment) I've expanded Auth into something a bit more generic, which is why I called it Scope: it provides "scope" for the invocation, and I was thinking IPVM could extend it with globals and other things like that. You could treat it as a "group / batch" that invocations are part of. That way you can look at an individual task and know from its "scope" which group / batch / transaction it belongs to.

It's just a difference on our use cases of putting a bunch of tasks on a bus with an embedded signature group vs top-down groups of tasks. Converting between these is very simple, so IPVM could call into W3S trivially by putting signatures in each Invocation and sending them over, and W3S could call into IPVM by sending the Auth directly (i.e. omitting all of the optional global fields that don't exist in this spec but are subclassed in the IPVM specs).

Another problem is promises: they should point to the authorized invocation (unlike IPVM, we can't expect every invocation to return the exact same result across all users). Unless scope / auth is part of it, we run into incompatibility again.

@Gozala
Collaborator Author

Gozala commented Jan 16, 2023

@expede I think there are only three ways we could align receipts:

  1. We put auth / scope (or whatever we want to call it) in the invocation and make the receipt point to it.
  2. We point from the receipt to a task (an invocation without auth / scope) and add another field for scope / auth, so that we have full verifiability.
  3. We create some intermediary structure that contains both task and scope and point receipts to it.

In terms of promises, we also have a couple of options:

  1. We put scope / auth on the invocation and make the promise point to it.
  2. We complicate the promise structure by adding a scope / auth field to it.
  3. We create some intermediary structure that contains both task and scope and point promises to it.
  • This PR https://github.com/web3-storage/specs/pull/35/files is basically option 1 for both receipts and promises.
  • Comment feat: invocation spec #34 (comment) is option 3 for both receipts and promises.
    I personally like this option most because:
    1. You have to generate CIDs for tasks anyway, but they are only captured in the .run field, and it feels like a cycle even if it really isn't. Just making an invocation an authorized task seems clearer.
    2. Invocations end up being envelopes as opposed to inlining signatures, which avoids the canonicalization attacks you've written about extensively in the varsig spec.
    3. IMO it aligns well with the "IPLD Application" (One CID providing some context on how to interpret the second CID) thing we talked about in Lisbon. We could define a multiformat that simply encodes two CIDs into one in very compact form and a codec for it.
    4. I like idea of the scope more, I was thinking it could iss, aud and maybe even prf and what have you. It literary could be a scope captured by a closure (if we were to think of a task as a function) and there for provide secondary lookup table.

I am also open to exploring option 2 if that seems preferable to you. If you see another way to align receipts and promises, please let me know and we can explore it as well.

invocation.md Outdated

# Promise is a way to reference data from the output of the invocation
type Promise struct {
Await "<-"
@expede Jan 25, 2023

A minor point, because it's just the characters. I initially really liked the pun on <-, but the overwhelming feedback internally at Fission is that it's difficult (or at least surprising and unfamiliar) to read, especially in combination with = and -> (which were in some version of this spec). Using words instead of symbols may be a good idea.

Collaborator Author

@Gozala Jan 25, 2023

Yeah, I'm not going to die on this hill; if <- is controversial I'm fine with whatever. That said, I find the argument a bit moot: this lives in a binary format that nobody will ever manually type or read. In fact (other than the pun), the name was chosen so that it would be very unlikely to collide with any other key someone might actually have on their structs.
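Whatever the key ends up being called, the mechanism is the same. Below is a minimal TypeScript sketch, under stated assumptions, of how an executor might substitute awaited outputs into a task's input before running it: a promise is modeled as a single-key map `{ "<-": ref }`, and `ref` is assumed to be a plain string (for example a CID string) looked up in a table of already-computed outputs. `resolveInput`, `outputs` and the `bafy-example-post` key are hypothetical names for illustration, not part of either spec.

```typescript
// JSON-like values that can appear in a task's input (simplified model).
type Value = null | boolean | number | string | Value[] | { [key: string]: Value };

// Assumption: a promise is a map whose only key is "<-" and whose value is a
// string reference (e.g. a CID string) to an earlier task's output.
function isPromise(value: Value): value is { "<-": string } {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.keys(value).length === 1 &&
    typeof value["<-"] === "string"
  );
}

// Recursively replaces every promise in `input` with the output it awaits.
function resolveInput(input: Value, outputs: Map<string, Value>): Value {
  if (isPromise(input)) {
    const ref = input["<-"];
    if (!outputs.has(ref)) throw new Error(`unresolved promise: ${ref}`);
    return outputs.get(ref) as Value;
  }
  if (Array.isArray(input)) {
    return input.map((item) => resolveInput(item, outputs));
  }
  if (input !== null && typeof input === "object") {
    const resolved: { [key: string]: Value } = {};
    for (const [key, item] of Object.entries(input)) {
      resolved[key] = resolveInput(item, outputs);
    }
    return resolved;
  }
  return input; // scalars pass through unchanged
}

// Usage: "bafy-example-post" is a placeholder, not a real CID.
const outputs = new Map<string, Value>([["bafy-example-post", { id: 42 }]]);
const input: Value = { left: { "<-": "bafy-example-post" }, right: "literal" };
console.log(JSON.stringify(resolveInput(input, outputs)));
// → {"left":{"id":42},"right":"literal"}
```

Because the promise is recognized purely by its single reserved key, a key that is unlikely to appear in user data keeps false matches rare, which is the property Gozala is describing.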

invocation.md Outdated
Comment on lines 271 to 274
type Selector union {
| Key String
| Path [String]
} kinded
@expede Jan 25, 2023

We started down this path in the UCAN Invocation spec, but reverted it because of the complexity. Going down this path, we really quickly ran into edge cases that made us want more power, exploring using jsonpath, and so on. This kind of thing is why lenses exist 😉 We decided instead to just use a step in the workflow that's not baked into the core spec (i.e. we can use Wasm or a task type to do e.g. data shaping when consuming data from a previous Invocation).

Collaborator Author

I think there is some middle ground between running Wasm and just selecting ok / error, which is what I was trying to do here. JSONPath is pretty complex and I'm not surprised it did not pan out.

While I still kind of think that an array of keys to traverse into is pragmatic and will get you far enough, I'm happy to let it go and maybe do it at another layer.
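For reference, the "array of keys" middle ground is small enough to sketch, even though the thread leans toward moving it to another layer. The TypeScript below is a hypothetical resolver for the `Selector` union above, not part of either spec: a `Key` is treated as a one-element `Path`, array indices are addressed by their string form, and a missing key fails loudly instead of yielding `undefined`. The function name `select` and the sample output shape are assumptions.

```typescript
// Mirrors the kinded union above: Key String | Path [String].
type Selector = string | string[];

// Walks `output` one key at a time; throws if any step is missing.
function select(output: unknown, selector: Selector): unknown {
  const path = typeof selector === "string" ? [selector] : selector;
  let current: unknown = output;
  for (const key of path) {
    if (typeof current !== "object" || current === null || !(key in current)) {
      throw new Error(`selector path not found: ${path.join("/")}`);
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Usage with a hypothetical receipt output.
const receiptOutput = { ok: { post: { id: 42, topics: ["authz", "journal"] } } };
console.log(select(receiptOutput, ["ok", "post", "id"])); // → 42
console.log(select(receiptOutput, ["ok", "post", "topics", "1"])); // → journal
```

Anything beyond plain key traversal (filters, wildcards, projections) is exactly where this starts to grow toward JSONPath, which is the complexity expede describes.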

invocation.md Outdated
Comment on lines 276 to 279
type InvocationReference union {
| Invocation<Any>
| &Invocation<Any>
} representation kinded
@expede Jan 25, 2023

I have no problem with this, but can you expand on why?

Collaborator Author

Assuming you're asking why we allow inline invocations here: no reason other than trying to keep some of the behavior from the original invocation spec. I'm happy to get rid of it and use &Invocation instead.
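To make the cost of the inline form concrete: in the IPLD data model a kinded union is discriminated by the kind of the value (a map for an inline `Invocation`, a link for `&Invocation`), and DAG-JSON encodes a link as a single-key map `{"/": "<cid>"}`. The TypeScript sketch below assumes a DAG-JSON-style decoded value and a trimmed-down, hypothetical `Invocation` shape; it shows the extra branch every consumer has to carry while both forms are allowed, which disappears if only `&Invocation` is kept.

```typescript
// DAG-JSON representation of an IPLD link.
type Link = { "/": string };

// Trimmed-down, hypothetical invocation shape for illustration only.
interface Invocation {
  with: string;
  do: string;
  input: unknown;
}

// Kinded union: either the invocation itself or a link to it.
type InvocationReference = Invocation | Link;

function isLink(ref: InvocationReference): ref is Link {
  return Object.keys(ref).length === 1 && typeof (ref as Link)["/"] === "string";
}

// Every consumer has to handle both branches.
function describeReference(ref: InvocationReference): string {
  return isLink(ref) ? `link to ${ref["/"]}` : `inline ${ref.do} on ${ref.with}`;
}

// "bafyexamplecid" is a placeholder, not a real CID.
console.log(describeReference({ "/": "bafyexamplecid" })); // → link to bafyexamplecid
console.log(
  describeReference({ with: "https://example.com/blog/posts", do: "crud/create", input: {} }),
); // → inline crud/create on https://example.com/blog/posts
```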

invocation.md Outdated
nnc optional String
nbf optional Int

s Varsig
@expede Jan 25, 2023

I think I'm contractually obligated to note that IPVM can't align on forcing a signature here, but this spec may not be totally up to date

Collaborator Author

After our talk I have created this PR to capture what I thought we agreed on https://github.com/web3-storage/specs/pull/35/files

I have not updated this PR as I was hoping we could align on remaining pieces before I go ahead and update everything else.

invocation.md Outdated
"v": "0.1.0",
"iss": "did:key:z6Mkqa4oY9Z5Pf5tUcjLHLUsDjKwMC95HGXdE1j22jkbhz6r",
"aud": "did:web:ucan.run",
"with": "javascript:(data) => data",

Nice 👍

invocation.md Outdated
Comment on lines 559 to 574
{
"iss": "did:key:z6Mkqa4oY9Z5Pf5tUcjLHLUsDjKwMC95HGXdE1j22jkbhz6r",
"aud": "did:web:ucan.run",
"with": "https://example.com/blog/posts",
"do": "crud/create",
"input": {
"headers": {
"content-type": "application/json"
},
"payload": {
"title": "How UCAN Tasks Changed My Life",
"body": "This is the story of how one spec changed everything...",
"topics": ["authz", "journal"],
"draft": true
}
},
@expede Jan 25, 2023

If this is here purely for IPVM's benefit, I think we can get rid of this nested format completely. If we align on putting Auth in the canonicalization for Promises, IIRC the problem you initially used this to solve probably just goes away.

At least for IPVM, we're a pretty hard no on this layout -- or at least it would take a lot of convincing. It makes it very difficult for us to reuse this computation elsewhere, complicates the parser, introduces multiple ways to call a secondary computation (inline and by CID), adds error cases that are not there otherwise, and has nested signatures.

(That doesn't mean that DAG House can't use this anyways because it's "just" a type of invocation, but making this a requirement in the spec is a sticking point for finding alignment IMO)

Collaborator Author

Yeah, I was just trying to show that you could have named tasks without building batches into the spec. If links are fine for IPVM, I'd gladly remove inlining.

invocation.md Outdated
meta { String: Any }

# Principal that issued this receipt
iss Principal

Can you infer this from the underlying UCAN on the Invocation? What if they don't match?

invocation.md Outdated
type Status<T, X, P> union {
| T ("ok") # Success
| X ("error") # Failure
| P ("pending") # Pending

Where would this be used? In an RPC call?

Collaborator Author

We had a conversation about it the other day where you convinced me to just stick to ok / error. I have not updated the PR to reflect this because my understanding was that we wanted to align on the group auth stuff first, which is sketched in https://github.com/web3-storage/specs/pull/35/files
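With `pending` dropped, the status collapses to a two-variant keyed union. A minimal TypeScript rendering, assuming the `ok` / `error` keys from the schema above (the helper names `unwrap` and `mapOk` are illustrative, not from either spec), could look like this:

```typescript
// Two-variant keyed union: exactly one of "ok" or "error" is present.
type Result<T, X> = { ok: T } | { error: X };

// Returns the success value or throws with the serialized error.
function unwrap<T, X>(result: Result<T, X>): T {
  if ("ok" in result) return result.ok;
  throw new Error(`task failed: ${JSON.stringify(result.error)}`);
}

// Transforms a success value while passing errors through untouched.
function mapOk<T, U, X>(result: Result<T, X>, f: (value: T) => U): Result<U, X> {
  return "ok" in result ? { ok: f(result.ok) } : result;
}

const created: Result<{ id: number }, { message: string }> = { ok: { id: 42 } };
console.log(unwrap(mapOk(created, (post) => post.id))); // → 42
```

Keeping the variant as the single map key means a decoder can tell success from failure without a separate tag field, the same self-describing property the keyed representation gives the schema.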

@expede

expede commented Jan 25, 2023

@Gozala Just looking at this stuff now!

Invocations end up being envelopes as opposed to inlining signatures, which avoids the canonicalization attacks you've written about extensively in the Varsig spec

Yeah, that's definitely nice! I think there are a lot of use cases for your design of signature envelopes with multiple payloads, where you check for the CID of the target payload it's embedded in 💯 Once Varsig merges, we should consider at least writing a technical report suggesting it as a pattern.

Is there a reason why you are hesitant to include auth in invocations?

I had written something else here before, but let me take a different approach:

I understand that for DAG House the requirement is to have signatures per-invocation because you want to be able to toss them in a work-stealing queue.

For IPVM, we would not need to submit individual invocations to the runtime at all, but rather send the Auth object itself (i.e. "a layer up"). Is the way you see IPVM using the embedded auth field for canonicalization in the Receipts, or somewhere else? If that's the only place where we'd be required to support them, then that works for me 👍

and it feels like a cycle even if it really isn't. Just making an invocation an authorized task seems clearer.

Yeah, agreed, it's not a formal cycle, since that would require a hash cycle (which is "impractical"). Information-theoretically, it does include redundant information, so the arrows form a loop when viewed one way (like my diagram above), but the other way to see this is something like this...

flowchart TB
  AuthedInvocation --> Invocation
  AuthedInvocation --> Auth
  Auth --> Invocation

...which still has redundant info, but doesn't appear as a cycle.

@expede

expede commented Jan 25, 2023

  1. We stick scope / auth on the invocation and make promise point to it.
  2. We complicate the promise structure to add a scope / auth field to it.
  3. We create some intermediary structure that contains both task and scope and point promises to it.

TL;DR I think that (1) works for me 🎉

However, I'm not 100% sure that I follow, so let me try to rephrase:

(1)

The Scope object contains signatures and other metadata. In the simplest of cases, it is just an array that's signed over, but is open to being more expressive (e.g. globals, metadata, etc).

We need a canonical representation of Invocations, so that we can address it by DID.

If we include a pointer to the Scope in the Invocation's canonical CID, then we have a way to address Invocations uniquely and with authentication, but without having to sign each of them independently or nest encryption.

I'm pretty sure that this works for me 👍 It's essentially a minor change in where this information lives versus the current IPVM-focused proposal. I like that it gives us a constant CID across all use cases: Promises, Receipts, etc.

(2)

I think this one is a different expression of what's in the current Invocation proposal, where you end up pointing at the Scope and the specific Invocation inside it. If so, I think (1) is better 👍

(3)

I think this is a variant of (2), because that data lives in the Promise.
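Option (1) can be illustrated with a small sketch. It is hypothetical in several ways: `canonicalize` is a toy stand-in for a CID (sorted-key JSON rather than DAG-CBOR bytes plus a multihash), and the `scope` field is modeled as a plain string holding the Scope's CID. What it does show is the property described above: identical tasks authorized under different scopes get different addresses, while the same task and scope always map to the same address regardless of key order.

```typescript
// Toy stand-in for a CID: JSON with recursively sorted map keys.
// A real implementation would encode with DAG-CBOR and hash the bytes.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, item]) => `${JSON.stringify(key)}:${canonicalize(item)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

interface Task {
  with: string;
  do: string;
  input: unknown;
}

// Option (1): the invocation carries a pointer to its signed Scope, so the
// scope is part of what gets addressed. `scope` is a hypothetical CID string.
interface ScopedInvocation extends Task {
  scope: string;
}

function address(invocation: ScopedInvocation): string {
  return canonicalize(invocation);
}

const task: Task = { with: "https://example.com/blog/posts", do: "crud/create", input: { draft: true } };
const underAlice = address({ ...task, scope: "bafyscopealice" });
const underBob = address({ ...task, scope: "bafyscopebob" });
console.log(underAlice === underBob); // → false
```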

invocation.md Outdated

# Proof that `iss` was authorized by the invocation `aud` to issue
# receipts. Can be omitted if `job.aud === this.iss`
prf [&UCAN] implicit ([])

Actually same for this, though the spec may not be entirely up to date: won't you be able to find this in the Invocation? Something like receipt.job.scope.prf?

Collaborator Author

So both iss and prf are there to support the way our system is designed. Specifically, the client will send the invocation to did:web:web3.storage, but the receipt will be issued by did:key:zKeyInJanuary, and prf will include the delegation chain from did:web:web3.storage to did:key:zKeyInJanuary.

Also note that the delegation will be pre-arranged as opposed to created on invocation, which is why we'd like to capture the proof chain showing that the actor that issued the receipt was acting on behalf of the executor.

prf is already optional, and I don't mind making iss optional too, in which case it would be implied to be the aud of the invocation.

Collaborator Author

won't you be able to find this in the Invocation? Something like receipt.job.scope.prf?

You won't be able to find it in the invocation, because the client submitting the invocation will not know which keys are in rotation or, if we have a pool of workers picking up tasks, which one of them will pick it up.
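Gozala's setup, where a client invokes did:web:web3.storage but the receipt is issued by did:key:zKeyInJanuary, boils down to a chain check. The TypeScript below is a deliberately simplified, hypothetical model: delegations are reduced to `iss`/`aud` pairs, and signature, capability and expiry validation are left out. It shows an omitted receipt `iss` defaulting to the invocation's `aud`, and `prf` bridging the two principals. All type and function names are illustrative.

```typescript
// Simplified, hypothetical shapes: real UCANs also carry capabilities,
// expiry and signatures, none of which are checked here.
interface Delegation {
  iss: string;
  aud: string;
}
interface InvocationHeader {
  aud: string;
}
interface Receipt {
  iss?: string;
  prf?: Delegation[];
}

// When `iss` is omitted it is implied to be the invocation's `aud`.
function receiptIssuer(receipt: Receipt, invocation: InvocationHeader): string {
  return receipt.iss ?? invocation.aud;
}

// True if `prf` is an unbroken iss -> aud chain from the invocation's `aud`
// to the receipt issuer. An empty chain is fine when they are the same principal.
function isAuthorizedIssuer(receipt: Receipt, invocation: InvocationHeader): boolean {
  const issuer = receiptIssuer(receipt, invocation);
  let current = invocation.aud;
  for (const delegation of receipt.prf ?? []) {
    if (delegation.iss !== current) return false;
    current = delegation.aud;
  }
  return current === issuer;
}

const sentTo: InvocationHeader = { aud: "did:web:web3.storage" };
const signedReceipt: Receipt = {
  iss: "did:key:zKeyInJanuary",
  prf: [{ iss: "did:web:web3.storage", aud: "did:key:zKeyInJanuary" }],
};
console.log(isAuthorizedIssuer(signedReceipt, sentTo)); // → true
console.log(isAuthorizedIssuer({ iss: "did:key:zKeyInJanuary" }, sentTo)); // → false
```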

@Gozala
Collaborator Author

Gozala commented Jan 25, 2023

(1)
The Scope object contains signatures and other metadata. In the simplest of cases, it is just an array that's signed over, but is open to being more expressive (e.g. globals, metadata, etc).

💯

We need a canonical representation of Invocations, so that we can address it by DID.

I think you meant CID here, in which case exactly that

If we include a pointer to the Scope in the Invocation's canonical CID, then we have a way to address Invocations uniquely and with authentication, but without having to sign each of them independently or nest encryption.

Exactly

I'm pretty sure that this works for me 👍 It's essentially a minor change in where this information lives versus the current IPVM-focused proposal. I like that it gives us a constant CID across all use cases: Promises, Receipts, etc.

Perfect! In that case I can update this PR to try to reflect this and address other pieces of feedback

(2)

I think this one is a different expression of what's in the current Invocation proposal, where you end up pointing at the Scope and the specific Invocation inside it. If so, I think (1) is better 👍

Cool, I did not like this option either, but this sketch left me under the impression that you didn't want to include the scope in the canonical invocation, so I tried to be creative.

(3)

I think this is a variant of (2), because that data lives in the Promise.

Not exactly what I had in mind. In my head, (1) just adds a scope field to the Invocation and we call it a day. In (3) we instead define structures for the Task, the Scope and an intermediary (Invocation) that puts the two together. Unlike (1), the Task isn't just an Invocation without a scope. I kind of like it because we do need to refer to tasks in the signatures, so why not just reuse that as opposed to extending it with a scope field.

If (1) seems better I'm ok with doing that instead.

@Gozala
Collaborator Author

Gozala commented Jan 25, 2023

@expede #34 (comment) was basically option (3), with an additional nuance that I wanted to make sure we have considered: specifically, it defined the invocation as a struct with a single field holding the desired payload (I realize the name proved controversial, so let's say it's some unique descriptive name instead of =):

type Invocation union {
   | Closure   "="
} representation keyed

It is the equivalent of multi(thing) in IPFS parlance: it's self-describing via the key prefix. It means that in a different context I could define another union where the invocation is just one of the variants. Just like other multithings, it is possible to infer what it is from its bytes as opposed to from the context. I think the overhead is negligible while the potential is high, which is why I think it is worth considering regardless of whether we choose (1) or (3).
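The self-describing property is easy to see in a decoder. The TypeScript below is a hypothetical sketch, not code from either spec: it dispatches on the single key of a keyed union, where `"="` is the invocation variant from the schema above and `"<-"` is included only to illustrate a wider union in which the invocation is one variant among several. The `Closure` fields are assumptions, and the payload is cast without validation.

```typescript
// Hypothetical closure payload; the spec's actual fields may differ.
interface Closure {
  with: string;
  do: string;
  input: unknown;
}

type Decoded =
  | { kind: "invocation"; closure: Closure }
  | { kind: "promise"; ref: string };

// Decodes a keyed union by its single key; the payload is cast, not validated.
function decodeKeyed(value: Record<string, unknown>): Decoded {
  const keys = Object.keys(value);
  if (keys.length !== 1) throw new Error("keyed union must have exactly one key");
  switch (keys[0]) {
    case "=":
      return { kind: "invocation", closure: value["="] as Closure };
    case "<-":
      return { kind: "promise", ref: value["<-"] as string };
    default:
      throw new Error(`unknown variant: ${keys[0]}`);
  }
}

const decoded = decodeKeyed({
  "=": { with: "https://example.com/blog/posts", do: "crud/create", input: {} },
});
console.log(decoded.kind); // → invocation
```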

@Gozala Gozala requested a review from expede January 30, 2023 07:12
@Gozala
Collaborator Author

Gozala commented Jan 30, 2023

@expede I have updated the spec; could you please take another look and let me know what you think?

Also, could you please let me know if you had a chance to evaluate the idea of handling batching using a special capability like system/transact, which we discussed last week? If that is going to work for IPVM, I would go ahead and simplify this spec further.

@Gozala
Collaborator Author

Gozala commented Feb 7, 2023

Closing this in favor of ucan-wg/invocation#10

@Gozala Gozala closed this Feb 7, 2023