Skip to content

Conversation

@valentinewallace
Copy link
Owner

We have an overarching goal of getting rid of ChannelManager persistence and
rebuilding the whole ChannelManager from existing ChannelMonitors, due to
issues when the two structs are out-of-sync on restart. The main issue that can
arise is channel force closure.

Here we start this process by rebuilding ChannelManager::outbound_payments from
the ChannelMonitors.

This serialization was in place for compatibility with LDK 0.0.101, which
should no longer be necessary.
This makes it easier to see what exactly is failing upon test failure.
As part of debugging this test while implementing rebuilding
ChannelManager::outbound_payments from Channel{Monitor}s, I added some extra
checks in this test. They seemed useful for improving readability since this
test follows 3 interleaved payments, so thought it was worth committing the
changes.
We have an overarching goal of getting rid of ChannelManager persistence and
rebuilding the whole ChannelManager from existing ChannelMonitors, due to
issues when the two structs are out-of-sync on restart. The main issue that can
arise is channel force closure.

Here we start this process by rebuilding ChannelManager::outbound_payments from
the ChannelMonitors.
@valentinewallace
Copy link
Owner Author

@joostjager not sure how to request you as a reviewer on my fork so tagging you manually for now


write_tlv_fields!(writer, {
(1, pending_outbound_payments_no_retry, required),
// TLV 1 used to be used for pending_outbound_payments_no_retry

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message:

which should no longer be necessary.

Explain 'should'?


#[test]
fn test_dup_htlc_onchain_doesnt_fail_on_reload() {
fn test_dup_htlc_onchain_doesnt_fail_on_reload_0() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate that rust doesn't have sub-tests. The numbering isn't ideal, but trying to come up with something descriptive probably isn't going to work either...

) where
F: Fn(usize, &ChannelMonitorUpdate) -> bool,
{
if let Some(chain_monitor) = node.chain_monitor() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let chain_monitor = node.chain_monitor().expect("Missing chain monitor"); ?

&nodes[0],
chan_id,
2,
|upd_idx, upd: &ChannelMonitorUpdate| {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it simpler to just return the updates, and do this matching afterwards instead of using the closure?

(3, pending_outbound_payments, option),
// We now rebuild `OutboundPayments` from the `ChannelMonitor`s as part of getting rid of
// `ChannelManager` persistence.
(3, _pending_outbound_payments, option),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be useful to still read this and do a debug comparison with the reconstructed data? To maybe catch things that aren't covered in tests.

);
}
}
for (channel_id, monitor) in args.channel_monitors.iter() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context question: the two loops through args.channel_monitors can't be combined because in the first pass the pending_outbounds need to be reconstructed completely?

// `pending_intercepted_htlcs`, we were apparently not persisted after
// the monitor was when forwarding the payment.
decode_update_add_htlcs.retain(|src_outb_alias, update_add_htlcs| {
for (htlc_source, (htlc, preimage_opt)) in monitor.get_all_current_outbound_htlcs()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the is_channel_closed check, I think also more HTLCSource::PreviousHopData htlcs are processed, but this isn't necessary for payment reconstruction?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a few lines of comment describing the purposes of the two loops.

bolt12_invoice,
session_priv,
path,
is_channel_closed,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First the claim was only executed for closed channels, but now for all htlcs. Is that the intention?

fee_paid_msat,
bolt12_invoice: bolt12_invoice,
}, ev_completion_action.take()));
let duplicate_sent_event = pending_events.iter().find(|(ev, _)| match ev {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment how duplicate sent event can happen?

}
// Because we reload outbound payments from the existing `ChannelMonitor`s, and no HTLC exists
// in those monitors after restart, we will currently forget about the payment and fail to
// generate a `PaymentFailed` event.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did we say about this? Wasn't this undesirable?

@joostjager
Copy link

Is it really not possible to aim for merging this PR to main?

@valentinewallace
Copy link
Owner Author

Is it really not possible to aim for merging this PR to main?

Feel free to respond in the discord thread with Matt!

@joostjager
Copy link

Summarizing that thread here:

@TheBlueMatt's point is that reconstructing outbound payments only from the monitors isn’t a strict improvement to LDK. It’s more of a step toward a larger goal. That eventual improvement would likely require a separate store for unfinished payments, so making this change now (especially without a cfg flag) might be fine as an intermediate step but not a clear win on its own.

With a cfg flag, we wouldn't need to worry about functional regression, and could still avoid merging to a dev branch. For this PR, it is a question of how easy the new and old code can be switched using a cfg flag without too much copy paste? I have the same question for lightningdevkit#4151.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants