Skip to content

Conversation

@starius
Copy link
Collaborator

@starius starius commented Feb 26, 2025

Go 1.24 apparently improved its race detector, so regular go test -race started discovering all of these. Now it passes. Some changes were needed not only in unit tests to make accesses of batcher and batch fields thread safe.

Pull Request Checklist

  • Update release_notes.md if your PR contains major features, breaking changes or bugfixes

Copy link
Member

@bhandras bhandras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes! Looks good, just a few questions.


// Check if the sweep is already in a batch. If that is the case, we
// provide the sweep to that batch and return.
for _, batch := range b.batches {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, now I got it :)

publishErrorHandler: bk.publishErrorHandler,
purger: bk.purger,
store: bk.store,
log: bk.log,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate why this caused a race?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests replace the logger with wrappedLogger (defined in sweep_batcher_test.go) to wait for a particular log message.

// This is used only in unit tests! The reason to have this is to
// avoid data races in require.Eventually calls running in parallel
// to the event loop. See method testRunInEventLoop().
testReqs chan *testRequest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could avoid this by instead using checkpoints in within mocks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the checks that use this feature access batcher.batches and fields of a batch. The fields (e.g. fee rate, the list of sweeps) are updated in the main code, not in the mocks.

// using btclog.
func UseLogger(logger btclog.Logger) {
log = logger
log_.Store(&logger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we only really call UseLogger once it should not race iiuc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call it in runTests function in each test. It increases logging to include debug messages. I caught a race between tests: in one test a sweepbatcher was shutting down and logging something, while another test was calling UseLogger.

@starius starius marked this pull request as ready for review February 26, 2025 13:24
@starius starius requested a review from bhandras February 26, 2025 14:33
Copy link
Member

@bhandras bhandras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

// we just return early.
if sweep == nil {
b.log.Infof("the sweep is nil")
b.log().Infof("the sweep is nil")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the extra step of calling b.log() we could create log wrappers in the form of:

func (b *batch) Infof(format string, args ...interface{}) {
	b.log().Infof(...)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!


// Eventually request will be consumed and a new batch will spin up.
require.Eventually(t, func() bool {
return len(getBatches(ctx, batcher)) == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we could add a Batcher.NumBatches() and a batch.NumSweeps() so most of these repeating checks are easier to write.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could also add a batch.Sweeps() which would return the swap hashes for the sweeps. So then we could completely remove the intrusive check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added private methods Batcher.numBatches(ctx), batch.numSweeps(ctx) and batch.snapshot(ctx) and managed to get rid of the majority of testRunInEventLoop calls.

sweep, err := b.fetchSweep(runCtx, sweepReq)
if err != nil {
log.Warnf("fetchSweep failed: %v.", err)
log().Warnf("fetchSweep failed: %v.", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before, although there it's just one function call, but the wrapped log might be a bit nicer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added private functions debuf, infof, warnf, and errorf.

# If you change this value, please change it in the following files as well:
# /Dockerfile
GO_VERSION: 1.21.10
GO_VERSION: 1.24.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@starius starius force-pushed the fix-unit-test-races branch 2 times, most recently from 48b25bd to 51196e7 Compare February 28, 2025 18:23
Copy link
Collaborator

@hieblmi hieblmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅ Thanks for the fixes, only a few nits.

// we just lazily delete it and continue our scan.
if batch.isComplete() {
delete(b.batches, batch.id)
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove the continue here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!


// Check if the sweep is already in a batch. If that is the case, we
// provide the sweep to that batch and return.
for _, batch := range b.batches {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, now I got it :)

b.log.Warnf("Method timeout() returned %v. Number of"+
" sweeps: %d. It may be an empty batch.",
b.Warnf("Method timeout() returned %v. Number of "+
"sweeps: %d. It may be an empty batch.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls clarify, is there another option to being an empty batch here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the following comment:

        if timeout <= 0 {
+               // This may happen if the batch is empty or if SweepInfo.Timeout
+               // is not set, may be possible in tests or if there is a bug.
                b.Warnf("Method timeout() returned %v. Number of "+
                        "sweeps: %d. It may be an empty batch.",

}

b.log.Infof("initializing rbf fee rate for conf target=%v",
b.Infof("initializing rbf fee rate for conf target=%v",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newline after this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


// Make sure the batch has proper batchConfTarget.
return batch.cfg.batchConfTarget == 123
return snapshot.cfg.batchConfTarget == 123
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

place 123 into var...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// Wait for tx to be published.
<-lnd.TxPublishChannel
batch := getOnlyBatch(t, ctx, batcher).snapshot(ctx)
require.Equal(t, int32(123), batch.cfg.batchConfTarget)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and use it here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


// Make sure the batch has proper batchConfTarget.
return batch.cfg.batchConfTarget == 123
return snapshot.cfg.batchConfTarget == 123
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

require.Equal(t, feeRateMedium, batch.rbfCache.FeeRate)
batch := getOnlyBatch(t, ctx, batcher)
snapshot := batch.snapshot(ctx)
require.Equal(t, 1, len(snapshot.sweeps))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require.Len

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

require.Len(t, batch.sweeps, 2)
require.Equal(t, feeRateMedium, batch.rbfCache.FeeRate)
snapshot = batch.snapshot(ctx)
require.Equal(t, 2, len(snapshot.sweeps))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require.Len

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

starius added 8 commits March 4, 2025 10:25
It should use the c variable passed into the lambda, not the parent t.
It should use assert, not require package.
Previously, if a completed batch was visited after a batch to which the
sweep was added, it was not deleted because the function returned early.

This has been separated into two loops: the first one removes completed batches,
and the second one adds the sweep to a batch.
This is needed to fix crashes in unit tests under -race.
Several structures were accessed without protection causing crashes under -race.
The code inside require.Eventually runs in parallel with the event loops
of the batcher and its batches. Accessing fields of the batcher and batches
must be done within an event loop.

To address this, testRunInEventLoop methods were added to the Batcher and batch
types. Unit tests were then rewritten to use this approach when accessing
batcher and batch fields.

Additionally, in many cases, receive operations from RegisterSpendChannel
were moved before require.Eventually. This prevents testRunInEventLoop from
getting stuck in an event loop while blocked on a RegisterSpendChannel send
operation.
Try to catch more race conditions in unit tests.
@starius starius force-pushed the fix-unit-test-races branch from 51196e7 to 60f4906 Compare March 4, 2025 16:45
@starius starius merged commit 800f0e0 into lightninglabs:master Mar 4, 2025
4 checks passed
@starius starius deleted the fix-unit-test-races branch March 4, 2025 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants