Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.33 - Add pebble-based storage layer files #6207

Closed
wants to merge 3 commits into from

Conversation

zhangchiqing
Copy link
Member

@zhangchiqing zhangchiqing commented Jul 11, 2024

This PR cherry-picked changes from #6197.

The changes picked are unused files or changes that won't change the existing behavior. Merging these changes in order to minimize the code changes to be maintained on a feature branch.

@zhangchiqing zhangchiqing marked this pull request as ready for review July 11, 2024 21:05
return irrecoverable.NewExceptionf("could not load data: %w", err)
}
*keyExists = true
defer closer.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets be mindful that golang's defer have a small but non-negligible performance overhead (to the best of my knowledge ... which might already be outdated). We have a few instances in the code where it is trivially apparent how to avoid the defer, like this one:

Suggested change
defer closer.Close()
closer.Close()

Copy link
Member

@jordanschalm jordanschalm Aug 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious so I benchmarked it: https://gist.github.com/jordanschalm/d4d268a39c6b2197a0c0882a4ca4dab5.

It seems like it has improved a lot in the past few years. About 5 years ago, it was 15x slower (~50ns) in a minimal benchmark. Now it's about 2x slower (2ns).

Agree with removing it in this case, but the performance overhead is negligible now for almost any use-case.

Comment on lines 94 to +96
defer closer.Close()

err = msgpack.Unmarshal(val, &sc)
err = msgpack.Unmarshal(val, sc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer can be removed by shifting the Close call below line 96

Suggested change
defer closer.Close()
err = msgpack.Unmarshal(val, &sc)
err = msgpack.Unmarshal(val, sc)
err = msgpack.Unmarshal(val, sc)
closer.Close()


// checkFunc is called during key iteration through the badger DB in order to
// check whether we should process the given key-value pair. It can be used to
// avoid loading the value if its not of interest, as well as storing the key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// avoid loading the value if its not of interest, as well as storing the key
// avoid loading the value if it's not of interest, as well as storing the key

@@ -10,14 +12,71 @@ import (
"github.com/onflow/flow-go/storage"
)

type ReaderBatchWriter struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I would very appreciate at least minimal godoc for the methods. While not exported, most of the methods are called by other structs within the pebble operation package. Hence, I think documentation would be very useful.

Suggested change
type ReaderBatchWriter struct {
// ReaderBatchWriter is a batch of storage writes that also exposes functionality (see method
// `IndexedBatch`) for reading arbitrary keys as if the batch write had already been applied to
// the database. Note that this requires indexing and tracking the pending writes, hence adding
// pending writes to a `ReaderBatchWriter` is significantly slower than inserting into a
// non-indexed batch. Only use an indexed batch if you require reading from it.
type ReaderBatchWriter struct {

LowerBound: prefix,
UpperBound: append(prefix, ffBytes...),
})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

// }
func noStore[K comparable, V any](_ K, _ V) func(pebble.Writer) error {
return func(pebble.Writer) error {
type storeFunc[K comparable, V any] func(key K, val V) func(storage.PebbleReaderBatchWriter) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
Pebble.Batch includes the following in its documentation

Batch implements the Reader interface, but only an indexed batch supports reading [...]
The batch index is implemented via an skiplist (internal/batchskl). While the skiplist implementation is very fast, inserting into an indexed batch is significantly slower than inserting into a non-indexed batch. Only use an indexed batch if you require reading from it.

Essentially for all our storage abstractions we use indexed batches:

batch: db.NewIndexedBatch(),

so we are always paying the performance cost for supporting reads despite rarely using that feature.

Comment on lines 16 to 17
// BatchStorage serves as an abstraction over batch storage, adding ability to add ability to add extra
// callbacks which fire after the batch is successfully flushed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is very unlcear due to broken sentence structure and strange repetitions.

AddCallback(func())
}

func OnlyWriter(fn func(pebble.Writer) error) func(PebbleReaderBatchWriter) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to onlyWrites

}

// BatchStorage serves as an abstraction over batch storage, adding ability to add ability to add extra
// callbacks which fire after the batch is successfully flushed.
type BatchStorage interface {
GetWriter() *badger.WriteBatch
GetWriter() BatchWriter
GetReader() Reader
Copy link
Member

@AlexHentschel AlexHentschel Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think method GetReader() is used anywhere (at least my IDE claims this). Can we remove it?

}

// Event retrieval does not guarantee any order,
// Hence, we a sort the events for comparing the expected and actual events.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Suggested change
// Hence, we a sort the events for comparing the expected and actual events.
// Hence, we sort the events for comparing the expected and actual events.

}

// index the transaction IDs within the collection
txIDs := payload.Collection.Light().Transactions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already computed payload.Collection.Light() above. As this has a non-negligible computational cost, we should not repeat it here,

Comment on lines +162 to +176
// cluster payloads only contain a single collection, allow duplicates,
// because it is valid for two competing forks to have the same payload.
light := payload.Collection.Light()
err := operation.InsertCollection(&light)(tx)
if err != nil {
return fmt.Errorf("could not insert payload collection: %w", err)
}

// insert constituent transactions
for _, colTx := range payload.Collection.Transactions {
err = operation.InsertTransaction(colTx.ID(), colTx)(tx)
if err != nil {
return fmt.Errorf("could not insert payload transaction: %w", err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that payload.Collection.Light() already computes the IDs of the transactions. This has non-negligible cost due to serialization and potentially hashing somewhat larger data structures.

Lets not needlessly repeat that here. The colTx.ID() in line 172 is already pre-computed in light

Comment on lines +34 to +40
func InsertExecutedBlock(blockID flow.Identifier) func(pebble.Writer) error {
return insert(makePrefix(codeExecutedBlock), blockID)
}

func RetrieveExecutedBlock(blockID *flow.Identifier) func(pebble.Reader) error {
return retrieve(makePrefix(codeExecutedBlock), blockID)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear what this is used for due to lacking documentation. Given that we are just storing some blockID, this seems to be vulnerable to stale writes.

@zhangchiqing
Copy link
Member Author

Close in favor of #6197

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants