New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

v0.33 - Add pebble-based storage layer files #6207

Closed

zhangchiqing wants to merge 3 commits into v0.33 from leo/v0.33-pebble-storage-layer

Member

zhangchiqing commented Jul 11, 2024 •

edited

Loading

This PR cherry-picked changes from #6197.

The changes picked are unused files or changes that won't change the existing behavior. Merging these changes in order to minimize the code changes to be maintained on a feature branch.

zhangchiqing added 2 commits

July 11, 2024 13:19


          add files for pebble based protocol database

6469b9f


          fix access test

1d45907

zhangchiqing marked this pull request as ready for review

July 11, 2024 21:05

zhangchiqing requested a review from peterargue as a code owner

July 11, 2024 21:05

zhangchiqing requested review from jordanschalm and AlexHentschel

July 11, 2024 21:05


          fix results API

5d1ae72

AlexHentschel reviewed

View reviewed changes

storage/pebble/operation/common.go

+              			return irrecoverable.NewExceptionf("could not load data: %w", err)
+              		}
+              		*keyExists = true
+              		defer closer.Close()

Member

AlexHentschel Aug 9, 2024

lets be mindful that golang's defer have a small but non-negligible performance overhead (to the best of my knowledge ... which might already be outdated). We have a few instances in the code where it is trivially apparent how to avoid the defer, like this one:

Suggested change

      
            		defer closer.Close()
          
            		closer.Close()

Member

jordanschalm Aug 10, 2024 •

edited

Loading

I was curious so I benchmarked it: https://gist.github.com/jordanschalm/d4d268a39c6b2197a0c0882a4ca4dab5.

It seems like it has improved a lot in the past few years. About 5 years ago, it was 15x slower (~50ns) in a minimal benchmark. Now it's about 2x slower (2ns).

Agree with removing it in this case, but the performance overhead is negligible now for almost any use-case.

storage/pebble/operation/common.go

Comment on lines 94 to +96

               		defer closer.Close()
-              		err = msgpack.Unmarshal(val, &sc)
+              		err = msgpack.Unmarshal(val, sc)

Member

AlexHentschel Aug 9, 2024

defer can be removed by shifting the Close call below line 96

Suggested change

      
            		defer closer.Close()
          
            		err = msgpack.Unmarshal(val, &sc)
          
            		err = msgpack.Unmarshal(val, sc)
          
            		err = msgpack.Unmarshal(val, sc)
          
            		closer.Close()

storage/pebble/operation/common.go

+              // checkFunc is called during key iteration through the badger DB in order to
+              // check whether we should process the given key-value pair. It can be used to
+              // avoid loading the value if its not of interest, as well as storing the key

Member

AlexHentschel Aug 9, 2024

Suggested change

      
            // avoid loading the value if its not of interest, as well as storing the key
          
            // avoid loading the value if it's not of interest, as well as storing the key

storage/pebble/operation/common.go

@@ @@ -10,14 +12,71 @@ import ( @@
               	"github.com/onflow/flow-go/storage"
               )
+              type ReaderBatchWriter struct {

Member

AlexHentschel Aug 9, 2024

Generally, I would very appreciate at least minimal godoc for the methods. While not exported, most of the methods are called by other structs within the pebble operation package. Hence, I think documentation would be very useful.

Suggested change

      
            type ReaderBatchWriter struct {
          
            // ReaderBatchWriter is a batch of storage writes that also exposes functionality (see method
          
            // `IndexedBatch`) for reading arbitrary keys as if the batch write had already been applied to
          
            // the database. Note that this requires indexing and tracking the pending writes, hence adding
          
            // pending writes to a `ReaderBatchWriter` is significantly slower than inserting into a
          
            // non-indexed batch. Only use an indexed batch if you require reading from it.
          
            type ReaderBatchWriter struct {

storage/pebble/operation/common.go

+              			LowerBound: prefix,
+              			UpperBound: append(prefix, ffBytes...),
+              		})

Member

AlexHentschel Aug 9, 2024

Suggested change

storage/pebble/value_cache.go

    
              //	}

              func noStore[K comparable, V any](_ K, _ V) func(pebble.Writer) error {

              	return func(pebble.Writer) error {

              type storeFunc[K comparable, V any] func(key K, val V) func(storage.PebbleReaderBatchWriter) error

Member

AlexHentschel Aug 9, 2024

⚠️
Pebble.Batch includes the following in its documentation

Batch implements the Reader interface, but only an indexed batch supports reading [...]
The batch index is implemented via an skiplist (internal/batchskl). While the skiplist implementation is very fast, inserting into an indexed batch is significantly slower than inserting into a non-indexed batch. Only use an indexed batch if you require reading from it.

Essentially for all our storage abstractions we use indexed batches:

flow-go/storage/pebble/operation/common.go

Line 50 in 5d1ae72

batch: db.NewIndexedBatch(),

so we are always paying the performance cost for supporting reads despite rarely using that feature.

storage/batch.go

Comment on lines 16 to 17

		// BatchStorage serves as an abstraction over batch storage, adding ability to add ability to add extra
		// callbacks which fire after the batch is successfully flushed.

Member

AlexHentschel Aug 9, 2024

comment is very unlcear due to broken sentence structure and strange repetitions.

storage/batch.go

+              	AddCallback(func())
+              }
+              func OnlyWriter(fn func(pebble.Writer) error) func(PebbleReaderBatchWriter) error {

Member

AlexHentschel Aug 9, 2024

I would rename this to onlyWrites

storage/batch.go

               }
               // BatchStorage serves as an abstraction over batch storage, adding ability to add ability to add extra
               // callbacks which fire after the batch is successfully flushed.
               type BatchStorage interface {
-              	GetWriter() *badger.WriteBatch
+              	GetWriter() BatchWriter
+              	GetReader() Reader

Member

AlexHentschel Aug 9, 2024 •

edited

Loading

don't think method GetReader() is used anywhere (at least my IDE claims this). Can we remove it?

storage/pebble/operation/events_test.go

+              }
+              // Event retrieval does not guarantee any order,
+              // Hence, we a sort the events for comparing the expected and actual events.

Member

AlexHentschel Aug 10, 2024

typo

Suggested change

      
            // Hence, we a sort the events for comparing the expected and actual events.
          
            // Hence, we sort the events for comparing the expected and actual events.

AlexHentschel mentioned this pull request

🚧[under construction]🚧 Pebble Database Backend #6337

Closed

3 tasks

AlexHentschel reviewed

View reviewed changes

storage/pebble/procedure/cluster.go

    
              		}

              		// index the transaction IDs within the collection

              		txIDs := payload.Collection.Light().Transactions

Member

AlexHentschel Aug 14, 2024

we already computed payload.Collection.Light() above. As this has a non-negligible computational cost, we should not repeat it here,

storage/pebble/procedure/cluster.go

Comment on lines +162 to +176

    
              		// cluster payloads only contain a single collection, allow duplicates,

              		// because it is valid for two competing forks to have the same payload.

              		light := payload.Collection.Light()

              		err := operation.InsertCollection(&light)(tx)

              		if err != nil {

              			return fmt.Errorf("could not insert payload collection: %w", err)

              		}

              		// insert constituent transactions

              		for _, colTx := range payload.Collection.Transactions {

              			err = operation.InsertTransaction(colTx.ID(), colTx)(tx)

              			if err != nil {

              				return fmt.Errorf("could not insert payload transaction: %w", err)

              			}

              		}

Member

AlexHentschel Aug 14, 2024

note that payload.Collection.Light() already computes the IDs of the transactions. This has non-negligible cost due to serialization and potentially hashing somewhat larger data structures.

Lets not needlessly repeat that here. The colTx.ID() in line 172 is already pre-computed in light

AlexHentschel reviewed

View reviewed changes

storage/pebble/operation/headers.go

Comment on lines +34 to +40

+              func InsertExecutedBlock(blockID flow.Identifier) func(pebble.Writer) error {
+              	return insert(makePrefix(codeExecutedBlock), blockID)
+              }
+              func RetrieveExecutedBlock(blockID *flow.Identifier) func(pebble.Reader) error {
+              	return retrieve(makePrefix(codeExecutedBlock), blockID)
+              }

Member

AlexHentschel Aug 14, 2024

It is unclear what this is used for due to lacking documentation. Given that we are just storing some blockID, this seems to be vulnerable to stale writes.

Member Author

zhangchiqing commented Sep 13, 2024

Close in favor of #6197

zhangchiqing closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet