
Improve integration testing client (CodexClient) and json serialization #514

Merged — 3 commits merged into master from feat/integration/improve-codex-client on Sep 1, 2023

Conversation

@emizzle (Contributor) commented Aug 14, 2023

The current client used for integration testing against the REST endpoints for Codex accepts and passes primitive types. This caused a hard-to-diagnose bug where a `uint` was not being deserialized correctly.

In addition, the json de/serializing done between the CodexClient and REST client was not easy to read and was not tested.

These changes bring non-primitive types to most of the CodexClient functions, allowing us to lean on the compiler to ensure we're providing correct types when doing integration testing. More importantly, a json de/serialization util was created as a drop-in replacement for the std/json lib, with two main differences: field serialization is opt-in (instead of opt-out, as in json_serialization), and serialization errors are captured and logged, making serialization issues much easier to debug.

Note

This PR is based on #513
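
To make the opt-in idea concrete, here is a minimal sketch of one way field-level opt-in serialization can be written in Nim. The `serialize` pragma, the `toJson` name, and the example type are illustrative assumptions, not the exact API introduced in this PR:

```nim
import std/[json, macros]

# A custom pragma: only fields explicitly marked {.serialize.} are serialized.
template serialize*() {.pragma.}

type StorageRequest = object
  duration {.serialize.}: uint64
  reward {.serialize.}: uint64
  internalState: string  # unmarked, so never serialized

proc toJson*[T: object](obj: T): JsonNode =
  result = newJObject()
  for name, value in obj.fieldPairs:
    when value.hasCustomPragma(serialize):
      result[name] = %value

echo toJson(StorageRequest(duration: 100, reward: 42))
# -> {"duration":100,"reward":42}
```

The opt-out alternative would serialize every field unless excluded, which is what makes accidental leakage of internal fields possible.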

Review comment on this snippet:

```nim
) =
  expectJsonKind(expectedType, {expectedKind}, json)

proc fromJson*(
```
Contributor commented:
Why not use nim-json-serialization?

@emizzle (Author) commented Aug 14, 2023:

It was my first choice so that we could keep consistency across the code base, but I opted out of it for a few reasons. nim-json-serialization is so hard to debug that I consistently spend far too much time trying to implement anything. The errors almost always get swallowed, and the application crashes in inconspicuous, hard-to-debug ways. It also doesn't help that most of its guts are written in macros, so even attempting to quickly output the serialization error just to see what it was takes ages, because it's hard to figure out where that is even possible (and even then it's just not that easy). Having a serialization utility inside codex (it could even be its own lib) that is not overtly magical and not written entirely in macros lets us debug, avoid swallowing errors, and do things like opt-in field serialization (versus opt-out, like nim-json-serialization). That was generally the reasoning behind not using it. More discussion around these topics can be found here: https://discord.com/channels/613988663034118151/616299964242460682/789378950484852756.
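
As a concrete illustration of the error-capture point, a minimal sketch (assuming questionable's `?!` Result type, which codex already uses; the names are illustrative, not the actual util from this PR):

```nim
import std/json
import pkg/chronicles
import pkg/questionable/results

# Instead of the error being swallowed deep inside macro-generated code,
# log it and surface it to the caller as a Result.
proc fromJson(_: type int, json: JsonNode): ?!int =
  if json.kind != JInt:
    error "json deserialization failed", expected = "JInt", actual = $json.kind
    return failure("expected JInt but got " & $json.kind)
  success(json.getInt)
```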

Additionally, for the life of me, I could not override serialization of UInt256 (so that it was not hex-encoded) using nim-json-serialization inside of codex/utils/json or codex/rest/json. I could override deserialization (readValue) but not serialization (writeValue), which I really think may boil down to a dependency issue. I'd like to set up some time with you to see if we can work through that, as it was really not obvious and honestly quite frustrating.
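
For reference, the kind of `writeValue` override being described would look roughly like the following. This is a sketch assuming nim-json-serialization's hook signature and stint's decimal `$`, not code verified against the dependency versions in question:

```nim
import pkg/json_serialization
import pkg/stint

# Serialize UInt256 as a decimal string instead of hex-encoding it.
proc writeValue*(writer: var JsonWriter, value: UInt256) =
  writer.writeValue($value)
```

If a competing `writeValue` for UInt256 is pulled in by another dependency, the compiler may resolve to that one instead, which would match the "dependency issue" theory.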

Contributor replied:

Thanks! That's really helpful context.

Ah, yah, I can see that. I ran into some of the difficulty of debugging nim-json-serialization (and the TOML one) in the conf.nim module, and that was a very basic change. There was a lot of complicated magic, and it took me a while to work out what was even causing the issue, which was a pain.

> Additionally, for the life of me, I could not override serialization of UInt256 (so that it was not hex-encoded) using nim-json-serialization inside of codex/utils/json or codex/rest/json. I could override deserialization (readValue) but not serialization (writeValue), which I really think may boil down to a dependency issue.

That's a pain. 🤔 I thought I saw some UInt256 serialization stuff in stint or somewhere. That could explain some of those issues.

> I'd like to set up some time with you to see if we can work through that, as it was really not obvious and honestly quite frustrating.

Sure! I'd be up for that sometime; I like figuring out puzzles. It'd be nice to chat with you too; I don't think we've had the chance to yet. :)

I'll take a closer look at your setup as well to see if I can spot anything. It looked pretty straightforward.

P.S. My favorite serde library in Nim is msgpack4nim. It can serialize directly into or from an object, which avoids allocations, making it nearly the fastest one I've benchmarked. It rivaled flatbuffers, which is famed for its no-copy speed!

@emizzle (Author) replied:

> I thought I saw some UInt256 serialization stuff in stint or somewhere. That could explain some of those issues.

It could have been https://github.com/status-im/nim-eth/blob/master/eth/common/eth_types_json_serialization.nim#L26-L30. Importing these was definitely something I tried, but I could not get it working.

> Sure! I'd be up for that sometime; I like figuring out puzzles. It'd be nice to chat with you too; I don't think we've had the chance to yet. :)

Let's do it 💪 DM me in discord and we can set up a date/time.

> P.S. My favorite serde library in Nim is msgpack4nim. It can serialize directly into or from an object, which avoids allocations, making it nearly the fastest one I've benchmarked. It rivaled flatbuffers, which is famed for its no-copy speed!

msgpack4nim looks pretty cool! After looking at all the places we use serialization in codex, I do think a universal solution should be implemented at some point. The issue I see right now is that nim-chronicles uses nim-json-serialization due to our use of the json sink. I don't think it's possible to use a different serializer for the json sink, unfortunately. We could possibly remove the json sink (using formatIt instead, as sketched below), but I imagine we'll probably need the json log output for consumption in e.g. ElasticSearch or Loggly at some point.
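
The `formatIt` route would look roughly like this (editor's sketch; `logReward` is a made-up example). Note that `formatIt` only affects chronicles' text sinks, which is exactly why the json sink stays tied to nim-json-serialization:

```nim
import pkg/chronicles
import pkg/stint

# Render UInt256 as a decimal string in the textlines/textblocks sinks.
chronicles.formatIt(UInt256): $it

proc logReward(reward: UInt256) =
  info "received reward", reward
```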

Base automatically changed from fix/empty-blocks-repostore to master August 21, 2023 02:51
@emizzle force-pushed the feat/integration/improve-codex-client branch from 5141d85 to 2164c9f on August 28, 2023
@AuHau (Member) previously approved these changes Aug 30, 2023 and left a comment:

LGTM! Some small nitpicks that are non-blocking 👍

codex/node.nim (outdated):

```diff
@@ -254,7 +254,7 @@ proc requestStorage*(
   ## - Run the PoR setup on the erasure dataset
   ## - Call into the marketplace and purchasing contracts
   ##
-  trace "Received a request for storage!", cid, duration, nodes, tolerance, reward
+  trace "Received a request for storage!", cid, duration, nodes, tolerance, reward, proofProbability, nodes, tolerance, reward, collateral, expiry
```
@AuHau:

There is duplicated logged information here (nodes, tolerance).

```nim
import std/json
import std/strutils
import pkg/stew/byteutils
# import std/json
```
@AuHau:
This can be removed.

```diff
 # For debugging you can enable logging output with debugX = true
 # You can also pass a string in same format like for the `--log-level` parameter
 # to enable custom logging levels for specific topics like: debug2 = "INFO; TRACE: marketplace"

-twonodessuite "Integration tests", debug1 = false, debug2 = false:
+twonodessuite "Integration tests", debug1 = "INFO;TRACE:node,restapi,erasure", debug2 = false:
```
@AuHau:

I guess we can turn off the debug1 logging before merging?

@emizzle merged commit 37b3d99 into master on Sep 1, 2023
8 checks passed
@emizzle deleted the feat/integration/improve-codex-client branch on September 1, 2023
emizzle added a commit that referenced this pull request Sep 29, 2023
## Problem
When Availabilities are created, the bytes in the Availability are reserved in the repo, so those bytes on disk cannot otherwise be written to. When a node receives a request for storage and a previously created Availability matches, an attempt is made to fill a slot in the request (more accurately, the request's slots are added to the SlotQueue, and those slots are eventually processed). During download, bytes that were reserved for the Availability were released as they were written to disk. To prevent more bytes from being released than were reserved in the Availability, the Availability was marked as used during the download, so that no other requests would match it and no new downloads (and byte releases) would begin.

The unfortunate downside is that the number of Availabilities a node has determines its download concurrency capacity. If, for example, a node creates a single Availability that covers all the disk space the operator is willing to use, only one download could occur at a time, meaning the node could miss out on storage opportunities.

## Solution
To alleviate the concurrency issue, each time a slot is processed, a Reservation is created, which takes size (aka reserved bytes) away from the Availability and stores it in the Reservation object. This can be done as many times as needed, as long as enough bytes remain in the Availability. Concurrent downloads are therefore no longer limited by the number of Availabilities; instead, they are more likely limited by the SlotQueue's `maxWorkers`.
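
Schematically, the accounting described above looks something like this (an editor's sketch with illustrative names, not the actual sales module API):

```nim
import pkg/questionable/results

type
  Availability = ref object
    size: uint64  # bytes still available for new Reservations
  Reservation = ref object
    size: uint64  # bytes set aside for a single slot

proc createReservation(availability: Availability,
                       slotSize: uint64): ?!Reservation =
  if slotSize > availability.size:
    return failure("not enough bytes remaining in Availability")
  # move bytes from the Availability into the new Reservation
  availability.size -= slotSize
  success(Reservation(size: slotSize))

proc deleteReservation(availability: Availability,
                       reservation: Reservation) =
  # bytes that were never written to disk are returned to the Availability
  availability.size += reservation.size
```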

From a database design perspective, an Availability has zero or more Reservations.

Reservations are persisted in the RepoStore's metadata, along with Availabilities. The metadata store key path for Reservations is `meta / sales / reservations / <availabilityId> / <reservationId>`, while Availabilities are stored one level up, e.g. `meta / sales / reservations / <availabilityId>`, allowing all Reservations for an Availability to be queried (this is not currently needed, but may be useful when the work to restore Availability size is implemented; more on this later).

### Lifecycle
When a reservation is created, its size is deducted from the Availability, and when a reservation is deleted, any remaining size (bytes not written to disk) is returned to the Availability. If the request finishes, is cancelled (expired), or an error occurs, the Reservation is deleted (and any undownloaded bytes returned to the Availability). In addition, when the Sales module starts, any Reservations that are not actively being used in a filled slot are deleted.

Having a Reservation persist until after a storage request is completed will allow the originally set Availability size to be reclaimed once a request contract has completed. That feature is yet to be implemented, but the work in this PR is a step toward enabling it.

### Unknowns
Reservation size is determined by the `StorageAsk.slotSize`. If, during download, more bytes than `slotSize` are attempted to be downloaded, the Reservation update will fail, and the state machine will move to a `SaleErrored` state, deleting the Reservation. This will likely prevent the slot from being filled.

### Notes
Based on #514