feat: contract log query #1601

Draft · wants to merge 11 commits into develop
Conversation

@zone117x (Member) commented Mar 31, 2023

Closes #1598

This PR adds two new query params to the /extended/v1/contract/<principal>/events endpoint.

  1. ?filter_path=<jsonpath expression>:
    Optional jsonpath expression to select only results that contain items matching the expression.
  2. ?contains=<json object>:
    Optional stringified JSON to select only results that contain the given JSON.

Rationale

Clarity smart contracts are able to generate structured log outputs during the execution of a contract-call or contract-deploy transaction. For more context see the Clarity print docs. These contract logs can then be ingested by an event-observer (such as the API) to store and serve contract-specific data. Today we have several examples of this:

  • The bns contract emits logs used by the API to process, track, and serve BNS information to clients, like ownership, zonefiles, etc.
  • The pox contract emits logs used by the API to store and track stacking-related data, which is used in account balance endpoints (i.e. locked balance), Rosetta queries, endpoints for Stacking pool operators, etc.
  • The send-many-stx contract, used by essentially all exchanges, performs batch stx-transfers and uses contract logs to emit a corresponding memo. The API stores these memos and returns them in an endpoint that exchanges use to track when their wallets have received a batch stx-transfer and the associated memo.
  • SIP-019 defines contract logs that FT/NFT tokens can implement in order to issue metadata update notifications.

These are some of the contracts and logs that this API and others currently have special handling for. Typically, each of these contracts requires unique code to denormalize the log payload, store it in a new table, and expose it through new endpoints and sql queries.

There are many more deployed contracts with their own application-specific usage of contract logs and their own unique payloads. The API doesn't, and cannot reasonably, implement custom handling for all current and future contracts in the same way it has for the contracts listed above. Currently, clients must use the /extended/v1/contract/{contract_id}/events endpoint, which simply returns the latest N logs for a given contract. Clients must paginate through these requests and manually parse out and filter the logs they need. Even for the contracts that already receive special treatment (e.g. pox, bns), there are requests for new endpoints with different query criteria, which would require us to implement new endpoints and sql queries.

In the Ethereum world, the ability to emit and query structured logs is a critical element of their app ecosystem. Token standards that you may be familiar with (e.g. ERC20) specify the logs that contracts must emit when token operations happen. The eth_getLogs RPC endpoint provides a mechanism for efficiently querying those logs, for example, "give me all transfers for a given FT associated with a given recipient address".
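For reference, here's a minimal sketch of such an eth_getLogs request (ETH_RPC_URL, tokenContractAddress, and recipientTopic are placeholders; the first topic is the standard keccak256 hash of the ERC20 Transfer event signature):

const res = await fetch(ETH_RPC_URL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    jsonrpc: '2.0',
    id: 1,
    method: 'eth_getLogs',
    params: [{
      address: tokenContractAddress, // the FT contract to query
      topics: [
        // keccak256("Transfer(address,address,uint256)")
        '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef',
        null,           // any sender
        recipientTopic, // the recipient address, left-padded to 32 bytes
      ],
    }],
  }),
});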

In Stacks, we have the building blocks for offering similar capabilities.

Contract log query implementation

Prior to this PR, contract logs (i.e. structured Clarity print outputs) were stored in the contract_logs table in their original consensus-serialized binary encoding. This encoding made it essentially impossible to perform queries against the contents of the log object. Postgres supports storing arbitrary structured documents in a jsonb column type. This column type also supports a few kinds of advanced indexes (e.g. GIN(jsonb_ops) and GIN(jsonb_path_ops)) that make it possible to perform efficient queries against the arbitrary JSON.
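For illustration (not the exact DDL or queries from this PR), a GIN(jsonb_path_ops) index and a containment query it can serve might look like this, assuming a postgres.js client with illustrative table and column names:

import postgres from 'postgres';

const sql = postgres(); // connection options omitted for brevity

// GIN(jsonb_path_ops) efficiently supports the @> (containment) operator:
await sql`CREATE INDEX contract_logs_value_json_idx
  ON contract_logs USING GIN (value_json jsonb_path_ops)`;

// A containment query that can be served by the index above:
const logs = await sql`
  SELECT * FROM contract_logs
  WHERE value_json @> ${sql.json({ event: 'withdraw', type: 'stx' })}
`;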

So what we can do is 1) decode the binary Clarity buffers into an object, 2) encode that into a JSON object, 3) store that in a new jsonb column in the contracts_log table, then 4) implement API endpoint(s) that allow more precise queries against these logs.
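Sketched end to end, the ingestion path could look like the following. clarityValueToCompactJson is the helper name used by this PR's tests, but the import path, insert statement, and surrounding code are illustrative assumptions, not the PR's exact write path:

import postgres from 'postgres';
// hypothetical import path for the encoding helper added in this PR
import { clarityValueToCompactJson } from './helpers';

async function writeContractLog(sql: postgres.Sql, txId: string, hexEncodedCV: string) {
  // 1) + 2) decode the consensus-serialized Clarity value into compact JSON
  const valueJson = clarityValueToCompactJson(hexEncodedCV);
  // 3) store it in the new jsonb column alongside the original binary value
  await sql`
    INSERT INTO contract_logs (tx_id, value, value_json)
    VALUES (${txId}, ${Buffer.from(hexEncodedCV.replace(/^0x/, ''), 'hex')}, ${sql.json(valueJson)})
  `;
}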

This PR implements the above. One of the design decisions was to implement yet another way to encode Clarity values into JSON. The existing schemas we have are lossless (you can reverse the JSON back into the original Clarity value); however, that comes with overhead from having to store Clarity type information (because there is not a 1-1 mapping between JSON primitives and Clarity primitives), so that JSON is less readable and takes up more space. In our use-case, we aren't necessarily concerned with preserving exact Clarity typing; rather, we want the JSON to be easy to query, readable, space-efficient, index-efficient, and interoperable with JSON-based workflows (e.g. jsonpath expressions). Here is the spec for how Clarity values are encoded into a JSON object in this PR (a worked example follows the list):

  • OptionalSome and ResponseOk are unwrapped (i.e. the value is encoded directly without any nesting).
  • OptionalNone is encoded as json null.
  • ResponseError is encoded as an object with a single key _error which is set to the unwrapped error value.
  • Buffers are encoded as an object containing the key hex with the hex-encoded string as the value, and the key utf8 with the string decoded as UTF-8 as the value. When a Buffer does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD is used in the utf8 string to represent the invalid bytes. Both are included because Clarity Buffers are used for storing both binary data and string data (e.g. BNS names).
  • Ints and UInts within the safe JavaScript integer range are encoded as numbers; otherwise they are encoded as string-quoted base-10 integers.
  • Booleans are encoded as booleans.
  • Principals are encoded as strings, e.g. <address> or <address>.<contract_name>.
  • StringAscii and StringUtf8 are both encoded as regular json strings.
  • Lists are encoded as json arrays.
  • Tuples are encoded as json objects.
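To make the rules above concrete, here is an illustrative example (made up, not taken from the PR's test vectors) of a Clarity log and how it would encode under this scheme; the exact hex formatting (0x prefix or not) is an assumption:

// Clarity: (print (ok { event: "withdraw", amount: u100,
//   big: u340282366920938463463374607431768211455,
//   sender: tx-sender, memo: (some 0x6869), prior: none }))
const expectedCompactJson = {
  // ResponseOk and OptionalSome are unwrapped, so the tuple appears directly
  event: 'withdraw',                              // StringAscii -> json string
  amount: 100,                                    // uint within safe js range -> number
  big: '340282366920938463463374607431768211455', // out of range -> quoted base-10 string
  sender: 'SP000000000000000000002Q6VF78',        // principal -> string
  memo: { hex: '6869', utf8: 'hi' },              // buffer -> { hex, utf8 }
  prior: null,                                    // OptionalNone -> null
};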

Usage

Here's an example of how the subnets-nft-demo app can be refactored. The app fetches a list of all events, then decodes the Clarity values on the client and filters for specific events. This can now be done in a single query leveraging the new filter_path query param, which accepts a jsonpath expression.

Previous code:
https://github.com/hirosystems/subnets-nft-demo/blob/34e433a2d2893f36e1767ce635ee280baf1acbf6/src/stacks/apiCalls.ts#L96-L111

Example of doing the same thing with a single fetch using a jsonpath expression filter:

const url = new URL(`${L2_URL}/extended/v1/contract/${L2_SUBNET_CONTRACT_ID}/events`);
url.searchParams.set('filter_path', `$?(
  @.event == "withdraw" && 
  @.type == "stx" && 
  @.sender == "${address}"
)`);
const req = await fetch(url);

Example of using the new contains param to perform a similar query:

const url = new URL(`${L2_URL}/extended/v1/contract/${L2_SUBNET_CONTRACT_ID}/events`);
url.searchParams.set('contains', JSON.stringify({
  event: 'withdraw',
  type: 'stx',
  sender: address,
}));
const req = await fetch(url);

Here's another example of using the jsonpath expression filter to fetch recent BNS name-revoke, name-transfer, and name-renewal events:

const bnsContract = 'SP000000000000000000002Q6VF78.bns';
let url = new URL(`${API_HOST}/extended/v1/contract/${bnsContract}/events`);
url.searchParams.set('filter_path', `$.attachment.metadata?(
  @.op == "name-revoke" ||
  @.op == "name-transfer" ||
  @.op == "name-renewal"
)`);
const req = await fetch(url);

Here's an example of using the new contains param. This is also a filter, but it takes a JSON object and returns only the contract log events that contain the given object. Here's an example of fetching recent BNS name-renewal events:

const bnsContract = 'SP000000000000000000002Q6VF78.bns';
let url = new URL(`${API_HOST}/extended/v1/contract/${bnsContract}/events`);
let contains = { attachment: { metadata: { op: 'name-renewal' } } };
url.searchParams.set('contains', JSON.stringify(contains));
const req = await fetch(url);

Considerations

The jsonpath expressions supported in Postgres can include relatively complex operations, for example recursive key-value lookups, regex, math operations, and type coercions. Many of these cannot take advantage of the postgres JSON-specific indexes, which can result in unreasonably expensive queries. The vast majority of expected use-cases only need simple jsonpath expressions, which my testing so far has shown to be reasonably efficient. So we need to determine when a given jsonpath expression is "too complex" and reject the request. This PR has a very simple regex-based version of that implemented; however, these regexes are easy to trick and bypass. We need to validate against the actual AST of a jsonpath expression. Existing jsonpath parsing libraries in js do not support the postgres flavor of expressions, so I'm in the process of switching the regex-based validation to jsonpath-pg. This is a library I wrote which compiles the grammar and lexer C code from the postgres codebase into WASM and exposes the AST object. This can be used to reliably determine the complexity of an expression.
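As a rough sketch, AST-based validation could look like the following; the node shape and operation names here are hypothetical stand-ins, not jsonpath-pg's actual interface:

// Hypothetical AST node shape; jsonpath-pg's real output will differ.
type JsonPathNode = { type: string; args?: JsonPathNode[] };

// Operations considered "too complex" to serve efficiently (illustrative list).
const DISALLOWED_OPS = new Set(['likeRegex', 'anyKey', 'datetime', 'keyvalue']);

function findDisallowedOp(node: JsonPathNode): string | undefined {
  if (DISALLOWED_OPS.has(node.type)) return node.type;
  for (const child of node.args ?? []) {
    const found = findDisallowedOp(child);
    if (found) return found;
  }
  return undefined;
}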

Alternatives

The capabilities described above could potentially be implemented with other tools, for example a GraphQL layer or a NoSQL document store like MongoDB. However, I'd like to explore the possibility of getting this approach to work before considering adding new systems/layers into the project's tech stack.

Upgrading + database migration

This PR changes the sql schema and adds a new non-nullable column to an existing table. Typically, we'd modify the existing migrations (which breaks existing dbs) and release this as a major version that requires an event-replay. This is easy for API contributors; however, these kinds of breaking changes are difficult for deployments, both internal and external, because event-replays take increasingly more time and resources.

So this PR is experimenting with a more advanced sql schema migration which is compatible with existing deployments and does not require a major version bump or event-replay. Migration libraries (including the one we use) tend not to support advanced table manipulation like what this PR requires, so a new function is called directly after the regular migrations in order to perform the "advanced" migrations. In this case, the migration takes around half an hour to run because it involves querying and updating every row in the contract_logs table (an UPDATE which postgres internally treats as a DELETE and INSERT, without the ability to batch).
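For illustration, a backfill of this shape (add a nullable column, populate every row, then enforce NOT NULL) could look like the sketch below; the names, batching strategy, and decode helper are assumptions, not the PR's migration code:

import postgres from 'postgres';
// hypothetical stand-in for the Clarity-to-compact-JSON decoding helper
import { clarityValueToCompactJson } from './helpers';

async function backfillValueJson(sql: postgres.Sql) {
  await sql`ALTER TABLE contract_logs ADD COLUMN IF NOT EXISTS value_json jsonb`;
  // Rewrite every existing row (internally a DELETE + INSERT per row in postgres).
  // A real migration would stream or chunk this rather than load all rows at once.
  const rows = await sql<{ id: number; value: Buffer }[]>`
    SELECT id, value FROM contract_logs WHERE value_json IS NULL
  `;
  for (const row of rows) {
    const valueJson = clarityValueToCompactJson(row.value.toString('hex'));
    await sql`UPDATE contract_logs SET value_json = ${sql.json(valueJson)} WHERE id = ${row.id}`;
  }
  await sql`ALTER TABLE contract_logs ALTER COLUMN value_json SET NOT NULL`;
}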

In a future release when a breaking change is strictly required, the "advanced migration" function for this case can be dropped.

TODO

  • Endpoint integration tests
  • Document the migration approach that avoids needing an event-replay
  • Document the new compact Clarity-to-JSON encoding format that was implemented to power these filters

codecov bot commented Mar 31, 2023

Codecov Report

Merging #1601 (874ad8b) into develop (5d15bc8) will decrease coverage by 55.96%.
The diff coverage is 46.25%.

❗ Current head 874ad8b differs from pull request most recent head 6d94c0e. Consider uploading reports for the commit 6d94c0e to get more accurate results

@@             Coverage Diff              @@
##           develop    #1601       +/-   ##
============================================
- Coverage    77.41%   21.45%   -55.96%     
============================================
  Files           78       78               
  Lines        11161    11302      +141     
  Branches      2487     2521       +34     
============================================
- Hits          8640     2425     -6215     
- Misses        2403     8198     +5795     
- Partials       118      679      +561     
Impacted Files Coverage Δ
src/datastore/common.ts 100.00% <ø> (ø)
src/datastore/pg-store.ts 0.90% <0.00%> (-91.70%) ⬇️
src/api/routes/contract.ts 20.75% <6.25%> (-68.99%) ⬇️
src/datastore/migrations.ts 40.40% <26.00%> (-47.84%) ⬇️
src/api/query-helpers.ts 20.74% <60.00%> (-58.31%) ⬇️
src/helpers.ts 37.30% <80.55%> (-33.35%) ⬇️
src/datastore/pg-write-store.ts 1.26% <100.00%> (-85.98%) ⬇️

... and 52 files with indirect coverage changes


@zone117x requested a review from rafaelcr April 7, 2023 17:49
@rafaelcr (Collaborator) left a comment

This looks awesome. Just left a couple comments and some discussion items.


/** @param { import("node-pg-migrate").MigrationBuilder } pgm */
exports.down = pgm => {
  pgm.dropIndex('contract_logs', 'value_json_path_ops_idx');

This index is never created, is this an old line?

@@ -1379,6 +1379,7 @@ export interface SmartContractEventInsertValues {
contract_identifier: string;
topic: string;
value: PgBytea;
value_json: string;

It would be best if we set this as string | null IMO to avoid storing empty strings

Comment on lines +78 to +87
/*
test.skip('generate', () => {
const result = clarityCompactJsonVectors.map(([, hexEncodedCV, compactJson]) => {
const encodedJson = clarityValueToCompactJson(hexEncodedCV);
const reprResult = decodeClarityValueToRepr(hexEncodedCV);
return [reprResult, hexEncodedCV, JSON.stringify(encodedJson)];
});
console.log(result);
});
*/

Remove?

Comment on lines 43 to 51
/*
test('generate vector data', () => {
const result = postgresExampleVectors.map((input) => {
const complexity = containsDisallowedJsonPathOperation(input);
return [input, complexity ? complexity.operation : null];
});
console.log(result);
});
*/

Remove?

}
}

async function complete_1680181889941_contract_log_json(

Is there a pressing issue that would require us to ship this feature with this kind of migration? This code looks good to me (and I like this experimentation as you specified in the PR desc), but I worry that it could create some problems if we want to add newer migrations that rely on the new columns or something. IMO it would be easier and more maintainable to just ship this as a regular new migration and rely on replay, or to explicitly mark this code as a "patch" somehow which we could drop once we switch to a new v8 release.

What do you think?

@zone117x self-assigned this Apr 13, 2023
@zone117x added the P3 (Priority 3: there are workarounds, or the functionality is secondary) label Jun 15, 2023
@zone117x (Member, Author) commented:

I'd like to get more feedback on this feature to prioritize work on it. My sense is that if this were available, then A) contract authors would take more advantage of structured logs (i.e. print statements) and B) apps would leverage this API feature to make more straightforward and efficient queries.

Labels
P3 (Priority 3: there are workarounds, or the functionality is secondary)
Projects
Status: 📋 Backlog
Development

Successfully merging this pull request may close these issues.

Subnet: Get pending withdrawal (STX, FT, NFTs) for address
2 participants