diff --git a/docs/3.tutorials/indexer/run-near-lake.md b/docs/3.tutorials/indexer/run-near-lake.md new file mode 100644 index 00000000000..92aa0837bf9 --- /dev/null +++ b/docs/3.tutorials/indexer/run-near-lake.md @@ -0,0 +1,226 @@ +--- +id: run-lake-indexer +sidebar_label: Running Lake Indexer +--- + +# Running NEAR Lake Indexer + +:::info + +NEAR Lake is a blockchain indexer built on top of [NEAR Indexer microframework](https://github.com/nearprotocol/nearcore/tree/master/chain/indexer) +to watch the network and store all the events as JSON files on AWS S3. + +::: + +## How to start + +The Lake Indexer setup consists of the following components: +* AWS S3 Bucket as a storage +* NEAR Lake binary that operates as a regular NEAR Protocol peer-to-peer node, so you will operate it as + any other [Regular/RPC Node in NEAR](https://near-nodes.io/rpc/hardware-rpc) + +### Prepare Development Environment + +Before you proceed, make sure you have the following software installed: +* [Rust compiler](https://rustup.rs/) of the version that is mentioned in `rust-toolchain` file in the root of + [nearcore](https://github.com/nearprotocol/nearcore) project. +* Ensure you have [AWS Credentials configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) + From AWS Docs: + + > For example, the files generated by the AWS CLI for a default profile configured with aws configure looks similar to the following. + > + > ~/.aws/credentials + > ``` + > [default] + > aws_access_key_id=AKIAIOSFODNN7EXAMPLE + > aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY + > ``` + +### Compile NEAR Lake + +```bash +$ cargo build --release +``` + +### Configure NEAR Lake + +To connect NEAR Lake to the specific chain you need to have necessary configs, you can generate it as follows: + +```bash +$ ./target/release/near-lake --home ~/.near/testnet init --chain-id testnet --download-config --download-genesis +``` + +The above code will download the official genesis config and generate necessary configs. You can replace `testnet` in the command above to different network ID (`betanet`, `mainnet`). + +:::info nearcore configuration + +According to changes in `nearcore` config generation we don't fill all the necessary fields in the config file. +While [this issue is open](https://github.com/nearprotocol/nearcore/issues/3156) you need to download config you want and replace the generated one manually. +- [testnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/config.json) +- [betanet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/betanet/config.json) +- [mainnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/config.json) + +::: + +Configs for the specified network are in the `--home` provided folder. We need to ensure that NEAR Lake follows +all the necessary shards, so `"tracked_shards"` parameters in `~/.near/testnet/config.json` needs to be configured properly. +Currently, `nearcore` treats empty value for `"tracked_shards"` as "do not track any shard" and **any value** as "track all shards". +For example, in order to track all shards, you just add the shard #0 to the list: + +``` +... +"tracked_shards": [0], +... +``` + +### Run NEAR Lake + +Commands to run NEAR Lake, after `./target/release/near-lake` + +| Command | Key/Subcommand | Required/Default | Responsible for | +|--------- |-------------------------- |------------------------------------------------------------------ |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| | `--home` | Default
`~/.near` | Tells the node where too look for necessary files:
`config.json`
,
`genesis.json`
,
`node_key.json`
, and
`data`
folder | +| `init` | | | Tells the node to generate config files in `--home-dir` | +| | `--chain-id` | Required

* `localnet`
* `testnet`
* `mainnet` | Defines the chain to generate config files for | +| | `--download-config` | Optional | If provided tells the node to download `config.json` from the public URL. You can download them manually

- [testnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/config.json)
- [mainnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/config.json) | +| | `--download-genesis` | Optional | If provided tells the node to download `genesis.json` from the public URL. You can download them manually

- [testnet genesis.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/genesis.json)
- [mainnet genesis.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/genesis.json) | +| | TODO:
Other `neard` keys | | | +| `run` | | | Runs the node | +| | `--bucket` | Required | AWS S3 Bucket name | +| | `--region` | Required | AWS S3 Bucket region | +| | `--fallback-region` | Default eu-central-1 | AWS S3 Fallback region | +| | `--endpoint` | Optional | AWS S3 compatible API endpoint | +| | `--stream-while-syncing` | Optional | If provided Indexer streams blocks while they appear on the node instead of waiting the node to be fully synced | +| | `--concurrency` | Default 1 | Defines the concurrency for the process of saving block data to AWS S3 | +| | `sync-from-latest` | One of the `sync-` subcommands is required | Tells the node to start indexing from the latest block in the network | +| | `sync-from-interruption` | One of the `sync-` subcommands is required | Tells the node to start indexing from the block the node was interrupted on (if it is a first start it will fallback to `sync-from-latest`) | +| | `sync-from-block --height N` | One of the
`sync-`
subcommands is required | Tells the node to start indexing from the specified block height `N` (**Ensure** you node data has the block you want to start from) | + +```bash +$ ./target/release/near-lake --home ~/.near/testnet run --stream-while-syncing --concurrency 50 sync-from-latest +``` + +After the network is synced, you should see logs of every block height currently received by NEAR Lake. + + +## Syncing + +Whenever you run NEAR Lake for any network except localnet you'll need to sync with the network. +This is required because it's a natural behavior of `nearcore` node and NEAR Lake is a wrapper +for the regular `nearcore` node. In order to work and index the data your node must be synced +with the network. This process can take a while, so we suggest to download a fresh backup of +the `data` folder and put it in you `--home-dir` of your choice (by default it is `~/.near`) + +:::tip +Running your NEAR Lake node on top of a backup data will reduce the time of syncing process +because your node will download only the data after the backup was cut and it takes reasonable amount time. +::: + +All the backups can be downloaded from the public S3 bucket which contains latest daily snapshots: + +You will need [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) to be installed in order to download the backups. + +### Mainnet + +``` +$ aws s3 --no-sign-request cp s3://near-protocol-public/backups/mainnet/rpc/latest . +$ LATEST=$(cat latest) +$ aws s3 --no-sign-request cp --no-sign-request --recursive s3://near-protocol-public/backups/mainnet/rpc/$LATEST ~/.near/data +``` + +### Testnet + +``` +$ aws s3 --no-sign-request cp s3://near-protocol-public/backups/testnet/rpc/latest . +$ LATEST=$(cat latest) +$ aws s3 --no-sign-request cp --no-sign-request --recursive s3://near-protocol-public/backups/testnet/rpc/$LATEST ~/.near/data +``` + + + +## Running NEAR Lake as an archival node + +It's not necessary but in order to index everything in the network it is better to do it from the genesis. +`nearcore` node is running in non-archival mode by default. That means that the node keeps data only +for [5 last epochs](https://docs.near.org/concepts/basics/epoch). In order to index data from the genesis +you need to turn the node in archival mode. + +To do it you need to update `config.json` located in `--home-dir` (by default it is `~/.near`). + +Find next keys in the config and update them as following: + +```json +{ + ... + "archive": true, + "tracked_shards": [0], + ... +} +``` + +The syncing process in archival mode can take a lot of time, so it's better to download a backup provided by NEAR +and put it in your `data` folder. After that your node will download only the data after the backup was cut and it +takes reasonable amount time. + + +All the backups can be downloaded from the public S3 bucket which contains the latest daily snapshots: + +* [Archival Mainnet data folder](https://near-protocol-public.s3-accelerate.amazonaws.com/backups/mainnet/archive/data.tar) +* [Archival Testnet data folder](https://near-protocol-public.s3-accelerate.amazonaws.com/backups/testnet/archive/data.tar) + +See [this link](https://near-nodes.io/archival/run-archival-node-with-nearup) for reference + +## Using the data + +We write all the data to AWS S3 buckets: + +- `near-lake-data-testnet` (`eu-central-1` region) for testnet +- `near-lake-data-mainnet` (`eu-central-1` region) for mainnet + +## Custom S3 storage + +In case you want to run you own `near-lake` instance and store data in some S3 compatible storage ([Minio](https://min.io/) or [Localstack](https://localstack.cloud/) as example) +You can override default S3 API endpoint by using `--endpoint` option + +- run `minio` + +```bash +$ mkdir -p /data/near-lake-custom && minio server /data +``` + +- run `near-lake` + +```bash +$ ./target/release/near-lake --home ~/.near/testnet run --endpoint http://127.0.0.1:9000 --bucket near-lake-custom sync-from-latest +``` + +### Data structure + +The data structure we use is the following: + +``` +/ + block.json + shard_0.json + shard_1.json + ... + shard_N.json +``` + +- `` is a 12-character-long `u64` string with leading zeros (e.g `000042839521`). [See this issue for a reasoning](https://github.com/near/near-lake/issues/23) +- `block_json` contains JSON-serialized [`BlockView`](https://github.com/near/nearcore/blob/e9a28c46c2bea505b817abf484e6015a61ea7d01/core/primitives/src/views.rs#L711-L716) struct. **Note:** this struct might change in the future, we will announce it +- `shard_N.json` where `N` is `u64` starting from `0`. Represents the index number of the shard. In order to find out the expected number of shards in the block you can look in `block.json` at `.header.chunks_included` + +### Access the data + +All NEAR Lake AWS S3 buckets have [Request Payer](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RequesterPaysBuckets.html) enabled. It means that anyone with their own AWS credentials can List and Read the bucket's content and **be charged for it by AWS**. Connections to the bucket have to be done with AWS credentials provided. See [NEAR Lake Framework](https://github.com/near/near-lake-framework) for a reference. + +### NEAR Lake Framework + +Once we [set up the public access to the buckets](https://github.com/near/near-lake/issues/22) anyone will be able to build their own code to read it through. + +For our own needs we are working on [NEAR Lake Framework](https://github.com/near/near-lake-framework) to have a simple way to create an indexer on top of the data stored by NEAR Lake itself. + +:::note +See the official NEAR Lake Framework [announcement on the NEAR Gov Forum](https://gov.near.org/t/announcement-near-lake-framework-brand-new-word-in-indexer-building-approach/17668). +::: diff --git a/website/sidebars.json b/website/sidebars.json index 6d5b02dedaa..f119f992ca3 100644 --- a/website/sidebars.json +++ b/website/sidebars.json @@ -365,6 +365,7 @@ { "NEAR Lake Indexer": [ "tutorials/indexer/near-lake-state-changes-indexer", + "tutorials/indexer/run-lake-indexer", "tutorials/indexer/js-lake-indexer", "tutorials/indexer/lake-start-options", "tutorials/indexer/python-lake-indexer",