ISS Detection and Logging #363

Open
poulok opened this issue Nov 26, 2024 · 0 comments
Labels
New Feature A new feature, service, or documentation. Major changes that are not backwards compatible.

poulok (Member) commented Nov 26, 2024

Problem

Debugging an ISS is difficult. Today, tools on the consensus node let engineers take logs from an ISS node and a non-ISS node, compare the top-level hashes of the various parts of the merkle tree, and determine which part of the state contains the mismatch (e.g., the NFT virtual map, the accounts virtual map, or the platform state). These hashes are written for every round for exactly this purpose. The comparison is feasible because there are only a couple dozen hashes. Knowing which part of the state differs is critical for engineers trying to locate the faulty logic.

Example log output:

Round:                         191982655
Timestamp:                     2024-11-26T14:37:40.976788905Z
Next consensus number:         79452167590
Legacy running event hash:     beb5fbdb272fc796a00f5add9ba7af0f242ebc6cf4f8bc6ca2f72b7258fe58598cae0e9defa8401cfae46029c3810939
Legacy running event mnemonic: city-lucky-tourist-ask
Rounds non-ancient:            26
Creation version:              SemanticVersion[major=0, minor=55, patch=2, pre=, build=0]
Minimum judge hash code:       -1081566551
Root hash:                     e95c80da72c8185efd3d9c2b7c6120637ad4d9262ae4d797145289f14d6304105aea8ff4fee533b2f8c59b52951c3c4c
First BR Version:              null
Last round before BR:          -1
Lowest Judge Gen before BR     -1

(root) MerkleStateRoot                                                                             /      stand-tooth-flip-armed
   0 SingletonNode        EntityIdService.ENTITY_ID                                                /0     rain-vibrant-pupil-portion
   1 SingletonNode        BlockRecordService.BLOCKS                                                /1     large-super-scorpion-monkey
   2 SingletonNode        BlockRecordService.RUNNING_HASHES                                        /2     blade-brand-lawn-peasant
   3 SingletonNode        CongestionThrottleService.CONGESTION_LEVEL_STARTS                        /3     door-library-planet-blush
   4 SingletonNode        CongestionThrottleService.THROTTLE_USAGE_SNAPSHOTS                       /4     nose-kite-debris-jump
   5 VirtualMap           ConsensusService.TOPICS                                                  /5     brother-remove-faint-list
   6 VirtualMap           ContractService.BYTECODE                                                 /6     hybrid-head-inside-peanut
   7 VirtualMap           ContractService.STORAGE                                                  /7     ghost-urban-symptom-squirrel
   8 SingletonNode        FeeService.MIDNIGHT_RATES                                                /8     fancy-key-reopen-amount
   9 VirtualMap           FileService.FILES                                                        /9     unfair-fit-funny-drive
   10 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=150]]    /10    invest-upset-hold-caution
   11 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=151]]    /11    few-garbage-proud-potato
   12 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=152]]    /12    frame-wet-liberty-latin
   13 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=153]]    /13    snack-bargain-regret-depth
   14 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=154]]    /14    panel-dance-paper-item
   15 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=155]]    /15    hurdle-student-type-risk
   16 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=156]]    /16    school-mass-brain-dilemma
   17 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=157]]    /17    news-harsh-spawn-resist
   18 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=158]]    /18    dish-sponsor-salmon-mechanic
   19 QueueNode           FileService.UPGRADE_DATA[FileID[shardNum=0, realmNum=0, fileNum=159]]    /19    toss-rival-slush-scan
   20 SingletonNode       FreezeService.FREEZE_TIME                                                /20    fragile-tree-boil-wage
   21 SingletonNode       FreezeService.UPGRADE_FILE_HASH                                          /21    actual-legal-ritual-cash
   22 null                                                                                         /22    bid-belt-culture-decorate
   23 VirtualMap          ScheduleService.SCHEDULES_BY_EQUALITY                                    /23    cheap-practice-mansion-swear
   24 VirtualMap          ScheduleService.SCHEDULES_BY_EXPIRY_SEC                                  /24    joke-rival-suffer-school
   25 VirtualMap          ScheduleService.SCHEDULES_BY_ID                                          /25    trash-enrich-combine-improve
   26 VirtualMap          TokenService.ACCOUNTS                                                    /26    result-quote-oil-pulse
   27 VirtualMap          TokenService.ALIASES                                                     /27    dismiss-cereal-vintage-subway
   28 VirtualMap          TokenService.NFTS                                                        /28    bronze-solution-portion-grit
   29 VirtualMap          TokenService.STAKING_INFOS                                               /29    shoot-jar-chapter-lake
   30 SingletonNode       TokenService.STAKING_NETWORK_REWARDS                                     /30    average-outer-pool-nothing
   31 VirtualMap          TokenService.TOKENS                                                      /31    double-noodle-trend-hurry
   32 VirtualMap          TokenService.TOKEN_RELS                                                  /32    viable-robot-citizen-pumpkin
   33 VirtualMap          AddressBookService.NODES                                                 /33    impose-jaguar-shock-veteran
   34 VirtualMap          TokenService.PENDING_AIRDROPS                                            /34    quick-vicious-salmon-ill
   35 SingletonNode       PlatformStateService.PLATFORM_STATE                                      /35    detect-purse-farm-party
   36 QueueNode           RecordCache.TransactionReceiptQueue                                      /36    angry-piano-doll-verify
   37 VirtualMap          RosterService.ROSTERS                                                    /37    frog-churn-lock-faith
   38 SingletonNode       RosterService.ROSTER_STATE                                               /38    village-false-engage-range
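
The comparison that engineers do today can be sketched roughly as follows: parse the hash tree from the logs of a non-ISS node and an ISS node, then report the top-level children whose mnemonics differ. This is only an illustration. The regular expression is inferred from the sample output above, and `parse_tree` and `diff_trees` are hypothetical names, not the actual consensus node tooling.

```python
import re

# Assumed line shape, inferred from the sample log above:
#   <index or "(root)">  <node type and state name>  <path>  <four-word mnemonic>
TREE_LINE = re.compile(
    r"^\s*(?:\(root\)|\d+)\s+(?P<label>.*?)\s+(?P<path>/\d*)\s+"
    r"(?P<mnemonic>[a-z]+(?:-[a-z]+){3})\s*$"
)

def parse_tree(log_text):
    """Return {merkle path: (label, hash mnemonic)} for every tree line found."""
    nodes = {}
    for line in log_text.splitlines():
        match = TREE_LINE.match(line)
        if match:
            # Collapse the column padding inside the label to single spaces.
            label = " ".join(match.group("label").split())
            nodes[match.group("path")] = (label, match.group("mnemonic"))
    return nodes

def diff_trees(good_log, iss_log):
    """List (path, label) for each top-level child whose mnemonic differs."""
    good = parse_tree(good_log)
    iss = parse_tree(iss_log)
    mismatches = []
    for path, (label, mnemonic) in good.items():
        if path == "/":
            continue  # the root differs whenever any child differs
        if path in iss and iss[path][1] != mnemonic:
            mismatches.append((path, label))
    return mismatches
```

Calling `diff_trees(good_log, iss_log)` on two such log excerpts returns the paths and labels of the differing subtrees, which is the information that is lost once all state lives in a single virtual map.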

With the MegaMap project, there will no longer be top-level hashes for each type of data in state. Everything will be interleaved in a single virtual map, so engineers will no longer have a tool for determining where the faulty logic is. It will be like finding a needle in a haystack.

Solution

If a consensus node ISSes, the block it produces for that round will not match the block with the same block ID from a non-ISS consensus node. Because a block node receives blocks from several consensus nodes, it is able to identify such mismatches: when a block node receives a block with the same block ID but a different hash, an ISS has occurred.
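
A minimal sketch of this detection step, assuming the block node can see a block ID, a sending node ID, and a block hash for each block it receives. `BlockHashTracker` and its methods are hypothetical names used only for illustration. Deciding which side is the ISS node is out of scope here.

```python
from collections import defaultdict

class BlockHashTracker:
    """Remembers which consensus nodes reported which hash for each block ID."""

    def __init__(self):
        # block_id -> {block_hash: [node ids that sent a block with that hash]}
        self._seen = defaultdict(dict)

    def record(self, block_id, node_id, block_hash):
        """Record a received block; return True if this block ID now has conflicting hashes."""
        by_hash = self._seen[block_id]
        by_hash.setdefault(block_hash, []).append(node_id)
        return len(by_hash) > 1

    def reporters(self, block_id):
        """Return {block_hash: [node ids]} for a block ID (empty if unseen)."""
        return {h: list(ids) for h, ids in self._seen.get(block_id, {}).items()}
```

When `record` returns `True`, the block node has at least two different hashes for the same block ID and can start the item-by-item comparison.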

To provide debugging information for engineers, the block node should start at the beginning of the two mismatching blocks, iterate through each block item, and compare the hashes. The first block item with mismatched hashes should be logged.
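
The walk described above might look like the sketch below. It assumes each block is available as an ordered list of serialized block items (`bytes`) and hashes them with SHA-384, an assumption made only because the sample hashes above are 96 hex characters long. `item_hash` and `first_mismatch` are hypothetical helpers, not an existing block node API.

```python
import hashlib
from itertools import zip_longest

def item_hash(item: bytes) -> str:
    """Hash one serialized block item (SHA-384 is an assumption)."""
    return hashlib.sha384(item).hexdigest()

def first_mismatch(good_items, iss_items):
    """Return the index of the first block item whose hashes differ, or None.

    A missing item on either side (blocks of different length) also counts
    as a mismatch at that index.
    """
    pairs = zip_longest(good_items, iss_items)  # pads the shorter block with None
    for index, (good, iss) in enumerate(pairs):
        if good is None or iss is None:
            return index
        if item_hash(good) != item_hash(iss):
            return index
    return None
```

Only the first mismatch is reported because later items are likely to differ as a consequence of the first divergence.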

Information to log:

  1. The ID of the non-ISS consensus node used to perform the comparison
  2. The ID of the consensus node that ISSed (i.e., the node that does not have a proof for the block)
  3. The block ID and round number of the mismatched block
  4. The contents of the mismatched block item in readable format
  5. If the mismatched block item is a state change, the contents of the transaction (or other input) in readable format that is associated with that state change
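
One possible shape for such a log record is sketched below. The class name, field names, and field types are assumptions made for illustration. The issue does not prescribe a format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class IssReport:
    """Hypothetical container for the five pieces of information listed above."""
    reference_node_id: int       # 1. non-ISS node used for the comparison
    iss_node_id: int             # 2. node that ISSed
    block_id: int                # 3. ID of the mismatched block
    round_number: int            # 3. round of the mismatched block
    mismatched_item: str         # 4. block item rendered in readable form
    associated_input: Optional[str] = None  # 5. only set for state changes

    def to_log_lines(self) -> List[str]:
        """Render the report as human-readable log lines."""
        lines = [
            f"Reference (non-ISS) node: {self.reference_node_id}",
            f"ISS node: {self.iss_node_id}",
            f"Block ID: {self.block_id}",
            f"Round: {self.round_number}",
            f"Mismatched block item: {self.mismatched_item}",
        ]
        if self.associated_input is not None:
            lines.append(f"Associated input: {self.associated_input}")
        return lines
```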

Additionally, it may be useful to keep the ISS block on disk somewhere easy to access in case engineers need to inspect it.

Alternatives

If engineers have the states from the ISS node and a non-ISS node, they could do a state diff and find exactly which node diverged. However, this is not practical. The vast majority of states are never written to disk and are cleared from memory once they are old enough. Consensus nodes will stop uploading state at some point, so even if a consensus node had special logic to save ISS states to disk, and a non-ISS node still had the same round in memory and could save it to disk, engineers would still need a way to download those states. As we become more decentralized, this will be increasingly difficult.

poulok added the New Feature label Nov 26, 2024