Runbook: Security Coordination Emergency Bug Response with Chain Validators

Jessy Irwin edited this page Oct 29, 2021 · 5 revisions

If a security vulnerability is being actively exploited, or a critical or emergency bug is discovered in the Agoric stack, it is highly likely that Agoric chain validators and code maintainers will need to coordinate to protect or restore network stability. Because security emergencies require discreet communication and quick action to defend against attacks, proactively planning a security coordination group and developing incident response procedures for emergency situations is important for stakeholders across the Agoric ecosystem.

All participants in the security coordination group should refrain from using confidential information about security vulnerabilities for individual gain, and should report any security issues using the vulnerability coordination process outlined in the “Reporting a Security Bug” runbook.

  • Agoric code maintainers will create a Keybase team named Ag0 SERT for the purpose of security coordination for Mainnet 0 and select representatives to communicate information about security emergencies to participants.

  • Validators can signal that they would like to participate in emergency security coordination by creating a governance proposal for self-nomination.

    • The proposal should include a Keybase username, and should be signed with a validator key. This will provide transparency about participation and an audit trail for the wider community as the proposals will be visible in a block explorer.
    • The proposal does not need to pass, and no supporting votes need to be cast, for a validator to join the group.
    • It may take up to 48 hours for participants to access the Ag0 SERT Keybase team.
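
As a minimal sketch of the self-nomination step above, the following Python composes the title and description of a text proposal that carries a Keybase username. The function name, the wording of the title and description, and the use of a validator moniker are all hypothetical; the runbook does not prescribe a format, and submitting and signing the proposal with a validator key is left to the validator's usual tooling.

```python
def build_self_nomination(keybase_username: str, validator_moniker: str) -> dict:
    """Compose title/description text for a self-nomination proposal.

    Hypothetical helper: the field wording below is an illustration,
    not a format required by the runbook.
    """
    username = keybase_username.strip()
    moniker = validator_moniker.strip()

    # Basic sanity checks only; no assumption is made about Keybase's own rules.
    if not username or any(ch.isspace() for ch in username):
        raise ValueError("Keybase username must be non-empty and contain no whitespace")
    if not moniker:
        raise ValueError("Validator moniker must be non-empty")

    return {
        "title": f"Ag0 SERT self-nomination: {moniker}",
        "description": (
            f"Validator {moniker} requests to join the Ag0 SERT Keybase team. "
            f"Keybase username: {username}"
        ),
    }


if __name__ == "__main__":
    proposal = build_self_nomination("example_user", "example-validator")
    print(proposal["title"])
    print(proposal["description"])
```

Keeping the Keybase username in the proposal text is what provides the audit trail described above, since the proposal is visible in a block explorer regardless of whether it passes.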

If the Agoric code maintainers have knowledge of a software vulnerability or incident of active vulnerability exploitation:

  • Code maintainers will triage and reproduce the issue to validate impact and severity.
  • Code maintainers may consider developing detection tools to investigate active exploitation of security vulnerabilities.
  • When an issue is validated, information about it will be shared directly with the security coordination group in Keybase.

If the chain validators have knowledge of a software vulnerability or active vulnerability exploitation:

  • Chain validators may consider creating a working group of trusted partners to validate or confirm the issue by reproducing it.
  • Chain validators may consider developing detection tools to investigate active exploitation of a bug on the network.
  • Chain validators should report the issue to Agoric code maintainers by engaging with the vulnerability disclosure process outlined in the “Reporting a Security Bug” runbook.

Once a security issue or incident has been validated, chain validators and Agoric code maintainers will begin discussing technical remediation approaches to resolve the issue.

  • When a solution to an issue or incident is identified, patch development or remediation steps will begin.
  • If a software update is required, Agoric code maintainers will work with validators to create and distribute an emergency patch.
  • If node configuration changes or emergency actions are required, chain validators will create and share information and coordinate any activity required to protect the chain.
  • If an issue is Critical or High Severity, Agoric code maintainers will release a security advisory to notify impacted parties to prepare for an emergency patch.

Once a patch is released or emergency actions are taken:

  • Chain validators are responsible for deciding to test and apply security patches to their nodes.
  • Chain validators will lead any required coordination of on-chain governance, network upgrades, chain halts, hard forks, and upgrade timing (e.g. choice of block height) necessary to resolve the security emergency.
  • Agoric code maintainers will collaborate with chain validators to gather facts, confirm a timeline of events, and publish information including a retrospective timeline about the security emergency within a week of the security patch release.
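
The timing steps above (choosing a block height for an upgrade, and publishing a retrospective within a week of the patch release) involve simple arithmetic that can be sketched in Python. This is an illustrative sketch only: the function names are hypothetical, the average block time is an input that validators would need to measure themselves, and the one-week window is interpreted here as seven days.

```python
import math
from datetime import datetime, timedelta, timezone


def estimate_upgrade_height(
    current_height: int,
    current_time: datetime,
    target_time: datetime,
    avg_block_seconds: float,
) -> int:
    """Estimate the first block height expected at or after target_time.

    Assumes a roughly constant average block time, which is only an
    approximation on a live network.
    """
    if avg_block_seconds <= 0:
        raise ValueError("avg_block_seconds must be positive")
    if target_time <= current_time:
        raise ValueError("target_time must be after current_time")

    remaining_seconds = (target_time - current_time).total_seconds()
    # Round up so the chosen height is not reached before the target time.
    return current_height + math.ceil(remaining_seconds / avg_block_seconds)


def retrospective_deadline(patch_release: datetime) -> datetime:
    """Latest publication time for the retrospective (seven days after release)."""
    return patch_release + timedelta(days=7)


if __name__ == "__main__":
    now = datetime(2021, 10, 29, 12, 0, tzinfo=timezone.utc)
    target = now + timedelta(hours=6)
    print(estimate_upgrade_height(1000, now, target, avg_block_seconds=6.0))
    print(retrospective_deadline(now).isoformat())
```

Because block times drift, an estimate like this is a starting point for discussion among validators rather than a substitute for agreeing on a specific height.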

Discussion of this topic, including recommendations and edits, should take place in Issue #4012.
