
Network Canary Down

Anandkumar Patel edited this page Jul 6, 2016 · 11 revisions

network canary investigation

  1. Look in loggly for the network canary failed message (link to search). That log message provides the IP addresses that were unreachable and the dockerHost the test was run on.

  2. In mongo, get the list of containers we are supposed to ping:

db.instances.find({ 
  'container.inspect.State.Running': true,
  'owner.github': ORG_ID, 
}, {
  'network.hostIp': 1,
  'name': 1,
  'container.dockerHost': 1,
  'container.dockerContainer': 1
})

Ensure:

  • we did not ping something we are not supposed to
  • the dockerhost still exists
  • the container is running
  • weave ps on the server shows an IP for that container (if it does not, something went wrong; run setupSwarmDelta and restart the container)
  • the output of weave status shows no problems
  • the output of weave status connections shows no problems
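The weave ps check above can be scripted. This is a minimal sketch, not part of the original runbook: has_weave_ip is a hypothetical helper, and it assumes weave ps prints one line per container in the form short id, MAC address, then zero or more IP/CIDR entries, with the 12-character short container id in the first column.

```shell
# Hypothetical helper: reads `weave ps` output on stdin and succeeds only if
# the given container has at least one address listed.
# Assumed line format: <short-id> <mac> [<ip/cidr> ...]
has_weave_ip() {
  # container.dockerContainer from mongo is a full id; weave ps is assumed
  # to show only the first 12 characters, so truncate before comparing.
  short_id=$(printf '%s' "$1" | cut -c1-12)
  # A line with fewer than 3 fields has an id and MAC but no IP.
  awk -v id="$short_id" '$1 == id && NF >= 3 { found = 1 } END { exit !found }'
}
```

On the dockerhost this could be used as `weave ps | has_weave_ip "$CONTAINER_ID" || echo "no weave ip"`, where CONTAINER_ID is the container.dockerContainer value from the mongo query.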
  3. Look at the weave logs for errors. Run this on the dockerhost of the unreachable instance and on the targetDockerUrl you got from loggly:
docker logs weave 2>&1 | grep -q -m1 "no such device" && echo BAD || echo OK

If you see BAD, weave is hosed.
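Since the same check has to run on two hosts, it may be convenient to wrap it in a function. The sketch below is only an illustration: weave_log_verdict is a hypothetical name, and it reads the log text from stdin so it can be fed from docker logs or from a saved log file.

```shell
# Hypothetical helper: reads weave log text on stdin and prints BAD if the
# "no such device" error appears anywhere in it, OK otherwise.
weave_log_verdict() {
  if grep -q -m1 "no such device"; then
    echo BAD
  else
    echo OK
  fi
}

# Usage on a dockerhost (requires docker and a container named weave):
#   docker logs weave 2>&1 | weave_log_verdict
```

The if/else form prints exactly one verdict per run, so its output is easy to collect when looping over several hosts.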

  4. Kill the weave container to fix it, and ensure it comes back up (if it does not, check sauron):
docker kill weave
  5. If the dock does not exist, set up an ssh tunnel to delta rabbit and enqueue a dock.removed job:
{
  "host": "http://DOCK_IP:4242",
  "githubId": ORG_NUMBER
}

ORG_NUMBER is a number, not a string.
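To avoid accidentally quoting the org id, the job body can be generated rather than typed by hand. This is a rough sketch under stated assumptions: dock_removed_payload is a hypothetical helper that only prints the JSON shown above, and how the job is actually enqueued through the tunnel is left as in the original.

```shell
# Hypothetical helper: prints the dock.removed job body for a dock IP and
# an org id, refusing anything that is not a plain JSON number.
dock_removed_payload() {
  dock_ip=$1
  org_number=$2
  case "$org_number" in
    # Reject empty values, non-digit characters, and leading zeros,
    # none of which produce a valid unquoted JSON number.
    ''|*[!0-9]*|0?*)
      echo "githubId must be a plain number, got: '$org_number'" >&2
      return 1
      ;;
  esac
  # githubId is printed without quotes so it stays a number, not a string.
  printf '{"host": "http://%s:4242", "githubId": %s}\n' "$dock_ip" "$org_number"
}
```

For example, `dock_removed_payload "$DOCK_IP" "$ORG_NUMBER"` prints the body to paste into the queue, and fails with a message on stderr if the org id was passed with quotes or letters in it.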
