Network Canary Down

Anandkumar Patel edited this page Jul 6, 2016 · 11 revisions

network canary investigation

Search Loggly for the "network canary failed" message. That log message lists the IP addresses that were unreachable and the dockerHost the test was run on.
In mongo, get the list of containers we are supposed to ping:

db.instances.find({ 
  'container.inspect.State.Running': true,
  'owner.github': ORG_ID, 
}, {
  'network.hostIp': 1,
  'name': 1,
  'container.dockerHost': 1,
  'container.dockerContainer': 1
})

Ensure

we did not ping something we are not supposed to
dockerhost still exists
container is running
weave ps on the server shows an IP for that container (if it does not, something went wrong; setupSwarmDelta and restart the container)
weave status
weave status connections
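
The first check in the list above (that we did not ping something we are not supposed to) amounts to a set difference between the unreachable IPs reported in Loggly and the network.hostIp values returned by the mongo query. A minimal sketch, assuming each list has been saved to a plain-text file with one IP per line; the function name unexpected_ips and the file layout are hypothetical, not part of the existing tooling.

```shell
# Print every IP in the unreachable list that is NOT in the expected list.
# $1: file of expected IPs (network.hostIp values from mongo), one per line
# $2: file of unreachable IPs (copied from the Loggly message), one per line
unexpected_ips() {
  # -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file.
  # grep exits 1 when it prints nothing; treat that as success,
  # but still fail on real errors such as a missing file (exit 2).
  grep -vxF -f "$1" "$2" || [ $? -eq 1 ]
}
```

An empty result means every unreachable IP was a legitimate target; any IP printed was pinged even though it is not in the expected list.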

Look at the weave logs for errors (run this on the dockerhost of the unreachable instance and on the targetDockerUrl you got from Loggly):

docker logs weave 2>&1 | grep -q -m1 "no such device" && echo BAD || echo OK

If you see BAD, weave is hosed.
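
Because the check has to be run on two hosts, it can help to wrap the classification in a small function that reads log text on standard input. A minimal sketch; the function name weave_log_status is hypothetical, and the "no such device" pattern is the one used in the command above.

```shell
# Read log text on stdin; print BAD if it contains "no such device", else OK.
weave_log_status() {
  # -q: no output, -m1: stop at the first match
  if grep -q -m1 "no such device"; then
    echo BAD
  else
    echo OK
  fi
}

# Usage on a dockerhost (not executed here):
#   docker logs weave 2>&1 | weave_log_status
```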

Kill the weave container to fix it and ensure it comes back up (if it does not, check sauron):

docker kill weave

If the dock does not exist, set up an ssh tunnel to delta rabbit and enqueue a dock.removed job:

{
  "host": "http://DOCK_IP:4242",
  "githubId": ORG_NUMBER
}

ORG_NUMBER is a number, not a string.
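
Since a quoted githubId is an easy mistake to make, the payload can be generated rather than typed by hand. A minimal sketch, assuming only the two fields shown above; the function name dock_removed_payload is hypothetical, and how the job is actually published to rabbit is not covered here.

```shell
# Print the dock.removed job payload as JSON.
# $1: dock IP, $2: org number (emitted unquoted, so it must be all digits)
dock_removed_payload() {
  case $2 in
    # Reject empty values, anything containing a non-digit, and leading zeros,
    # none of which would produce a valid JSON number.
    ''|*[!0-9]*|0?*)
      echo "githubId must be a number, got: $2" >&2
      return 1
      ;;
  esac
  printf '{"host": "http://%s:4242", "githubId": %s}\n' "$1" "$2"
}
```

For example, `dock_removed_payload 10.0.0.1 12345` prints a payload with githubId as a bare number, while passing a non-numeric value fails instead of producing a string.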