Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add Stop() method to shut down cluster. #322

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

timoreimann
Copy link
Collaborator

This enables to terminate all pending health checks. Apart from making our tests shut down cleanly, it will also give users an opportunity to stop otherwise potentially leaking health-checking goroutines.

@timoreimann timoreimann force-pushed the shut-down-cluster branch 3 times, most recently from 5a758ea to 0fd6fa5 Compare August 28, 2017 19:31
@timoreimann timoreimann changed the title Add Stop() method to shut down cluster. [WIP] Add Stop() method to shut down cluster. Aug 28, 2017
@timoreimann
Copy link
Collaborator Author

timoreimann commented Aug 28, 2017

Something is still fishy:

=== RUN   TestUnsubscribe
--- PASS: TestUnsubscribe (0.00s)
=== RUN   TestEventStreamEventsReceived
2017/08/28 19:42:43 httptest.Server blocked in Close after 5 seconds, waiting for connections:
  *net.TCPConn 0xc820032088 127.0.0.1:41900 in state active
SIGQUIT: quit
PC=0x463e41 m=0

goroutine 0 [idle]:
runtime.futex(0xd19788, 0x0, 0x0, 0x0, 0x0, 0xd18d30, 0x0, 0x0, 0x410bca, 0xd19788, ...)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/sys_linux_amd64.s:302 +0x21
runtime.futexsleep(0xd19788, 0x0, 0xffffffffffffffff)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/os1_linux.go:40 +0x53
runtime.notesleep(0xd19788)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/lock_futex.go:145 +0xca
runtime.stopm()
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/proc.go:1535 +0x10b
runtime.findrunnable(0xc820027500, 0x0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/proc.go:1973 +0x808
runtime.schedule()
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/proc.go:2072 +0x2a6
runtime.goexit0(0xc820001800)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/proc.go:2207 +0x1bf
runtime.mcall(0x4577c0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/asm_amd64.s:233 +0x5b

goroutine 1 [chan receive, 9 minutes]:
testing.RunTests(0xb5fa28, 0xd16a20, 0x76, 0x76, 0xd14c01)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/testing/testing.go:583 +0xb39
testing.(*M).Run(0xc820040f10, 0xc8200137f0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/testing/testing.go:515 +0x11e
main.main()
	github.com/gambol99/go-marathon/_test/_testmain.go:288 +0x211

goroutine 17 [syscall, 9 minutes, locked to thread]:
runtime.goexit()
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/asm_amd64.s:1998 +0x1

goroutine 385 [select]:
github.com/gambol99/go-marathon.(*cluster).healthCheckNode(0xc820404000, 0xc8203feca0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:166 +0x33f
github.com/gambol99/go-marathon.(*cluster).markDown.func1(0xc820404000, 0xc8200330b0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:152 +0x4a
created by github.com/gambol99/go-marathon.(*cluster).markDown
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:154 +0x278

goroutine 548 [chan send, 9 minutes]:
github.com/donovanhide/eventsource.(*Server).Handler.func1(0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/gopath/src/github.com/donovanhide/eventsource/server.go:87 +0x44f
github.com/gambol99/go-marathon.authMiddleware.func1(0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/testing_utils_test.go:254 +0x251
net/http.HandlerFunc.ServeHTTP(0xc820409260, 0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:1618 +0x48
net/http.(*ServeMux).ServeHTTP(0xc820409230, 0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:1910 +0x213
net/http.serverHandler.ServeHTTP(0xc82030c680, 0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:2081 +0x207
net/http.(*conn).serve(0xc82030c980)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:1472 +0x1566
created by net/http.(*Server).Serve
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:2137 +0x4d2

goroutine 384 [select]:
github.com/gambol99/go-marathon.(*cluster).healthCheckNode(0xc820404000, 0xc8203fec20)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:166 +0x33f
github.com/gambol99/go-marathon.(*cluster).markDown.func1(0xc820404000, 0xc8200330a8)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:152 +0x4a
created by github.com/gambol99/go-marathon.(*cluster).markDown
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:154 +0x278

goroutine 527 [semacquire, 9 minutes]:
sync.runtime_Semacquire(0xc820405d1c)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/sema.go:47 +0x26
sync.(*WaitGroup).Wait(0xc820405d10)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/sync/waitgroup.go:127 +0x118
net/http/httptest.(*Server).Close(0xc820405ce0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/httptest/server.go:189 +0x357
github.com/gambol99/go-marathon.(*fakeServer).Close(0xc820405e10)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/testing_utils_test.go:318 +0x6b
github.com/gambol99/go-marathon.(*endpoint).CloseServer(0xc820405e00)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/testing_utils_test.go:327 +0x3b
github.com/gambol99/go-marathon.(*endpoint).Close(0xc820405e00)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/testing_utils_test.go:323 +0x6c
github.com/gambol99/go-marathon.TestEventStreamEventsReceived(0xc8204105a0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/subscription_test.go:359 +0x11bb
testing.tRunner(0xc8204105a0, 0xd172f0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/testing/testing.go:473 +0xdd
created by testing.RunTests
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/testing/testing.go:582 +0xae3

goroutine 546 [chan receive]:
github.com/gambol99/go-marathon.(*marathonClient).registerSSESubscription.func1(0xc8204482d0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/subscription.go:180 +0x1a1
created by github.com/gambol99/go-marathon.(*marathonClient).registerSSESubscription
	/home/travis/gopath/src/github.com/gambol99/go-marathon/subscription.go:187 +0x73
rax    0xca
rbx    0x0
rcx    0x463e43
rdx    0x0
rdi    0xd19788
rsi    0x0
rbp    0x1
rsp    0x7ffe0a7ff850
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0xc82002cf58
r13    0xd19680
r14    0xc820001800
r15    0xd3cd00
rip    0x463e41
rflags 0x286
cs     0x33
fs     0x0
gs     0x0
*** Test killed with quit: ran too long (10m0s).
FAIL	github.com/gambol99/go-marathon	600.030s
make: *** [test] Error 1

This enables to terminate all pending health checks. Apart from making
our tests shut down cleanly, it will also give users an opportunity to
stop otherwise potentially leaking health-checking goroutines.
Add extra CloseServer method for cases where only the server should be
closed.
@timoreimann
Copy link
Collaborator Author

Seems like we're yet missing to terminate SSE subscriptions.

@nilium
Copy link

nilium commented Sep 23, 2019

Has there been any further work towards supporting cluster (or client) shutdowns? Asking because I'm about ready to start doing this myself since we're running into goroutine leaks in health checks.

We currently schedule Marathon on arbitrary hosts, so it's possible that a member of the cluster will go away and reappear with a new host and port, which currently requires a new client to be allocated. So, unless I can reconfigure a client's cluster (doesn't seem like the right approach), I'd have to be able to stop the client to avoid leaking health check and other goroutines.

@timoreimann
Copy link
Collaborator Author

@nilium I stopped using Marathon and subsequently go-marathon a while ago. You're more than welcome to continue the effort. I can try my best to review any PR(s) you may submit.

Thank you!

@nilium
Copy link

nilium commented Sep 23, 2019

@timoreimann Thanks! I've got some changes that I'm testing out in our systems, so if all goes well, I'll have a pull request once I've had some time to make sure that it's working the way I expect. Just testing it on our own work to avoid too much churn in the PR. No ETA on that right now, but the WIP branch is over at https://github.com/Kochava/go-marathon/tree/cluster-stop for anyone else that wants to poke at it. (May be force pushed over, so please don't pin anything to that fork or branch even if it works for someone.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants