Skip to content
This repository has been archived by the owner on Apr 24, 2020. It is now read-only.

change test/run.js to report failing script name #256

Merged
merged 1 commit into from
Jan 16, 2014

Conversation

f34rdotcom
Copy link
Contributor

test/run.js does not report the failing test file making it difficult to know what test failed when all tests are async.

The build failure is not the fault of this fix. Its an existing issue I have been looking at that this fix helps expose. A race condition seems to exist in test/test.socket.unbind.js and it will fail intermittently.

@kkoopa
Copy link
Contributor

kkoopa commented Nov 24, 2013

I did the unbind code and can of course not get it to fail on my own system, race conditions are annoying as hell. Could there be something off in libzmq itself? I see the travis tests are running against outdated release candidates (3.2.0, 3.2.1). Do these transient failures ever show up in other versions of libzmq? If not, I guess we can write them off as broken upstream, but that would probably be too easy.

@ronkorving: Is there a reason to have those old release candidates in travis? Why support them at all?

Version of zmq does not seem to be the problem. I tried with zmq-3.2.0-rc1 and node 0.11.9 and could not get it to fail.

@f34rdotcom
Copy link
Contributor Author

I am using 3.2.4 on my own project and have not been able to reproduce the problem. I have seen it fail on 3.2.1,3.2.1 and 3.2.2 on Travis.

The error itself looks like a well known issue related to accessing a zmq socket from multiple threads something that is not supported in zmq.

AFAIK this is not something that will happen on node as V8 runs from a single thread as I understand it. If some case exist that main() talks to a zmq function and not the thread() that main creates such as maybe on some signal handler or on exit that would be the only case I can think of.

I have not studied the guts of V8 so I am not sure if that would ever happen.

@kkoopa
Copy link
Contributor

kkoopa commented Nov 24, 2013

Hmm... could it be that only certain node versions show odd behavior?
@rvagg: Have you got any ideas regarding what might be causing this or how to track it down? This is related to #254.

@f34rdotcom
Copy link
Contributor Author

I did a bunch of tests with zmq 3.2.4 and 3.2.2 and node .10 and .11 and was able after many tries capture the error on both on all versions of node and zmq.

@rvagg
Copy link
Contributor

rvagg commented Nov 24, 2013

sorry @kkoopa I'm not sure exactly what I'm supposed to be commenting on here, the travis build for this is green ... ?

re travis and npm, it's a bit of a hit-and-miss depending on the version of Node & npm you're using (certainly v0.10.19 is the worst of recent 0.10 releases). I think it's mostly find in 0.10.20+ but the later 0.11.x releases have problems. For good coverage across node versions for a project like this I'm not sure travis can give us what we need right now; hence my resorting to Docker containers locally to have good test coverage: https://github.com/rvagg/nan/tree/master/test/docker - the same solution could be used here but of course it doesn't give the advantage of having it run for every PR.

@ronkorving
Copy link
Collaborator

@kkoopa what do you mean release candidates? No RCs are mentioned in https://github.com/JustinTulloss/zeromq.node/blob/master/.travis.yml
We should probably add some newer versions though.

@kkoopa
Copy link
Contributor

kkoopa commented Nov 25, 2013

@rvagg It's the intermittent failure in #254 . It sometimes fails, it sometimes does not. This problem should already exist in current master. There is a race condition lurking somewhere, I am now thinking it might be related to #232 where I removed socket locking in order to add unbind support: kkoopa@62cd8bc

@ronkorving I am referring to https://raw.github.com/zeromq/zeromq3-x/master/NEWS which lists 3.2.0 and 3.2.1 as release candidates. 3.2.2 was the first proper release of 3.2-series.

I have created a branch of master https://github.com/kkoopa/zeromq.node/tree/unbindfix with the socket locking changes reverted, which currently does not support unbinding either. Tests fail due to unbind locking the socket, which fails the GET_SOCKET() in the poller, due to the socket being marked as busy.

@ronkorving
Copy link
Collaborator

@kkoopa ah, I wasn't aware of that (anymore). Feel free to update the travis file. By the way, you need maintainer rights (if you want, that is). Could either @JustinTulloss or @visionmedia take care of that?

@kkoopa kkoopa mentioned this pull request Nov 25, 2013
@rvagg
Copy link
Contributor

rvagg commented Nov 26, 2013

I can't make a failure happen unfortunately, probably a machine-speed thing I guess. I bet it's to do with GC, maybe you're releasing a Persistent too early? If you can get to a situation where you're able to reproduce the error with some regularity then try changing the tests to hang on (and use) the associated variables right up till the end of the test. If you go down about 1/2 way in this post where I start to talk about batch leaks then you'll see one way we were able to confirm a GC/Persistent related bug in LevelDOWN by only changing the JS.
I keep on running in to these kinds of bugs when writing code against V8, you either fail to create a Persistent for the required life of an object or you release one too early and the underlying data gets cleaned up before it's used again; usually by a third-party library that you're interacting with.

ronkorving pushed a commit that referenced this pull request Jan 16, 2014
change test/run.js to report failing script name
@ronkorving ronkorving merged commit 66dfe55 into JustinTulloss:master Jan 16, 2014
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants