Replies: 2 comments 7 replies
-
I strongly disagree with this analysis and the proposed solutions, and would strongly suggest NOT implementing the mechanisms suggested here.
-
We are here to deal with data and evidence, and this being the inference drawn from them, I have no qualms about taking this approach.
-
With the nightly runs showing us the time consumed by every test case, we can see that certain test cases take around 20 seconds to complete. If one were to analyze the operations performed in such a test case, it is..
So this raises the question: where is the time being spent?
On closer observation, we find that around 9 seconds are consumed by operations 4-5. This is for a test case in which we do nothing other than steps 1-6. If certain operations were performed after step 3 (the actual logic of a test scenario), the total time would be 20 + delta, where delta is the time consumed by those intermediate steps.
In the current design, non-disruptive tests are those test cases that can run concurrently with other test cases; in other words, they don't affect other tests in a negative manner, i.e. they don't produce behavior that prevents us from analyzing the real test scenario.
Now, even though each of these non-disruptive test cases consumes around 20 seconds (for example), suppose we run 100 test cases, each against 4 volume types, all of them non-disruptive. Run sequentially, the total time would be around:
20*4*100 = 8000 seconds
In the case of redant, since they fall under the non-disruptive category, and assuming we have 4 worker threads, the time taken would be around:
(20*4*100)/4 = 2000 seconds
Sounds good, right?
Yes, it does. But our intention is not to become complacent after a certain reduction. The next step is to see where the time is being spent. That's where we find that unmounting, stopping, and deleting a volume takes around 9 seconds per test case. Assuming the creation, start, and mount steps take around 4 seconds, the effective runtime of a test would be 20 - 9 - 4 = 7 seconds.
If we could run the test cases for only their effective time, the total time for the 100 cases would be:
(7*4*100)/4 = 700 seconds
That is a reduction of around 91% compared to the sequential baseline!
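The arithmetic above can be checked with a few lines of Python. All figures are the approximate ones quoted in this post, not measurements:

```python
# Rough runtime arithmetic from the discussion above.
PER_TEST = 20    # seconds per test case, including setup and teardown
VOL_TYPES = 4    # volume types each test case runs against
TESTS = 100      # number of non-disruptive test cases
WORKERS = 4      # concurrent worker threads

sequential = PER_TEST * VOL_TYPES * TESTS        # all runs back to back
concurrent = sequential // WORKERS               # current redant flow

SETUP = 4        # create + start + mount
TEARDOWN = 9     # unmount + stop + delete
effective = PER_TEST - SETUP - TEARDOWN          # time spent on actual test logic
reused = (effective * VOL_TYPES * TESTS) // WORKERS  # flow with volume reuse

reduction_pct = round(100 * (1 - reused / sequential))
print(sequential, concurrent, reused, reduction_pct)
# 8000 2000 700 91
```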
The question now is how we achieve this reduction. What is the one set of actions that every test case performs and that consumes the big chunk of time? Setup and teardown. It's as simple as that.
Also, given that this is a testing framework testing a filesystem: would an actual user create volumes in isolation, or re-create a new volume for every new operation? NO. The same old volume is reused, and that is the solution.
The current flow treats all non-disruptive test cases the same, letting them create and also destroy things (in a manner that isn't disruptive to the operation of other tests running in parallel). This implies that a worker thread can pick up a job (a test case run) of any volume type. If we keep the same flow but create the volumes before the non-disruptive tests even start, and use those volumes to run the tests, we get somewhat near the desired flow, but with a problem: contention. If two workers pick up test cases of the same volume type, they will contend for the one volume we created. So do we create an extra volume in that scenario? (That would just be the same old behavior.)
But that is not a clean solution. We cannot just throw in more resources because the framework cannot handle the existing requirements of the tests.
The proposed solution is to make each worker responsible for one volume type, with each worker draining a queue that holds only the non-disruptive tests of the volume type it is responsible for. This way we can control the behavior of the worker threads and prevent two tests of the same volume type from contending for the existing volume.
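A minimal sketch of this dispatch model, using Python's standard `queue` and `threading` modules (the volume-type names and the `create/run` stand-ins are illustrative, not redant's actual API):

```python
import queue
import threading

# One worker per volume type, each draining a queue that holds only the
# non-disruptive tests for that type, so no two tests ever contend for
# the same volume.
VOLUME_TYPES = ["dist", "rep", "dist-rep", "disp"]  # illustrative names

queues = {vtype: queue.Queue() for vtype in VOLUME_TYPES}

def worker(vtype: str, results: list) -> None:
    # Create the volume once; every test for this type reuses it.
    volume = f"{vtype}-vol"           # stands in for create/start/mount
    while True:
        test = queues[vtype].get()
        if test is None:              # sentinel: no more tests queued
            break
        results.append(test(volume))  # run the test against the shared volume
    # Teardown (unmount/stop/delete) would happen once, here, not per test.

results: list = []
threads = [threading.Thread(target=worker, args=(v, results)) for v in VOLUME_TYPES]
for t in threads:
    t.start()

# Enqueue the same illustrative test for every volume type, then signal done.
for v in VOLUME_TYPES:
    queues[v].put(lambda vol: f"ran on {vol}")
    queues[v].put(None)
for t in threads:
    t.join()

print(sorted(results))
```

Because each queue is owned by exactly one worker, no locking around the volume itself is needed: serialization per volume type falls out of the queue structure.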
But isn't volume creation sliding a little back from the responsibility of the test case to that of the framework? Yes, it is, and that is how we can obtain both.
Won't this reduce the agnostic nature of the test framework? Yes, it does, but we need to understand what we are trying to achieve here. The test framework is not the end; it is a means to an end. Without Gluster there is no redant (at least for now; let's not kid ourselves into taking this to a bigger stage at this point in time, we need to pass the litmus test before going ahead and de-coupling things). So, if the test framework lifts some extra weight and we get a 91% improvement because of it, I'm sold.
Will this cover all tests? Not in all cases. For instance, a test case that demands no volume gets a free pass from all this hassle. Test cases that themselves create a good number of volumes won't fall under this either. And finally, disruptive test cases cannot come under this category or flow.
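The exclusions above amount to a simple eligibility predicate. A sketch, where the field names are assumptions for illustration and not redant's actual metadata:

```python
# Hypothetical predicate deciding whether a test case may run against
# the shared, pre-created volume of its volume type.
def can_reuse_volume(test: dict) -> bool:
    if test.get("disruptive"):            # disruptive tests are excluded
        return False
    if not test.get("needs_volume"):      # no-volume tests skip this flow entirely
        return False
    if test.get("creates_many_volumes"):  # heavy volume-creators are excluded
        return False
    return True

print(can_reuse_volume({"disruptive": False, "needs_volume": True}))   # True
print(can_reuse_volume({"disruptive": True, "needs_volume": True}))    # False
```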
NOTE: The calculations above are rough figures; actual numbers will vary.