Skip to content
This repository has been archived by the owner on Aug 20, 2018. It is now read-only.

ensure /tcktestdir created for FileConnection tests #1862

Merged

Conversation

mykmelez
Copy link
Contributor

I think the TCK FileConnection tests fail sometimes because there's a race between two fs.init promise handlers: first, in fs-init.js, which mkdirs /tcktestdir after fs.init; and second, in fstests.js, which calls fs.clear after fs.init.

When fstests.js clears the database before fs-init.js creates /tcktestdir, then that directory exists, and the FileConnection tests pass. But when fstests.js clears the database after fs-init.js creates /tcktestdir, then that directory gets deleted after being created, so it doesn't exist when the FileConnection tests are run, and they fail.

This branch explicitly creates /tcktestdir if we're running the FileConnection tests. It also explicitly deletes the database before running those tests, so it should lead to more consistent success (or failure!) of that test suite.

@mykmelez
Copy link
Contributor Author

The bad news is that the FileConnection tests are still failing on this branch. The good news is that they're failing consistently, at least on Travis (they pass locally). So it should be a bit easier to figure out why.

@mykmelez
Copy link
Contributor Author

I figured out how to get them to fail locally (blow away the database before running the tests), and this is actually a regression from #1201, which made the VM start running before the filesystem is initialized.

When loading a midlet, that's ok, because the VM itself (and apparently the MIDletSuiteLoader) don't need the fs, and #1201 ensures that the fs will be inited in MIDletSuiteUtils.vmBeginStartUp before MIDletSuiteLoader starts the midlet.

But the FileConnection tests are run by com.ibm.tck.client.TestRunner, which is a main program (specified by the main config parameter, not the midletClassName parameter). And the VM doesn't wait for the fs to be inited before starting it. Which means it can race fs initialization (causing test failures).

The simplest (and most robust) fix is to revert the fs part of #1201 and make VM startup wait on fs initialization. Presumably that would regress startup perf, but I don't see any numbers in #1201, and when I run the startup benchmark for WA on desktop, it shows that doing so actually improves perf:

Test Baseline Mean Mean +/- % P Min Max
startupTime 1,015ms 979ms -36ms -3.57 BETTER 944ms 1,042ms
vmStartupTime 200ms 208ms 9ms 4.28 WORSE 183ms 243ms
bgStartupTime 49ms 48ms -1ms -1.50 BETTER 47ms 50ms
fgStartupTime 683ms 631ms -52ms -7.65 BETTER 608ms 684ms
fgRefreshStartupTime 83ms 91ms 8ms 9.91 WORSE 66ms 111ms
fgRestartTime n/a NaNms n/a n/a n/a Infinityms -Infinityms
totalSize 30,376kb 29,608kb -767kb -2.53 BETTER 28,413kb 30,292kb
domSize 167kb 167kb 0kb 0.00 SAME 167kb 167kb
styleSize 395kb 395kb 0kb -0.03 SAME 393kb 396kb
jsObjectsSize 19,849kb 19,757kb -93kb -0.47 BETTER 19,713kb 19,770kb
jsStringsSize 1,182kb 655kb -527kb -44.59 BETTER 654kb 655kb
jsOtherSize 8,555kb 8,408kb -148kb -1.73 SAME 7,224kb 9,081kb
otherSize 227kb 227kb 0kb 0.09 SAME 225kb 230kb
USS n/a NaNkb n/a n/a n/a Infinitykb -Infinitykb
peakRSS 363,775kb 354,459kb -9,317kb -2.56 BETTER 310,316kb 365,060kb

Ditto on a Flame (albeit insignificantly):

Test Baseline Mean Mean +/- % P Min Max
startupTime 16,535ms 16,208ms -327ms -1.98 SAME 15,325ms 18,323ms
vmStartupTime 2,790ms 2,777ms -13ms -0.48 SAME 2,408ms 3,044ms
bgStartupTime 896ms 884ms -12ms -1.36 SAME 695ms 1,169ms
fgStartupTime 11,680ms 11,487ms -193ms -1.65 SAME 10,525ms 13,751ms
fgRefreshStartupTime 1,169ms 1,061ms -108ms -9.24 SAME 686ms 2,199ms
fgRestartTime n/a NaNms n/a n/a n/a Infinityms -Infinityms
totalSize 22,959kb 22,840kb -119kb -0.52 SAME 21,970kb 23,690kb
domSize 74kb 74kb 0kb -0.03 SAME 74kb 74kb
styleSize 310kb 310kb -1kb -0.19 SAME 302kb 319kb
jsObjectsSize 16,079kb 16,105kb 26kb 0.16 SAME 15,687kb 16,570kb
jsStringsSize 1,327kb 1,212kb -115kb -8.64 BETTER 1,075kb 1,236kb
jsOtherSize 4,898kb 4,872kb -26kb -0.53 SAME 4,156kb 5,531kb
otherSize 270kb 267kb -4kb -1.40 SAME 204kb 354kb
USS 52,633kb 53,414kb 782kb 1.49 WORSE 51,020kb 56,756kb
peakRSS 0kb NaNkb NaNkb NaN SAME NaNkb NaNkb

So I think the simple, robust fix is the best one here.

@marco-c
Copy link
Contributor

marco-c commented Aug 27, 2015

The simplest (and most robust) fix is to revert the fs part of #1201 and make VM startup wait on fs initialization.

This probably depends on the order in which we perform initialization steps.
Forcing the FS to be loaded before starting the VM could not be a regression in our current situation, but could be a regression if we slightly modify the order of the initialization steps.
Another thing that comes to mind is that FS initialization is likely to be slower when there's more data in the IndexedDB store, so the results could be different if you use WA for a while (receive messages, send/receive media files, etc.).

when I run the startup benchmark for WA on desktop, it shows that doing so actually improves perf:

This is definitely weird, how can this change improve startup perf? It could leave it unchanged (if the other promises in loadingPromises are resolved after the FS is initialized), but I don't see how it could improve perf!
Another weird thing is that on desktop it's improving fgStartupTime, which is totally unaffected by this.

So, I'd keep the optimization from #1201 and make TestRunner a MIDlet.

@mykmelez
Copy link
Contributor Author

Another thing that comes to mind is that FS initialization is likely to be slower when there's more data in the IndexedDB store, so the results could be different if you use WA for a while (receive messages, send/receive media files, etc.).

The number of files could affect fs initialization, but I'm not sure it's significant, given that we use IDBObjectStore.getAll to get their metadata, and it is presumably designed to be efficient at retrieving all records, such that the majority of its cost is unrelated to the number of records it retrieves (which is why it's so much faster than retrieving all records via a standard request).

This is definitely weird, how can this change improve startup perf? It could leave it unchanged (if the other promises in loadingPromises are resolved after the FS is initialized), but I don't see how it could improve perf!

One possibility is that there's a hidden cost to context switching between fs initialization and other work we do during VM startup (like Ion compilation). It's also possible that my results on desktop were faulty.

So, I'd keep the optimization from #1201 and make TestRunner a MIDlet.

That would solve the problem for TestRunner, but other Java programs would still be susceptible to the underlying bug. I'd prefer a more robust solution.

@marco-c
Copy link
Contributor

marco-c commented Aug 27, 2015

The number of files could affect fs initialization, but I'm not sure it's significant, given that we use IDBObjectStore.getAll to get their metadata, and it is presumably designed to be efficient at retrieving all records, such that the majority of its cost is unrelated to the number of records it retrieves (which is why it's so much faster than retrieving all records via a standard request).

Right, I always forget that we're only loading the metadata.

One possibility is that there's a hidden cost to context switching between fs initialization and other work we do during VM startup (like Ion compilation). It's also possible that my results on desktop were faulty.

I'll run the startup benchmark again on my machine as well.

That would solve the problem for TestRunner, but other Java programs would still be susceptible to the underlying bug. I'd prefer a more robust solution.

OK. We could wait for the FS to be initialized somewhere else (at a point in common between normal Java programs and MIDlets), but let's do that only if the benchmark shows there's a regression.

@mykmelez mykmelez mentioned this pull request Aug 30, 2015
@mykmelez mykmelez closed this Aug 31, 2015
@mykmelez mykmelez deleted the intermittent-file-connection-failure branch August 31, 2015 16:49
@mykmelez mykmelez restored the intermittent-file-connection-failure branch August 31, 2015 16:49
@marco-c
Copy link
Contributor

marco-c commented Sep 2, 2015

@mykmelez why did you close this PR?

@mykmelez
Copy link
Contributor Author

mykmelez commented Sep 2, 2015

@mykmelez why did you close this PR?

Hmm, I didn't intend to. I was doing some branch cleanup and accidentally deleted the branch for which I was requesting the pull, hence these messages about my deleting and restoring the branch:

mykmelez deleted the mykmelez:intermittent-file-connection-failure branch 2 days ago
@mykmelez mykmelez restored the mykmelez:intermittent-file-connection-failure branch 2 days ago

I guess @github autotragically closed the PR when I deleted the branch and didn't reopen it when I restored it (and didn't tell me what it was doing/not doing in either case).

@mykmelez mykmelez reopened this Sep 2, 2015
@marco-c
Copy link
Contributor

marco-c commented Sep 6, 2015

This is my result. Baseline is without the optimization (with your patch).
I'm testing startup to the contact list, my exported FS is 55 MB.

Config Value
User Agent Mozilla/5.0 (X11; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
Rounds 30
Delay(ms) 5000
Delete FS no
Delete JIT CACHE no
Warm startup no
Test Baseline Mean Mean +/- % P Min Max
startupTime 2,066ms 2,015ms -51ms -2.47 BETTER 1,973ms 2,100ms
vmStartupTime 294ms 283ms -12ms -3.94 BETTER 272ms 319ms
bgStartupTime 69ms 69ms 0ms -0.02 SAME 65ms 83ms
fgStartupTime 1,380ms 1,351ms -29ms -2.13 BETTER 1,313ms 1,415ms
fgRefreshStartupTime 323ms 313ms -10ms -3.14 SAME 292ms 337ms
fgRestartTime n/a NaNms n/a n/a n/a Infinityms -Infinityms
totalSize 40,021kb 46,097kb 6,077kb 15.18 WORSE 40,893kb 50,693kb
domSize 136kb 135kb 0kb -0.28 SAME 134kb 140kb
styleSize 425kb 421kb -3kb -0.71 SAME 412kb 461kb
jsObjectsSize 24,122kb 27,283kb 3,161kb 13.11 WORSE 25,672kb 28,937kb
jsStringsSize 1,447kb 969kb -478kb -33.02 BETTER 950kb 1,042kb
jsOtherSize 13,636kb 17,034kb 3,398kb 24.92 WORSE 13,426kb 20,004kb
otherSize 256kb 255kb -1kb -0.47 SAME 250kb 279kb
USS 117,660kb 131,675kb 14,015kb 11.91 WORSE 123,024kb 141,056kb
peakRSS 506,280kb 506,280kb 0kb 0.00 SAME 506,280kb 506,280kb

@mykmelez
Copy link
Contributor Author

mykmelez commented Sep 7, 2015

I reran the tests on desktop and saw no difference this time (baseline is master, benchmark is this branch):

Config Value
User Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:43.0) Gecko/20100101 Firefox/43.0
Rounds 30
Delay(ms) 5000
Delete FS no
Delete JIT CACHE no
Warm startup no
Test Baseline Mean Mean +/- % P Min Max
startupTime 1,143ms 1,140ms -3ms -0.26 SAME 1,097ms 1,265ms
vmStartupTime 263ms 260ms -3ms -1.25 SAME 223ms 271ms
bgStartupTime 59ms 58ms -2ms -2.54 BETTER 57ms 60ms
fgStartupTime 743ms 742ms 0ms -0.03 SAME 699ms 868ms
fgRefreshStartupTime 78ms 80ms 2ms 2.58 SAME 73ms 91ms
fgRestartTime n/a NaNms n/a n/a n/a Infinityms -Infinityms
totalSize 29,207kb 29,214kb 7kb 0.02 SAME 28,072kb 29,792kb
domSize 171kb 171kb 0kb -0.00 SAME 171kb 171kb
styleSize 386kb 385kb 0kb -0.12 BETTER 385kb 387kb
jsObjectsSize 19,425kb 19,431kb 6kb 0.03 WORSE 19,413kb 19,438kb
jsStringsSize 653kb 653kb 0kb 0.04 SAME 652kb 656kb
jsOtherSize 8,353kb 8,353kb 0kb 0.00 SAME 7,229kb 8,928kb
otherSize 220kb 221kb 1kb 0.42 WORSE 219kb 222kb
USS n/a NaNkb n/a n/a n/a Infinitykb -Infinitykb
peakRSS 343,555kb 359,052kb 15,497kb 4.51 WORSE 324,928kb 361,900kb

But I'm testing startup to the Terms screen, and my exported FS is only 4.9MB, so it's possible that a difference would appear with a larger FS or on startup to a different screen of the midlet.

Nevertheless, I still don't like the optimization, since it can break midlets and other Java programs in ways that are hard to diagnose. I'd prefer correctness in this case, even if it means taking a perf hit. But I'm open to alternatives, like initializing the FS at some common point.

@marco-c Any idea what that common point might be?

@marco-c
Copy link
Contributor

marco-c commented Sep 7, 2015

Given that the regression is small and is only present in some cases, I think we can merge this without waiting for the alternative optimization (I've filed #1871 to investigate it).

marco-c pushed a commit that referenced this pull request Sep 7, 2015
…ailure

ensure /tcktestdir created for FileConnection tests
@marco-c marco-c merged commit 1697dd6 into mozilla:master Sep 7, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants