
cut FS activity down dramatically #33

Closed
stefanpenner wants to merge 1 commit

Conversation

stefanpenner
Contributor

  • better logging
  • use mkdir-batch to group mkdirs together and do less overall work (don’t attempt to re-create the same dirs repeatedly)
  • rather than writing to the output and then doing a copyDereferenceSync to the cache, we just write to the cache and link to the output (sketched below)
  • fix Windows
  • tests

fixes #32
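The third bullet is where most of the per-file FS work disappears: the file is written once, into the cache, and the output dir just gets a link instead of a second full write plus copyDereferenceSync. Below is a minimal sketch of that ordering, assuming Node's fs module; writeThroughCache is a hypothetical helper, and whether the real code hard-links, symlinks, or falls back to copying isn't shown here.

```js
var fs = require('fs');
var path = require('path');

// Hypothetical helper sketching the "write to cache, link to output" ordering:
// one real write (into the cache), then a cheap link from the output dir.
function writeThroughCache(cachePath, outputPath, relativePath, contents) {
  var cacheFile  = path.join(cachePath, relativePath);
  var outputFile = path.join(outputPath, relativePath);

  fs.writeFileSync(cacheFile, contents);   // the only full write of the file
  fs.symlinkSync(cacheFile, outputFile);   // output entry just points at the cache
}
```

The directories for cacheFile and outputFile are assumed to exist already; creating them cheaply is what the mkdir batching discussed below is about.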

@stefanpenner
Contributor Author

When operating on large dir trees, this cuts rebuild time down dramatically. In some of my tests (against ember.js) it cuts rebuild times down by 40% to 50%.

We can likely get back to not needing mkdir in the cache at all, but those calls now cost us barely anything. The big win is priming the output dirs together.

Previously when building 1035 files, we would make 1035 calls to mkdirp, which would recursively call mkdir once per segment in a path. Now it only results in 135 calls to mkdir.

So if we had foo/bar/baz/quz (quz being the file), mkdir would be invoked 3 times: once each for foo, foo/bar, and foo/bar/baz.
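To make that per-segment cost concrete, here is a rough sketch of the old behaviour; this is not the real mkdirp source, just the shape of the work it repeats for every single file:

```js
var fs = require('fs');
var path = require('path');

// Rough sketch (not the actual mkdirp implementation): each file walks its
// directory chain from the top and attempts mkdir on every segment, even when
// those directories were already created for a previous file.
function naiveMkdirpSync(dir) {
  var segments = dir.split(path.sep).filter(Boolean); // relative paths, for illustration
  var current = '';

  segments.forEach(function (segment) {
    current = current ? path.join(current, segment) : segment;
    try {
      fs.mkdirSync(current);   // one mkdir attempt per segment, per file
    } catch (err) {
      if (err.code !== 'EEXIST') { throw err; }
    }
  });
}

// naiveMkdirpSync('foo/bar/baz') issues 3 mkdir calls, matching the example above.
```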

Commonly, we are actually creating paths like:

path/to/app/tmp/thing/x/y/a/file.js
path/to/app/tmp/thing/x/y/b/file.js
path/to/app/tmp/thing/x/y/c/file.js
path/to/app/tmp/thing/x/y/d/file.js

etc.

Before, that would be at least 8 internal calls to mkdir + lstat per entry, so the paths above would be at least 24 mkdir calls.

Now we attempt to create the dirs together, which allows us to skip over repeated work. The above example reduces to 11 mkdir.

This bulk mkdir scales really nicely as project trees grow large, since large chunks of the dir structure are commonly shared.
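A minimal sketch of that batching idea (the real mkdir-batch module's API may differ): collect every unique directory implied by the output file paths, then create each one exactly once, parents before children.

```js
var fs = require('fs');
var path = require('path');

// Sketch of the batching idea: dedupe all ancestor dirs first, mkdir each once.
function mkdirBatchSync(filePaths) {
  var dirs = {};

  filePaths.forEach(function (filePath) {
    // record every ancestor directory of every file, stopping early once a
    // directory (and therefore all of its ancestors) has already been seen
    var dir = path.dirname(filePath);
    while (dir && dir !== '.' && dir !== path.sep && !dirs[dir]) {
      dirs[dir] = true;
      dir = path.dirname(dir);
    }
  });

  // lexicographic sort puts every parent before its children
  Object.keys(dirs).sort().forEach(function (dir) {
    try {
      fs.mkdirSync(dir);
    } catch (err) {
      if (err.code !== 'EEXIST') { throw err; }
    }
  });
}
```

For the four example paths above this yields 11 unique directories (the 7 shared ancestors plus a, b, c and d), matching the count quoted earlier.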

@rwjblue
Member

rwjblue commented Aug 24, 2015

Very excited about this!

return self.outputPath + '/' + p;
}));

if (this._cacheDirsPrimed === undefined) {
Member

I'm a bit confused why we're checking this at all, but we're setting this to false (not undefined) below, so this would only seem to run once.

stefanpenner (Contributor, Author)

> so this would only seem to run once.

Ya the goal is for it to only run exactly once.

The variable name and the pattern were both written in haste.
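For clarity, here is a minimal sketch of the run-once intent being discussed; the CachingWriter and _primeOutputDirs names are hypothetical and this is not the actual code under review, but it shows the goal of priming the dirs exactly once and skipping that work on every rebuild after that (reusing the mkdirBatchSync sketch from above).

```js
// Hypothetical illustration of the run-once priming intent (not the real code):
CachingWriter.prototype._primeOutputDirs = function (outputFilePaths) {
  if (this._cacheDirsPrimed) { return; }   // already primed on an earlier build
  this._cacheDirsPrimed = true;

  // create all (mostly shared) output dirs in one batched pass
  mkdirBatchSync(outputFilePaths);
};
```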

@joliss
Member

joliss commented Aug 24, 2015

You said on slack that some of the code is still work in progress, but 👍 on what this is doing in general.

@stefanpenner
Contributor Author

Ya. I'll likely circle back soon and clean this up. It was more of an exploration of what is possible, and the results were good.

It also indicates that a better patch variant would be dramatically faster (but also more work). Chances are I'll massage this one further, release it, and then later come back to the patch approach we discussed.
