This repository has been archived by the owner on Nov 14, 2020. It is now read-only.

read performance #21

Open
alandipert opened this issue Nov 26, 2013 · 4 comments

@alandipert
Owner

read is currently very slow and is likely the slowest part of program loading. Reading core.gk currently spawns 100+ subshells, probably because we subshell in readline and perform various tricks to avoid losing whitespace.

Any boost in read performance while retaining the ability to read whitespace characters is killer.

@abrooks
Collaborator

abrooks commented Nov 28, 2013

How do you feel about a dependency on awk? Perl?

@joelmccracken

Awk is a POSIX standard.

@quoll
Collaborator

quoll commented Nov 29, 2013

Perl may not always be available. One of the reasons for Gherkin is for those environments where you don't know what you may have (Bash 4 is a bit of an ask, but ultimately every new system should have it by default).

Gherkin is already based on AwkLisp. It seems to me that if we're happy being dependent on Awk, then we would be better served by simply extending that project (which would be MUCH easier to do).

The readLine function is currently calling out to external programs (grep, tail, tr), which is NOT what we want... and it makes the reader very slow. However, no one has quite figured out how to avoid this yet (one attempt was close, but couldn't deal with * characters, IIRC). Other than that one line, I don't think we rely on external programs anywhere.
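
As a rough illustration of the builtin-only direction described above, here is a minimal sketch that reads lines with nothing but the shell's `read` builtin. It is not Gherkin's actual readLine; the function name `read_lines_demo` and the `LINE_COUNT` and `LAST_LINE` variables are hypothetical. Clearing `IFS` keeps leading and trailing whitespace, `-r` keeps backslashes literal, and quoting every expansion stops `*` from being glob-expanded.

```shell
# Hypothetical sketch: read stdin line by line with builtins only.
# No command substitution and no calls to grep, tail, or tr.
read_lines_demo() {
  LINE_COUNT=0
  LAST_LINE=
  # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
  # The `|| [ -n "$line" ]` part also accepts a final line with no newline.
  while IFS= read -r line || [ -n "$line" ]; do
    LINE_COUNT=$((LINE_COUNT + 1))
    LAST_LINE=$line            # plain assignment never globs or word-splits
    printf '[%s]\n' "$line"    # quoted, so * stays a literal asterisk
  done
}
```

Results come back through variables rather than `$(...)`, so a caller that runs the function in the current shell pays for no subshell.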

alandipert added a commit that referenced this issue Dec 6, 2013
- implemented @kanaka's idea to map files to strings when loading files
- removed unnecessary skip_blanks calls
- multiline strings unsupported
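
The "map files to strings" idea in this commit could look roughly like the sketch below: load the file into one variable up front, then consume it a character at a time with parameter expansion. This is an assumption-laden illustration, not the code from the commit; `slurp`, `next_char`, `SRC`, and `CH` are hypothetical names.

```shell
# Hypothetical sketch: load a whole file into one string, then consume it
# one character at a time using parameter expansion only.
slurp() {       # slurp FILE -> sets SRC to the file's contents
  SRC=
  while IFS= read -r line || [ -n "$line" ]; do
    # Re-append the newline that read strips (a final unterminated
    # line gets one too).
    SRC="$SRC$line
"
  done < "$1"
}

next_char() {   # sets CH to the first character of SRC and drops it from SRC
  rest=${SRC#?}        # SRC without its first character
  CH=${SRC%"$rest"}    # strip that remainder to leave the first character
  SRC=$rest
}
```

Each `next_char` call copies the remaining string, so this sketch makes no speed claim. In Bash specifically, substring expansion such as `${SRC:$i:1}` with an integer offset is another way to index into the string.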
@alandipert
Owner Author

@kanaka has made incredible progress in his experimentation with tokenization instead of character pushback. While our "string-mapped" file reader is about 40% faster than the interactive read-based one, Joel's current approach could bring us another 2x and also scales linearly with input. For latest progress, see his tokenizer.sh
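
To make the contrast with character pushback concrete, here is a minimal tokenizing sketch. It is not tokenizer.sh and makes no claim to match its speed or its linear scaling; the `tokenize` function is hypothetical and only understands parentheses, whitespace, and bare atoms (no string literals or comments).

```shell
# Hypothetical sketch: split a Lisp-like line into tokens, one per output
# line, instead of reading a character and pushing it back.
tokenize() {
  rest=$1
  while [ -n "$rest" ]; do
    case $rest in
      "("*|")"*)                # a paren is a one-character token
        after=${rest#?}
        printf '%s\n' "${rest%"$after"}"
        rest=$after ;;
      [[:space:]]*)             # skip one whitespace character
        rest=${rest#?} ;;
      *)                        # atom: everything up to the next delimiter
        atom=${rest%%[[:space:]\(\)]*}
        printf '%s\n' "$atom"
        rest=${rest#"$atom"} ;;
    esac
  done
}
```

For example, `tokenize '(def x (* 2 3))'` prints `(`, `def`, `x`, `(`, `*`, `2`, `3`, `)`, `)` on separate lines. The `*` survives because every expansion is quoted.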
