read performance #21

alandipert · 2013-11-26T01:22:27Z

read is currently very slow and is likely the slowest part of program loading. Reading core.gk spawns 100+ subshells, probably because readline runs in a subshell and uses various tricks to avoid losing whitespace.

Any boost in read performance that keeps the ability to read whitespace characters would be killer.
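For context on the subshell cost: in Bash every `$(...)` command substitution forks a subshell, and it also strips trailing newlines from what it captures. A minimal sketch of reading a line in the current shell while keeping whitespace and backslashes intact; `read_line_raw` is a hypothetical name, not Gherkin's actual readline:

```bash
#!/usr/bin/env bash
# Sketch only: read_line_raw is a hypothetical helper, not Gherkin's readline.

# Reads one line from stdin into the variable whose name is $1, with no subshell.
# IFS= disables trimming of leading/trailing blanks; -r keeps backslashes literal.
# A final line without a trailing newline makes read return 1, so we still
# report success if anything was read.
read_line_raw() {
  IFS= read -r "$1" || [[ -n ${!1} ]]
}

# Default read trims blanks and treats backslash as an escape character...
read plain <<< '  (a\b)  '
# ...whereas the raw variant preserves the line exactly.
read_line_raw raw <<< '  (a\b)  '

printf '[%s]\n' "$plain"   # [(ab)]
printf '[%s]\n' "$raw"     # [  (a\b)  ]
```

Because the variable is set directly by `read`, the caller never needs `line=$(...)` to get the result back.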


abrooks · 2013-11-28T22:06:35Z

How do you feel about a dependency on awk? Perl?

joelmccracken · 2013-11-28T23:46:44Z

Awk is a POSIX standard.


quoll · 2013-11-29T02:00:05Z

Perl may not always be available. One of the reasons for Gherkin is for those environments where you don't know what you may have (Bash 4 is a bit of an ask, but ultimately every new system should have it by default).

Gherkin is already based on AwkLisp. It seems to me that if we're happy being dependent on Awk, then we would be better served by simply extending that project (which would be MUCH easier to do).

The readLine function is currently calling out to external programs (grep, tail, tr), which is NOT what we want... and it makes the reader very slow. However, no one has quite figured out how to avoid this yet (one attempt was close, but couldn't deal with * characters, IIRC). Other than that one line, I don't think we rely on external programs anywhere.

@kanaka
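The thread doesn't say why the earlier attempt broke on `*`. One frequent cause in Bash is an unquoted expansion such as `echo $line`, which subjects the text to pathname expansion, so a `*` turns into a list of file names. Below is a sketch of a hypothetical `next_line` helper that peels lines off an in-memory buffer using only parameter expansion (no grep, tail, or tr) and keeps every expansion quoted:

```bash
#!/usr/bin/env bash
# Sketch only: next_line and the in-memory buffer are assumptions,
# not Gherkin's actual readLine.

# next_line BUFVAR LINEVAR
#   Moves the first line of $BUFVAR into $LINEVAR and removes it from $BUFVAR.
#   Returns 1 once the buffer is empty. Uses printf -v and ${!name},
#   so no subshell or external program is involved.
next_line() {
  local _nl_buf=${!1} _nl_line _nl_rest=
  [[ -n $_nl_buf ]] || return 1
  if [[ $_nl_buf == *$'\n'* ]]; then
    _nl_line=${_nl_buf%%$'\n'*}   # everything before the first newline
    _nl_rest=${_nl_buf#*$'\n'}    # everything after the first newline
  else
    _nl_line=$_nl_buf             # final line with no trailing newline
  fi
  printf -v "$2" '%s' "$_nl_line"
  printf -v "$1" '%s' "$_nl_rest"
}

src=$'(def x 1)\n  (* x 2)  \n'
lines=()
while next_line src line; do
  lines+=("$line")   # always quoted: an unquoted $line would glob-expand the *
done
printf '[%s]\n' "${lines[@]}"
```

Each call copies the remaining buffer, so this avoids process creation rather than being tuned for very large inputs.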

alandipert · 2013-12-06T21:27:12Z

@kanaka has made incredible progress in his experimentation with tokenization instead of character pushback. While our "string-mapped" file reader is about 40% faster than the interactive read-based one, Joel's current approach could bring us another 2x speedup and also scales linearly with input size. For the latest progress, see his tokenizer.sh.
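To illustrate tokenization versus character pushback: instead of reading one character at a time and pushing back when it overshoots, the reader can match a whole token at once with Bash's `[[ =~ ]]` operator and `BASH_REMATCH`. This is a rough sketch, not @kanaka's tokenizer.sh; the `tokenize` function, the global `TOKENS` array, and the token grammar (parentheses, `'`, double-quoted strings with backslash escapes, `;` comments, and atoms) are all assumptions:

```bash
#!/usr/bin/env bash
# Sketch only: tokenize, TOKENS and the token grammar are hypothetical.

# tokenize SOURCE
#   Splits SOURCE into tokens stored in the global array TOKENS.
#   Returns 1 if unmatched non-blank text (e.g. an unterminated string) remains.
tokenize() {
  local src=$1 tok
  # Leading whitespace, then one token: ( or ), ', "string", ;comment, or atom.
  local re=$'^[[:space:]]*([()]|\'|"([^"\\\\]|\\\\.)*"|;[^\n]*|[^[:space:]()\'";]+)'
  TOKENS=()
  while [[ $src =~ $re ]]; do
    tok=${BASH_REMATCH[1]}
    src=${src:${#BASH_REMATCH[0]}}          # drop whitespace + token just matched
    [[ $tok == ";"* ]] || TOKENS+=("$tok")  # comments are matched but discarded
  done
  [[ -z ${src//[[:space:]]/} ]]             # only whitespace may be left over
}

tokenize $'(def x "a b") ; note\n\'(1 2)'
printf '%s\n' "${TOKENS[@]}"
```

Because it trims the matched prefix off `src` on each iteration (which copies the rest of the string), this sketch makes no claim to the linear scaling reported above.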

alandipert added a commit that referenced this issue Dec 6, 2013

improve eval_file performance toward #21

3949421

- implemented @kanaka's idea to map files to strings when loading files
- removed unnecessary skip_blanks calls
- multiline strings unsupported
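One way to map a file to a string without starting a process is `read -d ''`, which reads up to a NUL byte and therefore, for ordinary text files, to end of file. It needs no external `cat`, and unlike a `$(...)` capture it does not strip trailing newlines. Bash variables cannot hold NUL bytes, which is fine for Lisp source. This is a sketch under those assumptions; `load_file` is a hypothetical helper, not necessarily how commit 3949421 does it:

```bash
#!/usr/bin/env bash
# Sketch only: load_file is a hypothetical helper.

# load_file PATH VAR
#   Stores the complete contents of PATH in the variable named VAR.
#   -d '' reads up to a NUL byte (normally end of file), IFS= keeps
#   leading/trailing whitespace, and -r keeps backslashes literal.
#   read returns 1 when it reaches EOF before a NUL, which is the normal
#   case here, so that status is ignored once the file is known readable.
load_file() {
  [[ -r $1 ]] || return 1
  IFS= read -r -d '' "$2" < "$1" || true
}

tmp=$(mktemp)   # demo setup only
printf '(def a 1)\n\n  (def b "x\\ny")\n' > "$tmp"
load_file "$tmp" src
rm -f "$tmp"
printf '%s' "$src"   # identical to the file, including the final newline
```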
