Koota (pronounced /ˈkoː.tɑ/, means “to assemble” in Finnish) generates words based on a pattern, similar to Awkwords.
It was created as an experiment to see if we could compile patterns down to bytecode that can be executed by a word generator virtual machine. It is possible, of course!
You may ask… why? Well, I dunno. ¯\_(ツ)_/¯
Add this line to your application’s Gemfile:
gem 'koota'
And then execute:
bundle
Or install it yourself as:
gem install koota
A pattern can be anything you want, but there are some special characters.
A pair of parentheses forms an optional block. So if you have hell(o), two words can be generated: “hell” and “hello”. You can also have choices within those: hell(a/e/i/o/u) will generate “hell”, optionally followed by any English vowel. You can always nest them: (h(e))ll(o) can generate “ll”, “llo”, “hll”, “hllo”, “hell”, and “hello”. Note that “ello”, for example, can’t be generated, because (e) is within (h(e)), so “h” always has to be picked before “e” itself can be picked.
A pair of square brackets, like [...], does the same thing as parentheses, except that the block is not optional.
Slashes define choices to be picked at random by Koota, so a/b/c/d is a choice between a, b, c, and d. Each alternative between the slashes can be of any length you want. Note that you cannot put parentheses or square brackets within slashes, like a/(b/c)/[d/e]. That’s illegal; you should use subpatterns for that.
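To make the rules above concrete, here is a minimal plain-Ruby sketch that lists every word a pattern could produce. It is not how Koota works internally (Koota compiles patterns to bytecode and picks at random), and the names expand and expand_sequence are hypothetical helpers, not part of the gem. The sketch ignores quotes and subpatterns.

```ruby
# Hypothetical helper, NOT part of the koota gem: enumerate every word that a
# pattern made of (...), [...] and a/b/c choices could generate.
def expand(pattern)
  chars = pattern.chars
  words = expand_sequence(chars)
  raise ArgumentError, 'unbalanced closing bracket' unless chars.empty?

  words.uniq
end

# Consumes characters until the end of input or a closing bracket and returns
# all strings the consumed part can produce.
def expand_sequence(chars)
  results = ['']
  until chars.empty? || [')', ']'].include?(chars.first)
    if ['(', '['].include?(chars.first)
      opener = chars.shift
      options = expand_sequence(chars)
      closer = chars.shift
      expected = opener == '(' ? ')' : ']'
      raise ArgumentError, 'unbalanced group' unless closer == expected

      # Parentheses are optional, so "nothing" is one more option.
      options |= [''] if opener == '('
    else
      # A run of plain text up to the next bracket; slashes separate choices.
      text = +''
      text << chars.shift until chars.empty? || '()[]'.include?(chars.first)
      options = text.split('/', -1)
    end
    # Every result so far can be followed by every option.
    results = results.product(options).map(&:join)
  end
  results
end

puts expand('(h(e))ll(o)').sort.join(' ')
# prints "hell hello hll hllo ll llo"
```

Picking a single random word from this sketch would be expand(pattern).sample, although enumerating everything first gets expensive for large patterns.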
If a single character corresponds to a subpattern, then it stands for that subpattern. If you have the following .koota file:
C = p/t/k
v = a/i/u
Cv
Then the Cv on the last line behaves as if it were [p/t/k][a/i/u]. To bypass this, use quotes: "Cv" is taken as-is and the only generatable word is “Cv”. Anything within quotes is taken as-is.
My recommendation is that you reserve uppercase characters for subpatterns, and only use lowercase characters for raw characters.
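As a rough illustration of the “as if it were” rule, the following plain-Ruby sketch inlines single-character subpattern references as bracketed groups and leaves quoted text alone. The function inline_subpatterns is a hypothetical name for this example only; Koota itself resolves references when compiling, not by text substitution, and this sketch does not resolve references nested inside other subpatterns.

```ruby
# Hypothetical helper, NOT part of the koota gem: replace each character that
# names a subpattern with that subpattern in square brackets.
def inline_subpatterns(pattern, subpatterns)
  # The regex matches either a whole quoted string or a single character,
  # so quoted text is handed to the block as one token and left untouched.
  pattern.gsub(/"[^"]*"|./m) do |token|
    if token.length == 1 && subpatterns.key?(token)
      "[#{subpatterns[token]}]"
    else
      token
    end
  end
end

subpatterns = { 'C' => 'p/t/k', 'v' => 'a/i/u' }
puts inline_subpatterns('Cv', subpatterns)   # prints "[p/t/k][a/i/u]"
puts inline_subpatterns('"Cv"', subpatterns) # prints "\"Cv\"" unchanged, i.e. "Cv" with its quotes
```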
The syntax in ANTLR format:
grammar Koota;
pattern : group+ ;
group : '(' pattern ')'
| '[' pattern ']'
| choice
;
choice : atom ('/' atom)* ;
atom : ~[()[\]/"]* // i.e. anything except groups, choices, and quotes
| STRING
;
STRING : '"' .*? '"' ;
Koota ships with an executable to run your generationings handily. It takes a file as an input:
koota my-patterns.koota
The pattern file is simple:
# Comments with ye olde hash
# Each line is a 'name = pattern' association.
N = m/n
C = p/t/k/b/d/g/s/N # you can refer to patterns
V = a/e/i/o/u
# If a pattern doesn't have a name, it's the root pattern generated by Koota.
# All .koota files need to have this!
(C)V(N)
# Make sure the file is UTF-8 encoded, with no BOM, for the best results.
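The file format described above can be read with a few lines of Ruby. This is only a sketch under assumptions: parse_koota_source is a hypothetical name, it treats every # as the start of a comment (even inside quotes), and it assumes subpattern names are a single character. Koota's own parser may behave differently.

```ruby
# Hypothetical helper, NOT part of the koota gem: split a .koota source into
# its named subpatterns and its root pattern.
def parse_koota_source(source)
  subpatterns = {}
  root = nil

  source.each_line do |line|
    line = line.sub(/#.*/, '').strip # drop comments and surrounding whitespace
    next if line.empty?

    if (match = line.match(/\A(\S)\s*=\s*(.+)\z/))
      subpatterns[match[1]] = match[2] # 'name = pattern' association
    else
      root = line                      # a nameless line is the root pattern
    end
  end

  raise ArgumentError, 'no root pattern found' if root.nil?

  [subpatterns, root]
end

subpatterns, root = parse_koota_source(<<~KOOTA)
  # Comments with ye olde hash
  N = m/n
  C = p/t/k/b/d/g/s/N # you can refer to patterns
  V = a/e/i/o/u
  (C)V(N)
KOOTA

puts root # prints "(C)V(N)"
```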
After that, you’ll get 100 fresh words right out of the generation oven, like this:
$ koota my-patterns.koota
pa ti ken son na hu ...
You can change the number of words generated with --words (or -w):
$ koota --words=5 my-patterns.koota
pa ti ken son ha
# and it's over
With the above file, however, they’ll all be single-syllable words. You can change this with a command-line option:
koota -s 3 my-patterns.koota
# or
koota --syllables=3 my-patterns.koota
This will generate exactly 3 syllables per word. If you want to vary the number of syllables per word, use this:
koota --syllables=1,3 my-patterns.koota
This will generate 1 to 3 syllables per word, randomly.
You can also automatically syllabificate with --syllable-separator (or -r):
koota -s 1,3 -r '.' my-patterns.koota
This will generate words like ta.ka, na.po.ke, etc. The separator is empty by default, which does away with syllabification.
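How the syllable count and the separator combine can be sketched in plain Ruby. The build_word function below is a hypothetical stand-in, not Koota's code: it takes a fixed count or a range, asks a block for each syllable, and joins the results.

```ruby
# Hypothetical helper, NOT part of the koota gem: build one word from
# `syllables` syllables (an Integer, or a Range to pick from at random).
def build_word(syllables: 1, separator: '', rng: Random.new, &syllable)
  count = syllables.is_a?(Range) ? rng.rand(syllables) : syllables
  Array.new(count) { syllable.call }.join(separator)
end

# Stand-in for one run of the root pattern.
word = build_word(syllables: 1..3, separator: '.') { %w[ta ka na po ke].sample }
puts word
```

With an empty separator the syllables simply run together, which matches the default behaviour described above.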
Duplicate words are automatically pruned, so you may get fewer than 100 words. To disable this behaviour, pass the --duplicates (or -d) command-line option.
Each word is separated by the separator given in the --word-separator (or -p) option. The default is a space. To output each word on a new line, for example, you could pass --word-separator="\n".
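The effect of these two options on the final output can be pictured with a short plain-Ruby sketch. The name format_words is hypothetical and this is not the executable's actual code; it only mirrors the behaviour described above.

```ruby
# Hypothetical helper, NOT part of the koota gem: prune duplicates unless
# asked to keep them, then join the words with the word separator.
def format_words(words, duplicates: false, word_separator: ' ')
  words = words.uniq unless duplicates
  words.join(word_separator)
end

generated = %w[pa ti pa ken]
puts format_words(generated)                   # prints "pa ti ken"
puts format_words(generated, duplicates: true) # prints "pa ti pa ken"
```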
You can go mad and get the bytecode with --bytecode. It’ll dump the bytecode on standard output, so best redirect it with something like > file if you don’t want your console to freak out!
After that, you can run directly from bytecode, just by passing the resulting file to koota:
koota my-patterns.koolla
Same thing happens, except you use the bytecode directly. Why would you want to do this? No idea.
To seek more help, use --help (or -h).
You can also run this as a library inside some Ruby code, of course. Use Koota::Pattern objects to compile patterns given their references:
require 'koota'
nasals = Koota::Pattern.new('m/n')
vowels = Koota::Pattern.new('a/e/i/o/u')
consonants = Koota::Pattern.new('p/t/k/b/d/g/s/N', N: nasals) # N reference, must pass!
Then, use a Koota::Generator, passing in the root pattern to #call:
root = Koota::Pattern.new('(C)V(N)', C: consonants, V: vowels, N: nasals)
generator = Koota::Generator.new
generator.call(root) # returns an Array<String> containing the generated words
You can pass many of the same command-line options to Koota::Generator#call:
generator.call(
root,
# Option: Default
words: 100, # Integer only
syllables: 1, # Integer or Range of Integer
syllable_separator: '', # String only
duplicates: false # Boolean only
)
And you can get the bytecode for a generator with #bytecode:
generator.bytecode # returns an array of 8-bit integers
For more info, see the documentation.
See VM.md for info on the virtual machine.
After checking out the repo, run bundle to install dependencies. Then, run bundle exec rake to run rubocop followed by the tests, or bundle exec rake spec for just the tests. You can also run ruby bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/unleashy/koota.
The gem is available as open source under the terms of the MIT License.