Deprecated. Use wz instead!
cw (count words) is a faster alternative to the classic GNU wc, written in pure
Rust. It provides the same tools as wc, but with a friendlier interface and
support for multiple encodings. cw also provides its core functionality as a
library called libcw, which can target any architecture because it contains no
platform-specific code. The Rust compiler delivers great performance from
stupidly simple source code.
cw differentiates itself from other wc clones by providing great defaults. cw always counts characters using the provided encoding, and therefore always reports the right count. Other word counters may report, for example, the wrong max line length on UTF-8 encoded text.
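To illustrate why the encoding matters for max line length, here is a minimal sketch that measures the longest line of a UTF-8 string both in bytes and in characters. The function names are hypothetical and this is not how libcw is implemented; it only shows how the two measurements diverge once a line contains non-ASCII text.

```rust
/// Longest line measured in bytes, as a byte-oriented counter would report it.
fn max_line_len_bytes(text: &str) -> usize {
    text.lines().map(|line| line.len()).max().unwrap_or(0)
}

/// Longest line measured in Unicode scalar values (Rust `char`s).
fn max_line_len_chars(text: &str) -> usize {
    text.lines()
        .map(|line| line.chars().count())
        .max()
        .unwrap_or(0)
}

fn main() {
    // U+00EF and U+00E9 each take two bytes in UTF-8.
    let text = "na\u{ef}ve caf\u{e9}\nplain\n";
    println!("max line length in bytes: {}", max_line_len_bytes(text));
    println!("max line length in chars: {}", max_line_len_chars(text));
}
```

The first line has 10 characters but occupies 12 bytes, so a counter that ignores the encoding overstates its length.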
To learn more about this project, visit its GitHub repo.
Because cw is written entirely in Rust, installing it is as simple as using
cargo. If you already have cargo installed on your system, run the following
from the command line:
# Ensure rust's toolchain is up-to-date
rustup update stable
git clone https://github.com/Altair-Bueno/cw.git
cd cw
cargo install --locked --path crates/cw
Warning: This will install cw in $HOME/.cargo/bin. Ensure this location is in
your shell's $PATH variable by running echo $PATH | grep '.cargo/bin'
Shell completions for Zsh, Bash, Fish, Elvish and PowerShell can be found under
target/release/build/cw-*/out:
# zsh shell
cp target/release/build/cw-*/out/completions/zsh/* /usr/local/share/zsh/site-functions
# Fish
cp target/release/build/cw-*/out/completions/fish/* /usr/local/share/fish/completions
- Download the artifact that matches your OS and architecture from the releases page
- Unzip the archive
- Move the binary to the desired destination folder. Make sure that your
  shell's PATH includes said folder
The same functionality you'd expect from GNU wc, but with some extras. To see
the full list of options, type cw -h or cw --help.
cw uses the high-performance tokio library for IO concurrency. This allows cw
to parse one file while the operating system is loading another.
You can use the --multithread flag to force tokio's multithread runtime
flavour. This is useful when you want cw to use all CPU cores for heavy
workloads.
Bonus: alias cm='cw --multithread' for count multithread
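The idea of spreading the counting work across CPU cores can be sketched with the standard library alone. Note that this example deliberately swaps tokio for std::thread::scope so that it has no external dependencies; it is not cw's implementation, and the function names are hypothetical. Each input is counted on its own thread and the results come back in input order.

```rust
use std::thread;

/// Counts whitespace-separated words in one input.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Counts the words of every input on its own scoped thread and returns
/// the counts in the same order as the inputs.
fn count_all(inputs: &[&str]) -> Vec<usize> {
    thread::scope(|scope| {
        // Spawn one worker per input; scoped threads may borrow `inputs`.
        let handles: Vec<_> = inputs
            .iter()
            .map(|text| scope.spawn(move || count_words(text)))
            .collect();

        // Join in spawn order so the results line up with the inputs.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let inputs = ["one two three", "four five", ""];
    println!("{:?}", count_all(&inputs));
}
```

A real tool would bound the number of workers instead of spawning one thread per input; an async runtime such as tokio handles that scheduling for you.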
By default, cw expects UTF-8 encoded text with LF (U+000A) line breaks. Note
that this crate does not validate any input. It assumes the input is encoded
correctly, although invalidly encoded input is handled safely.
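One common way to count lines and characters in UTF-8 without validating the input is to work on raw bytes. The sketch below assumes that approach; it is an illustration with hypothetical function names, not libcw's actual code. Lines are counted as LF bytes, and characters as every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx). Malformed input cannot cause a panic, it only yields a count that may not be meaningful.

```rust
/// Counts LF (U+000A) line breaks in raw bytes.
fn count_lines(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'\n').count()
}

/// Counts UTF-8 characters without validating the input: every byte that
/// is not a continuation byte (0b10xxxxxx) starts a new character.
fn count_chars(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}

fn main() {
    // U+00E9 is encoded as two bytes, but counts as one character.
    let text = "h\u{e9}llo\nworld\n";
    println!("lines: {}", count_lines(text.as_bytes()));
    println!("chars: {}", count_chars(text.as_bytes()));

    // Invalid UTF-8 is still processed safely.
    let invalid = [0xFF_u8, 0x80, b'\n'];
    println!("chars in invalid input: {}", count_chars(&invalid));
}
```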
To use any of these features, add them to the --features "..." list. For
example:
cargo install --git https://github.com/Altair-Bueno/cw.git --features "mimalloc"
mimalloc: Uses the mimalloc allocator instead of the default one
See BENCH.md
- Full Unicode support (eg process Z҉͈͓͈͎a̘͈̠̭l̨̯g̶̬͇̭o̝̹̗͎̙ ͟t͖̙̟̹͇̥̝͡e̥͘x͚̺̭̻͘t͉͔̩̲̘ correctly)
- UTF-16 encoding
- Auto-detect file encoding
- Make cw faster