Word Frequency Counter

Several C programs that process text documents. Each program counts the frequency of every distinct word in a text document and prints the 3 most frequent words for each document provided. More than one file can be passed at a time.
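The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the names `WordCount`, `count_words`, `sort_by_frequency`, `MAX_WORDS`, and `MAX_LEN` are hypothetical. It assumes that words are separated by whitespace only and that matching is case-sensitive, which is also how the check commands in the Notes section split the text.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 1024  /* assumed cap on distinct words (illustrative) */
#define MAX_LEN   64    /* assumed cap on word length, including the NUL */

typedef struct {
    char word[MAX_LEN];
    int  count;
} WordCount;

/* Tally whitespace-separated words from text into table.
   Returns the number of distinct words stored (at most capacity). */
int count_words(const char *text, WordCount *table, int capacity)
{
    const char *delims = " \t\r\n";
    const char *p = text;
    int distinct = 0;

    for (;;) {
        p += strspn(p, delims);            /* skip leading whitespace */
        size_t len = strcspn(p, delims);   /* length of the next word */
        if (len == 0)
            break;                         /* reached the end of the text */

        /* Copy the word, truncating anything longer than MAX_LEN - 1. */
        char word[MAX_LEN];
        size_t copy = len < (size_t)(MAX_LEN - 1) ? len : (size_t)(MAX_LEN - 1);
        memcpy(word, p, copy);
        word[copy] = '\0';
        p += len;

        /* Linear search: simple, but O(n) per word. */
        int i;
        for (i = 0; i < distinct; i++) {
            if (strcmp(table[i].word, word) == 0) {
                table[i].count++;
                break;
            }
        }
        if (i == distinct && distinct < capacity) {
            strcpy(table[distinct].word, word);
            table[distinct].count = 1;
            distinct++;
        }
    }
    return distinct;
}

/* qsort comparator: higher counts first, ties broken alphabetically. */
int by_count_desc(const void *a, const void *b)
{
    const WordCount *x = a;
    const WordCount *y = b;
    if (x->count != y->count)
        return y->count - x->count;
    return strcmp(x->word, y->word);
}

/* After this call, table[0..2] hold the 3 most frequent words. */
void sort_by_frequency(WordCount *table, int distinct)
{
    qsort(table, (size_t)distinct, sizeof table[0], by_count_desc);
}
```

A caller would fill a `WordCount table[MAX_WORDS]` with `count_words`, call `sort_by_frequency`, and print the first three entries. The linear search keeps the sketch short; a hash table would scale better for large inputs.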

rvw.c does word counting sequentially in the main process, printing results one after the other for each data set.
process.c does word counting concurrently using multiple processes, creating a new child process per data set.
thread.c does word counting concurrently using multiple threads, creating one new pthread per data set.
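The one-child-per-data-set pattern can be sketched as below. This is an illustrative outline under stated assumptions, not the contents of process.c: the function names `count_words_in_file` and `process_files` are hypothetical, and each child only reports the total number of whitespace-separated words in its file, to keep the focus on the fork/wait structure rather than on the frequency table.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count whitespace-separated words in one file.
   Returns -1 if the file cannot be opened. */
long count_words_in_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    long words = 0;
    int c, in_word = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;       /* first character of a new word */
            words++;
        }
    }
    fclose(fp);
    return words;
}

/* Fork one child per path; each child handles its own file and exits.
   Returns the number of children that exited successfully. */
int process_files(int nfiles, char *paths[])
{
    int started = 0;

    for (int i = 0; i < nfiles; i++) {
        fflush(stdout);        /* keep buffered output out of the child */
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            continue;
        }
        if (pid == 0) {        /* child: process exactly one file */
            long n = count_words_in_file(paths[i]);
            if (n < 0) {
                fprintf(stderr, "cannot open %s\n", paths[i]);
                _exit(EXIT_FAILURE);
            }
            printf("%s: %ld words\n", paths[i], n);
            fflush(stdout);
            _exit(EXIT_SUCCESS);
        }
        started++;             /* parent: keep forking, wait afterwards */
    }

    /* Parent: reap every child that was started. */
    int succeeded = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == EXIT_SUCCESS)
            succeeded++;
    }
    return succeeded;
}
```

Because the children run concurrently, their output lines may appear in any order. A thread-based variant would follow the same shape, with `pthread_create` in place of `fork` and `pthread_join` in place of `wait`.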

Build

Compile the program you want to run. Each command writes its output to a.out, replacing any earlier build:

$ gcc rvw.c

$ gcc process.c

$ gcc thread.c -lpthread

Usage

Run with:

$ ./a.out 'filename1' 'filename2' ...

Notes

dataset1.txt, dataset2.txt, and dataset3.txt are provided for reference. To check accuracy, you can use the following Unix commands:

To show how many distinct words are in the file ‘dataset1.txt’:

$ cat dataset1.txt | tr -s ' \t' '\n\n' | tr -d '\t' | grep -v '^$' | sort | uniq -c | wc -l

To show the top 3 most frequent words in ‘dataset1.txt’ and their frequencies:

$ cat dataset1.txt | tr -s ' \t' '\n\n' | sort | uniq -c | sort -r -n -k1 | head -n 3
