Skip to content

Commit

Permalink
blog post #2
Browse files Browse the repository at this point in the history
  • Loading branch information
lwcarani committed Apr 8, 2024
1 parent d9bc54e commit 98b81eb
Showing 1 changed file with 378 additions and 0 deletions.
378 changes: 378 additions & 0 deletions _posts/2023-10-30-wc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,378 @@
---
title: Coding challenge 1 - wc
author: luke
date: 2023-10-30 12:00:00 +0500
categories: [Software Engineering]
tags: [coding projects]
---

Recently, I discovered my passion for software engineering, and have decided to pursue a career as a software engineer (SWE) when I complete my military service. As an aspiring SWE, I'm always looking for new ways to grow my knowledge and skills. I've discovered that I learn most by building real software, but it's often hard for me to think of well-scoped projects.

Enter [John Crickett's](https://www.linkedin.com/in/johncrickett/) [coding challenges](https://codingchallenges.substack.com/)! He has posted dozens (and counting) of well-scoped projects that can usually be completed in ~8 hours. I hope to complete several of them over the next few months (years), starting with `wc`. In this post, I'll walk through my implementation of the unix/linux command-line tool `wc`.

## About
`wc` stands for word count. `py-wc` is my version of the Linux-style command-line tool named `wc`, implemented in Python (hence the 'py' in `py-wc`). As the name implies, its only use is for counting the number of lines, words, bytes, or characters in the files or directories specified in the input arguments.

I decided to code this up in Python since that's the language I'm most familiar with. This allowed me to focus more on the workflow of `wc` and less on the syntax of the language I was working with.

## Instructions
To use this as a command-line tool, I recommend adding the finished script to PATH / system variables. For Windows, create a folder named `Aliases` in your C drive: `C:/Aliases`, and then add this folder to PATH. Next, create a batch file that will execute when you call the specified alias. For example, on my machine, I have a batch file named `wc.bat` located at `C:/Aliases`, that contains the following script:

```bat
@echo off
echo.
python C:\...\GitHub\py-wc\main.py %*
```

So now, when I type `wc` in the command prompt, this batch file will execute, which in turn, runs the `py-wc` Python script.

## Examples

`py-wc` allows you to execute typical Linux-style `wc` commands.

Here we see the line count for a single file:

```cmd
C:\> wc test.txt -l
7145 test.txt
7145 total
lines
```

Byte count:

```cmd
C:\> wc test.txt -c
342185 test.txt
342185 total
bytes
```

Character count:

```cmd
C:\> wc test.txt -m
339289 test.txt
339289 total
chars
```

And word count:

```cmd
C:\> wc test.txt -w
58164 test.txt
58164 total
words
```

You can also mix and match flags:

```cmd
C:\> wc test.txt -w -l
7145 58164 test.txt
7145 58164 total
lines words
C:\> wc test.txt -w -l -c -m
7145 58164 339289 342185 test.txt
7145 58164 339289 342185 total
lines words chars bytes
```

And the order in which you pass the flags does not matter:

```cmd
C:\> wc -w -l test.txt
7145 58164 test.txt
7145 58164 total
lines words
C:\> wc -w -l -c -m test.txt
7145 58164 339289 342185 test.txt
7145 58164 339289 342185 total
lines words chars bytes
```

If you don't pass any flags, you get lines, words, and bytes by default:

```cmd
C:\> wc test.txt
7145 58164 342185 test.txt
7145 58164 342185 total
lines words bytes
```

You can also pass in more than one file:

```cmd
C:\> wc test.txt test2.txt
7145 58164 342185 test.txt
26 136 814 test2.txt
7171 58300 342999 total
lines words bytes
```

Or, you can pass in a directory:

```cmd
C:\> wc test_text
7145 58164 342185 test_text\test.txt
26 136 814 test_text\test2.txt
7171 58300 342999 total
lines words bytes
```

Or multiple directories:

```cmd
C:\> wc test_text test_text2
44 121 1453 test_text\graph.py
21 47 568 test_text\node.py
7145 58164 342185 test_text\test.txt
26 136 814 test_text\test2.txt
7145 58164 342185 test_text2\test.txt
26 136 814 test_text2\test2.txt
14407 116768 688019 total
lines words bytes
```

Finally, you can specify file extensions to ignore:

```cmd
C:\> wc test_text test_text2 -i .py
7145 58164 342185 test_text\test.txt
26 136 814 test_text\test2.txt
7145 58164 342185 test_text2\test.txt
26 136 814 test_text2\test2.txt
14342 116600 685998 total
lines words bytes
```

And you can even specify directory names to ignore! First we call `wc` on a python project, without ignoring any file extensions or directories:

```cmd
C:\> wc matching-algorithms -l
6 matching-algorithms\.gitignore
22 matching-algorithms\README.md
2 matching-algorithms\.git\COMMIT_EDITMSG
16 matching-algorithms\.git\config
1 matching-algorithms\.git\description
2 matching-algorithms\.git\FETCH_HEAD
2 matching-algorithms\.git\HEAD
7 matching-algorithms\.git\index
2 matching-algorithms\.git\ORIG_HEAD
16 matching-algorithms\.git\hooks\applypatch-msg.sample
25 matching-algorithms\.git\hooks\commit-msg.sample
175 matching-algorithms\.git\hooks\fsmonitor-watchman.sample
9 matching-algorithms\.git\hooks\post-update.sample
15 matching-algorithms\.git\hooks\pre-applypatch.sample
50 matching-algorithms\.git\hooks\pre-commit.sample
14 matching-algorithms\.git\hooks\pre-merge-commit.sample
54 matching-algorithms\.git\hooks\pre-push.sample
170 matching-algorithms\.git\hooks\pre-rebase.sample
25 matching-algorithms\.git\hooks\pre-receive.sample
43 matching-algorithms\.git\hooks\prepare-commit-msg.sample
79 matching-algorithms\.git\hooks\push-to-checkout.sample
129 matching-algorithms\.git\hooks\update.sample
7 matching-algorithms\.git\info\exclude
11 matching-algorithms\.git\logs\HEAD
11 matching-algorithms\.git\logs\refs\heads\main
33 matching-algorithms\.git\logs\refs\remotes\origin\HEAD
11 matching-algorithms\.git\logs\refs\remotes\origin\main
12 matching-algorithms\.git\objects\00\0d21fcd4cb2ca6e59ef2b2002cb1048d541f27
4 matching-algorithms\.git\objects\01\6ca9c5ca0ff63978f9922ad01a1825c485a6bd
1 matching-algorithms\.git\objects\03\337b5c7dbec27bf6f6e6924006a0c119807769
6 matching-algorithms\.git\objects\03\efc0b7f527818bb540c6c5584ba612ecc1f594
2 matching-algorithms\.git\objects\04\caa8e53017b4ab5a660ca8fdb2583eb5362ac2
5 matching-algorithms\.git\objects\05\a357455eee5604cf4df366252cfe9c8f2f135e
1 matching-algorithms\.git\objects\09\2bf3e7900f85f5f61af91578883c0a27fbd979
8 matching-algorithms\.git\objects\09\34ea9b62bd31b4bfd8c4299f0c99ba8eb7d074
4 matching-algorithms\.git\objects\0a\66ffa21b808f14e472b8923cc06ee17b6b0e30
3 matching-algorithms\.git\objects\0b\96665845a9fc038ece44f13d56851fdb3e9913
3 matching-algorithms\.git\objects\0c\c5762af493d1dc5d0e7cc6afba4c9a8cd59e1a
4 matching-algorithms\.git\objects\0f\772273d50d95af0e8228ac1e3991802ca133af
2 matching-algorithms\.git\objects\10\5cb69c8126a9821a9b60eefa545493bea7af69
8 matching-algorithms\.git\objects\14\09ea0477d18d1265ba2198dec5aba43311f919
6 matching-algorithms\.git\objects\14\9e179ddcef5f250a0d457cbbc8a8598a7ea56d
2 matching-algorithms\.git\objects\16\4dee95843a1040ed4e5786ce0bdb0404fce3c7
1 matching-algorithms\.git\objects\17\9beaa4916f3066fd5a1d118fea0c3981ec0377
2 matching-algorithms\.git\objects\18\9f66776d49cd6466fa0ecc9ac242f298498c0e
1 matching-algorithms\.git\objects\1a\81cd8eb500d40599b5a6af6c3771c1c209b7e8
9 matching-algorithms\.git\objects\24\69195205a89376a0e264940fcd4cbf92e776d3
2 matching-algorithms\.git\objects\2a\c8234a99788f02864a4e52865561f6a5c0cf66
4 matching-algorithms\.git\objects\2c\095aa199c98877d853e6c7418e251f021d2858
6 matching-algorithms\.git\objects\2f\977032ce0b7c592dd6dfc1578daf90465712f6
2 matching-algorithms\.git\objects\30\311bdd6eefa848215aeb374a1e81f0422ba4dc
1 matching-algorithms\.git\objects\35\3f3fbb423644c20310875a4a9b0f320b8472c8
2 matching-algorithms\.git\objects\35\5cfe627e10c353a0622033ec854480aa45a6a5
8 matching-algorithms\.git\objects\36\aa7c1a8250066e1f72602624784c381c7cfe47
12 matching-algorithms\.git\objects\38\7f1ecdc33689b9832facbc79d5a7b350781e9c
4 matching-algorithms\.git\objects\45\289f3fb33ed737ba32477315d2d5dfec5916f5
2 matching-algorithms\.git\objects\46\670337f355e2ce85e2b86b55929a2c0003fce4
4 matching-algorithms\.git\objects\46\b61ec0bbdca71363c33e9a2984ae81aea49d65
9 matching-algorithms\.git\objects\48\2adbc27ccbd61a72a42437f051a7c554697c0e
2 matching-algorithms\.git\objects\49\6f2da837dfc087e1780e769d45a049f9526a39
3 matching-algorithms\.git\objects\49\c1a8aca332004b01ba3b5311dc5df76bbc4412
9 matching-algorithms\.git\objects\4d\a730235cf72ec132fbc58211db69aeef361ddd
10 matching-algorithms\.git\objects\4f\a4ecf0807c318bed1a5b187a094f61d56a8997
9 matching-algorithms\.git\objects\55\893097ffaae27e8966686c0eba5f978fc30f69
2 matching-algorithms\.git\objects\55\e4fafb340aedb627ae6a83f3a47f38d8fd17f7
3 matching-algorithms\.git\objects\57\020d7a093ad291bbb127936d7ba333ca0f417d
1 matching-algorithms\.git\objects\57\ba1a9eb61aabf8a9f355736bf37e4fc35e8a38
1 matching-algorithms\.git\objects\5b\6f10feb2f6dccf6b6646de20786e5984f76ae5
2 matching-algorithms\.git\objects\5c\9f6a1231a64a60cda335aab6b56655817eeea2
2 matching-algorithms\.git\objects\5d\0a4763935dad000434d62731b6aa01ebb130d1
2 matching-algorithms\.git\objects\60\3b2d4ea63da2a1abc5ad988d0778b41ab4ee01
4 matching-algorithms\.git\objects\62\d256746da9d28392b4ba52a7165feb2f8dbeeb
11 matching-algorithms\.git\objects\65\11e9b9a71e21bd93ebd7b6c56f204cad5bf151
2 matching-algorithms\.git\objects\66\9366557b06d1c037613a6e846931ebbba4fd63
1 matching-algorithms\.git\objects\6f\9509c88bed7080d496fc5e1d87a9315e30549d
2 matching-algorithms\.git\objects\75\7c54dc0b4bf7edb1f1cdd89b0f441482274998
4 matching-algorithms\.git\objects\76\cb89c01d795dd41ccbaa0d1080a5c16807f67c
2 matching-algorithms\.git\objects\7b\004c01d76cc06d5d8248ab31cc049cc19f5383
2 matching-algorithms\.git\objects\7d\24640b77c38b84b12d081f519e08238784d52a
2 matching-algorithms\.git\objects\7d\83f4a503dc10bae5d19a6711a5e05398bff429
2 matching-algorithms\.git\objects\7e\76c1ad1732110f153d3828af39dbc6e0352aed
1 matching-algorithms\.git\objects\7f\de9be408eaf61e6479123dd648631e587a69e1
2 matching-algorithms\.git\objects\83\18423da9a4f73461bcddc93203ef361722acd3
12 matching-algorithms\.git\objects\83\f6d95476afa6fb88f8dbe5dc94ec5534897384
3 matching-algorithms\.git\objects\85\29717a14738aa61b0d1fbb5392e9a32a95c838
2 matching-algorithms\.git\objects\89\3fe505d9af10f04abc4a4a7c8111a3eeffbd58
1 matching-algorithms\.git\objects\8d\a5ebfafbf3ef7a44680d3aa40833fbfc1336c3
3 matching-algorithms\.git\objects\8e\5dff36b31240309255cda561b41384c1532753
2 matching-algorithms\.git\objects\8f\87bb0cd9eb91644f865c9b34e739cfd2e52e89
2 matching-algorithms\.git\objects\91\b5f4e717c244518bebe7dc700ed0ea8bb53057
4 matching-algorithms\.git\objects\91\cd93c59e08145be520068c1b2149168859f86c
14 matching-algorithms\.git\objects\92\220a0b842ef847474a5577920aae801b5a9bb8
2 matching-algorithms\.git\objects\9d\fb9af432fd7ada9c839d2f1f648457d620e609
2 matching-algorithms\.git\objects\9e\32406d8f0b61943382fc09028b23a7eac6ce24
2 matching-algorithms\.git\objects\a4\7b497c678a3c9e080567170c22e917b856d154
1 matching-algorithms\.git\objects\a7\9d0521b0ead0b83df67faacc89d86709b43ad6
1 matching-algorithms\.git\objects\a7\b4cdd90a69728f1b5d0f44048ea17fe784b65b
17 matching-algorithms\.git\objects\a9\f0bf8781c8154d7c2b64c1bcfe450388bc48b5
1 matching-algorithms\.git\objects\ab\2d599d0ab5cc83aeca856b9a878c19176b6475
5 matching-algorithms\.git\objects\ae\7ece480dd34cefe0be8fc1468b5b9b40b53b5d
1 matching-algorithms\.git\objects\b2\faac4308bdb86ee184d638fac8ebd933af4bf2
4 matching-algorithms\.git\objects\b6\45f27b781af23c4cb6ee610e8fa0396bea123a
1 matching-algorithms\.git\objects\b8\0c246e7886ac724e9501b5eea46502d70931e1
1 matching-algorithms\.git\objects\bd\037fcb6bffac4ce63c5804fefe38c9a9783e3d
5 matching-algorithms\.git\objects\be\52369b3a6d051d87dd31762b2eac876a0da74e
1 matching-algorithms\.git\objects\bf\4bf806a309ad94e07c5d5aeb79e3213691de99
1 matching-algorithms\.git\objects\c7\a8e9372af0bfd72531a1a51d109c4fcf6bb67f
3 matching-algorithms\.git\objects\c8\6e2e430e1a990dc06cf4064f40f843d8ee19f8
13 matching-algorithms\.git\objects\c8\e7f121be82ca1b2d82a0b3c9c78cc9538d39e2
2 matching-algorithms\.git\objects\cc\33fad8fe23fe6a6358ec67800697a7edf04691
2 matching-algorithms\.git\objects\cd\620f8d2a5e25bdf63571ec62c6343f8d8a9db3
7 matching-algorithms\.git\objects\d1\ea52f806e033a972ba604ea3d84894c5743104
2 matching-algorithms\.git\objects\d7\6b6bef3a7b9e11fe3ef54b363b500d1f7dacaf
2 matching-algorithms\.git\objects\dd\9107788044beeac219f388ec407b7ee7963ef4
1 matching-algorithms\.git\objects\dd\d90a8241e81b1c2680932fd13ff66456aab8c8
2 matching-algorithms\.git\objects\de\35ae3f7705048f79c5c311c8909deecb910a6f
1 matching-algorithms\.git\objects\df\e0770424b2a19faf507a501ebfc23be8f54e7b
12 matching-algorithms\.git\objects\df\f061105e5f4c646404e74c10005642fe66f230
3 matching-algorithms\.git\objects\e0\eef57f311949b04717e887a623319ef7140d59
1 matching-algorithms\.git\objects\e1\2b850fa2dff1081315b74ac30580d3e7c7bf9f
3 matching-algorithms\.git\objects\e2\6698cbd1eba7a04d6843344464358d67d39c13
1 matching-algorithms\.git\objects\e6\9de29bb2d1d6434b8b29ae775ad8c2e48c5391
2 matching-algorithms\.git\objects\e7\8c23e4f705f854e23b966f277620ddead47267
1 matching-algorithms\.git\objects\ec\554de06cb005d3482d5db863227be29ced8202
2 matching-algorithms\.git\objects\ec\f68c4d00d8ec5a65bdb18d8a48bcad1259ac8e
2 matching-algorithms\.git\objects\ed\4eca9f0ef03b54f99c8263800709addc1acbad
1 matching-algorithms\.git\objects\f0\068bae1b50f7b8ce62374819becac2c1ac395c
3 matching-algorithms\.git\objects\f4\5f4249e989b919e37e83252eb477b772e4dee0
2 matching-algorithms\.git\objects\f9\4a132a71a830d406fdd437614f12759bb3e825
5 matching-algorithms\.git\objects\fa\19b2bb66dde866581c81e85c2cf1f590252c26
1 matching-algorithms\.git\objects\fb\75d698e6ebc711e2e7e16123ec47c3198b7dd4
2 matching-algorithms\.git\objects\fd\6690d1380958c5f22a25bed4f5ca34bd20f68f
3 matching-algorithms\.git\objects\ff\75481a8bd2251e860820af9fe718d8ed198958
2 matching-algorithms\.git\refs\heads\main
2 matching-algorithms\.git\refs\remotes\origin\HEAD
2 matching-algorithms\.git\refs\remotes\origin\main
21 matching-algorithms\python\data_generator.py
44 matching-algorithms\python\graph.py
21 matching-algorithms\python\node.py
95 matching-algorithms\python\test_da.py
394 matching-algorithms\python\test_ttc.py
39 matching-algorithms\python\algos\da_utils.py
92 matching-algorithms\python\algos\deferred_acceptance.py
61 matching-algorithms\python\algos\top_trading_cycle.py
80 matching-algorithms\python\algos\ttc_utils.py
1 matching-algorithms\python\algos\__init__.py
15 matching-algorithms\python\algos\__pycache__\da_utils.cpython-311.pyc
30 matching-algorithms\python\algos\__pycache__\deferred_acceptance.cpython-311.pyc
34 matching-algorithms\python\algos\__pycache__\top_trading_cycle.cpython-311.pyc
13 matching-algorithms\python\algos\__pycache__\ttc_utils.cpython-311.pyc
2 matching-algorithms\python\algos\__pycache__\__init__.cpython-311.pyc
6 matching-algorithms\python\__pycache__\data_generator.cpython-311.pyc
31 matching-algorithms\python\__pycache__\graph.cpython-311.pyc
6 matching-algorithms\python\__pycache__\node.cpython-311.pyc
2316 total
lines
```

Yikes. Now, let's call `wc` on the same directory, but ignore any compiled python bytecode files (.pyc), .gitignore files, and any .git directories:

```cmd
C:\> wc matching-algorithms -l -i .pyc .gitignore .git
22 matching-algorithms\README.md
21 matching-algorithms\python\data_generator.py
44 matching-algorithms\python\graph.py
21 matching-algorithms\python\node.py
95 matching-algorithms\python\test_da.py
394 matching-algorithms\python\test_ttc.py
39 matching-algorithms\python\algos\da_utils.py
92 matching-algorithms\python\algos\deferred_acceptance.py
61 matching-algorithms\python\algos\top_trading_cycle.py
80 matching-algorithms\python\algos\ttc_utils.py
1 matching-algorithms\python\algos\__init__.py
870 total
lines
```

Much better!

## Libraries
Nothing fancy required - just some basic I/O with Python's built-in `open` function, command-line parsing with the `argparse` library (which comes standard as part of the Python Standard Library), and some directory navigating with `os` (also part of the Python Standard Library).

If you've never used [`argparse`](https://docs.python.org/3/library/argparse.html), I highly recommend it for handling parsing of command-line inputs for programs, here's an example from my repo for this project:

```python
if __name__ == '__main__':
# Create an ArgumentParser object
parser = argparse.ArgumentParser(description='Process text file(s).')

# Add arguments for input file(s) and/or dir(s)
parser.add_argument(
'input_files_or_dirs',
nargs='+',
type=str,
help='Path to the input file(s) and/or dir(s). Pass no options with input file to compute -l (line count), -w (word count), and -c (byte count)'
)

# Add flags for different options, store as True/False
parser.add_argument('-c', '--bytes', action='store_true', help='Count bytes')
parser.add_argument('-l', '--lines', action='store_true', help='Count lines')
parser.add_argument('-w', '--words', action='store_true', help='Count words')
parser.add_argument('-m', '--characters', action='store_true', help='Count characters')

# Add a flag to specify extensions to ignore, set default to empty list, allow multiple args to be passed
parser.add_argument('-i', '--ignore-extensions', default=[], nargs='+', help='List of file extensions to ignore')

# Parse the command-line arguments
args = parser.parse_args()

# Count how many optional flags were passed out of -c, -l, -w, and -m (ignore -i)
optional_flags = sum([args.bytes, args.lines, args.words, args.characters])

# more code below...
```

The full code for this project can be found at my Github [here](https://github.com/lwcarani/py-wc).

## Acknowledgements
Thanks to [John Crickett](https://github.com/JohnCrickett) for the idea from his site, [Coding Challenges](https://codingchallenges.fyi/challenges/challenge-wc)!

Text samples were downloaded from [this](https://www.gutenberg.org/cache/epub/132/pg132.txt) site.

If you happen to peruse my code and notice any bugs or opportunities for optimizations, please let me know!

0 comments on commit 98b81eb

Please sign in to comment.