Analyzing Word Frequency

Unit 4 Session 2 Standard (Click for link to problem statements)

U-nderstand

Understand what the interviewer is asking for by using test cases and questions about the problem.

  • Q: What is the goal of the problem?
    • A: The goal is to analyze a given text to determine the frequency of each unique word and identify the most frequent word(s).
  • Q: What are the inputs?
    • A: The input is a string of text.
  • Q: What are the outputs?
    • A: The output is a dictionary where keys are words and values are their frequencies, and a list of the most frequent word(s).
  • Q: How should the text be processed?
    • A: The text should be treated as case-insensitive, and punctuation should be ignored (see the sketch after this list).
  • Q: What if there is a tie for the most frequent word?
    • A: Return all words that have the highest frequency.
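
Below is a minimal sketch of the normalization rule described above. The cleaning strategy (keep only alphanumeric characters and whitespace) is an assumption that matches the reference implementation later on this page; the problem statement itself only says punctuation is ignored.

```python
# Assumed normalization: lowercase everything, keep only letters, digits, and spaces.
raw = "The dog. The DOG!"
cleaned = ''.join(c for c in raw.lower() if c.isalnum() or c.isspace())
print(cleaned.split())  # ['the', 'dog', 'the', 'dog'] -> "dog." and "DOG!" count as the same word
```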

P-lan

Plan the solution with appropriate visualizations and pseudocode.

General Idea: Convert the text to lowercase and remove punctuation. Split the text into words and count the frequency of each word using a dictionary. Then, identify the word(s) with the highest frequency. A compact standard-library sketch of this plan appears after the numbered steps below.

1) Convert the entire `text` to lowercase to ensure case insensitivity.
2) Remove punctuation from the text.
3) Split the `text` into individual words.
4) Initialize an empty dictionary `frequency_dict` to store word frequencies.
5) Iterate through the list of words:
   a) If the word is already in `frequency_dict`, increment its count.
   b) If the word is not in `frequency_dict`, add it with a count of 1.
6) Determine the maximum frequency in `frequency_dict`.
7) Initialize a list `most_frequent_words` to store words with the highest frequency.
8) Iterate through `frequency_dict` and add words with the maximum frequency to `most_frequent_words`.
9) Return `frequency_dict` and `most_frequent_words`.
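
If built-in helpers are allowed, the same plan can be sketched compactly with the standard library. `Counter` and `max` are assumptions here (the function name is just for illustration); the reference implementation in the I-mplement section deliberately avoids them.

```python
from collections import Counter

def word_frequency_analysis_counter(text):
    # Steps 1-3: lowercase, strip punctuation, split into words.
    cleaned = ''.join(c for c in text.lower() if c.isalnum() or c.isspace())
    words = cleaned.split()

    # Steps 4-5: count each word.
    frequency_dict = Counter(words)

    # Steps 6-8: collect every word tied for the highest count.
    max_frequency = max(frequency_dict.values(), default=0)
    most_frequent_words = [w for w, f in frequency_dict.items() if f == max_frequency]

    # Step 9: return both results.
    return dict(frequency_dict), most_frequent_words
```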

**⚠️ Common Mistakes**

- Not handling punctuation correctly, leading to incorrect word counts.
- Forgetting to account for case insensitivity when counting word frequencies.
- Not correctly identifying all words with the highest frequency in case of ties (see the sketch below).
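
As a quick illustration of the last point, here is a hypothetical sketch of the tie mistake and one way to fix it (the dictionary values are made up for the example):

```python
frequency_dict = {'apple': 2, 'banana': 2, 'kiwi': 1}

# Pitfall: max() over the keys returns only one of the tied words.
buggy = max(frequency_dict, key=frequency_dict.get)   # 'apple' -- 'banana' is silently dropped

# Fix: find the top count first, then keep every word that reaches it.
top = max(frequency_dict.values())
correct = [word for word, count in frequency_dict.items() if count == top]  # ['apple', 'banana']
```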

I-mplement

```python
def word_frequency_analysis(text):
    # Convert the text to lowercase and remove punctuation manually
    text = text.lower()
    clean_text = ''
    for char in text:
        if char.isalnum() or char.isspace():
            clean_text += char

    # Split the text into words
    words = clean_text.split()

    # Dictionary to store word frequencies
    frequency_dict = {}

    for word in words:
        if word in frequency_dict:
            frequency_dict[word] += 1
        else:
            frequency_dict[word] = 1

    # Find the maximum frequency without using max()
    max_frequency = -1
    most_frequent_words = []

    for word, freq in frequency_dict.items():
        if freq > max_frequency:
            max_frequency = freq
            most_frequent_words = [word]
        elif freq == max_frequency:
            most_frequent_words.append(word)

    return frequency_dict, most_frequent_words
```
Example Usage:

```python
text = "The quick brown fox jumps over the lazy dog. The dog was not amused."
print(word_frequency_analysis(text))
# Output: ({'the': 3, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 2, 'was': 1, 'not': 1, 'amused': 1}, ['the'])

text_2 = "Digital nomads love to travel. Travel is their passion."
print(word_frequency_analysis(text_2))
# Output: ({'digital': 1, 'nomads': 1, 'love': 1, 'to': 1, 'travel': 2, 'is': 1, 'their': 1, 'passion': 1}, ['travel'])

text_3 = "Stay connected. Stay productive. Stay happy."
print(word_frequency_analysis(text_3))
# Output: ({'stay': 3, 'connected': 1, 'productive': 1, 'happy': 1}, ['stay'])
```
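
The original examples each have a single most frequent word; a tie case (hypothetical input, not from the problem statement) shows that the returned list can hold more than one entry:

```python
text_4 = "Cats purr. Dogs bark. Cats nap. Dogs run."
print(word_frequency_analysis(text_4))
# Output: ({'cats': 2, 'purr': 1, 'dogs': 2, 'bark': 1, 'nap': 1, 'run': 1}, ['cats', 'dogs'])
```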