Flags at the beginning and end of the text are not rendered correctly #51

Open
ProfJanetDavis opened this issue Aug 11, 2020 · 16 comments
Assignees
Labels
bug Something isn't working

Comments

@ProfJanetDavis
Contributor

Describe the bug
There are problems with rendering flags at the beginning and end of the text.

To Reproduce
Steps to reproduce the behavior:

  1. Go to http://localhost:8080 or http://biascorrect.org
  2. Enter text "She is willing. I am willing. She is willing."
  3. Click Submit
  4. See results shown below

Screen Shot 2020-08-11 at 3 22 17 PM

Expected behavior
Both instances of "She is willing" should be flagged. There should not be "undefined" at the end.

Additional context
We discovered this while investigating #47.

@ProfJanetDavis ProfJanetDavis added the bug Something isn't working label Aug 11, 2020
@ProfJanetDavis ProfJanetDavis changed the title Flags at beginning and end are not rendered correctly Flags at the beginning and end of the text are not rendered correctly Aug 12, 2020
@ProfJanetDavis
Contributor Author

@nidhi2509's work on #47 is probably relevant here. She identified 482699b as the last good commit. The next commit after that is af9014e.

@nidhi2509
Contributor

Actually, I am seeing this regression with the previous commits too.

@ProfJanetDavis
Contributor Author

So not a regression but a plain old bug!

@ProfJanetDavis
Contributor Author

ProfJanetDavis commented Aug 12, 2020

A good next step would be to make a new branch and write some "red" tests for these particular bugs. Does that make sense?

@ProfJanetDavis
Contributor Author

@kaliloua7 has started writing test cases. He wrote a "green" test case for a flag in the middle of the text. We need "red" test cases for flags at the beginning and end, which will be "green" when we fix the bug(s).

@elsayeaa has a theory about why the "undefined" is appearing at the end. This bug would benefit from some pair programming.

@ProfJanetDavis ProfJanetDavis assigned dylanjwu and elsayeaa and unassigned nidhi2509 Sep 3, 2020
@elsayeaa
Contributor

elsayeaa commented Sep 6, 2020

After working on the bug with @dylanjwu, we concluded that it arises in the getBlurbs function. The text is split into an array of elements that ends up with an undefined object at the end, since the splitting method uses "" (the empty string) as a delimiter. The text is then modified, joined, and split again into an array of flags using the delimiter "[!]". We have two candidate fixes, each with a disadvantage:

The first fix

Simply add an empty space at the end of the text before splitting it. This still creates an undefined object in the array (the elements would be the n original elements + the space + undefined, instead of the n original elements + undefined). However, this is a kludge, since it does not actually fix the problem.

The second fix

Slice textArray before joining it to remove the undefined element at the end, then split it again using the delimiter "[!]" to create the flags. The disadvantage is that the last character of the entire text, the ".", is not rendered.

Suggestion:

Completely rework the getBlurbs() method to avoid creating the unwanted undefined object. Let me know what you think, @ProfJanetDavis and @j6k4m8: should I open this as a new issue or implement one of the suggested fixes?
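
A minimal sketch of how the stray "undefined" can arise, assuming a flag whose end index equals the length of the text. The flag shape and the markers are modeled on the snippets in this thread; this is not the project's actual getBlurbs code.

```javascript
// Hypothetical reproduction; names and data are illustrative only.
const text = "She is willing.";
const textArray = text.split(""); // one element per character

// Assumed flag whose end is one past the last character.
const flag = { start: 0, end: text.length };

// textArray[flag.end] is out of bounds, so it evaluates to undefined,
// and string concatenation turns it into the literal text "undefined".
textArray[flag.end] = "[!]||||" + textArray[flag.end];
textArray[flag.start] = "[!]||0||" + textArray[flag.start];

const pieces = textArray.join("").split("[!]");
console.log(pieces);
```

Under that assumption, the last piece is the string that shows up as "undefined" in the rendered text.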

@ProfJanetDavis
Contributor Author

Thanks for this analysis. The proposed fixes seem to deal with the "undefined" at the end of the text, but not the unrendered flag at the beginning of the text. Is that right or am I missing something?

Of the two suggested fixes, the second one sounds more appropriate. The undefined value should never be rendered as text. Do you know why the last character is not rendered?

Regarding your suggestion, do you have an idea for an alternate algorithm for getBlurbs()? Do you think this would let you account for both bugs (which I've been assuming are related) at the same time?

@elsayeaa
Contributor

elsayeaa commented Sep 6, 2020

@dylanjwu has proposed a solution for the flag at the beginning of the text, and it is currently working. @dylanjwu, can you elaborate?
I am currently working out why the last character isn't rendered, and I expect to report back by the end of the day. An alternate algorithm for getBlurbs() will depend on how we want to display the text. I have been trying different approaches and logging the results to the console. I managed to get rid of all the undefined elements, and logging textArray shows that textArray[flag.start()] becomes an undefined object at the end of textArray.
image
But when I apply this code:

if (typeof displayed[-1] === "undefined") {
    displayed.pop();
}

it shows like this:
image
So the process removed the undefined element but also treated the last object as undefined. My guess is that because "lady." is treated as one word but only "lady" is in the wordlist of gendered words, the two are separated from each other, and then after we display the messages that are separated using "||", another undefined object is created. I think the real question might be whether to restructure the entire process of splitting and joining the array.
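
One thing worth checking in the snippet above: JavaScript arrays do not support negative indices, so displayed[-1] looks up a property named "-1" and is undefined for any ordinary array. The condition would then always be true and pop() would always drop the last real element, which may be why the final piece of text disappeared. A small sketch, using a made-up array rather than the real displayed value:

```javascript
// Hypothetical data; the real `displayed` array comes from getBlurbs.
const displayed = ["She is a ", "lady", "."];

// Negative indices are not supported on arrays: this is always undefined.
const viaNegativeIndex = displayed[-1];

// Reading the actual last element.
const viaLength = displayed[displayed.length - 1];

// Only pop when the last element really is undefined.
function dropTrailingUndefined(arr) {
  if (arr.length > 0 && typeof arr[arr.length - 1] === "undefined") {
    arr.pop();
  }
  return arr;
}
```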

@dylanjwu
Contributor

dylanjwu commented Sep 7, 2020

The bug that arose for the flag at the start of the text was because the "dummy flag" starting and ending at index 0 (to ensure that text at the start that does not contain a blurb is rendered) was overwriting any actual flags that also start at 0. I fixed this by checking if a flag is contained at the start of the text, and if so, setting the flags array to be empty. See sol'n here:

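getFlags(issues) {
    // Dummy flag at index 0 so that unflagged text at the start is still rendered.
    let flags = [
        {
            start: 0,
            end: 0,
            category: "",
            problem: "",
            suggestions: "",
            bias: ""
        }
    ];

    // If a real flag already starts at index 0, drop the dummy flag
    // so that it does not overwrite the real one.
    let firstFlag = issues[0].flags;
    if (firstFlag.length > 0 && firstFlag[0][0] === 0) {
        flags = [];
    }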
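
The check above looks only at the first flag of the first issue. If a flag starting at index 0 can appear in a later issue (an assumption, since the shape of issues is only partly visible in this thread), a broader check might look like the sketch below. hasFlagAtStart is a hypothetical name, and each flag is assumed to be an array whose first element is the start offset, as firstFlag[0][0] suggests.

```javascript
// Assumed shape: issues is an array of { flags: [[start, end], ...] }.
// Returns true if any flag in any issue starts at index 0.
function hasFlagAtStart(issues) {
  return issues.some((issue) =>
    (issue.flags || []).some(([start]) => start === 0)
  );
}
```

This version also tolerates an empty issues array, which issues[0].flags would not.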

@ProfJanetDavis
Contributor Author

ProfJanetDavis commented Sep 7, 2020 via email

@elsayeaa
Contributor

elsayeaa commented Sep 8, 2020

So, after walking through the code again, I suggest that we use a different delimiter for separating and joining the text, one that cannot occur at the end of the text. The following are some fixes that came to mind; I am not sure which would be best to implement.

  • We can edit the back-end process that creates the issues so that it keeps any punctuation following a detected word. As you can see in my previous comment, when the word "lady." is detected, it is reported as "lady" instead of "lady." That is something we might need to modify, if my understanding of the code and its output is correct.
  • We can use a different delimiter and keep the original algorithm. For example, we can change the way we mark the start and end of a flag, so instead of adding "||||" or "||[!]${i}$" we add a more convenient, less error-prone delimiter for distinguishing between flags.
  • We can create the objects on the spot instead of collecting them into an array and then modifying them again. (Least preferred; kind of messy and long.)

What do you think?
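
As a rough illustration of the third option, the text could be cut into segments directly from the flag offsets, with no delimiter strings at all. This is only a sketch: it assumes each flag has numeric start and end offsets (end exclusive) and that flags are sorted and do not overlap. segmentText is a hypothetical name, not a function in the project.

```javascript
// Hypothetical helper: split `text` into plain and flagged segments.
// Assumes flags are sorted by start, non-overlapping, with an exclusive `end`.
function segmentText(text, flags) {
  const segments = [];
  let cursor = 0;
  flags.forEach((flag, index) => {
    if (flag.start > cursor) {
      // Unflagged text between the previous flag and this one.
      segments.push({ text: text.slice(cursor, flag.start), flagIndex: null });
    }
    segments.push({ text: text.slice(flag.start, flag.end), flagIndex: index });
    cursor = flag.end;
  });
  if (cursor < text.length) {
    // Trailing unflagged text, including any final punctuation.
    segments.push({ text: text.slice(cursor), flagIndex: null });
  }
  return segments;
}

// The example text from this issue, with assumed offsets for the two flags.
const text = "She is willing. I am willing. She is willing.";
const flags = [
  { start: 0, end: 14 },
  { start: 30, end: 44 },
];
const segments = segmentText(text, flags);
console.log(segments);
```

Because nothing is ever concatenated with an out-of-range element, a flag at the very beginning or end needs no special case, and no dummy flag is required.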

@ProfJanetDavis
Contributor Author

@elsayeaa I'm not sure. I've been trying to understand the example you gave earlier, with pop(). The "before" and "after" examples seem unrelated, so I don't see the problem you are trying to explain. Is it that when you remove the undefined, you also remove punctuation from the end of the text? That would still be a bug, but less confusing for the user than seeing "undefined."

I think you might need to screenshare and walk me through it during our next meeting on Thursday.

@elsayeaa
Contributor

I was finally able to fix the problem!
After going through the front-end and back-end code many times, I discovered that the main issue is in the back end: nltk.tokenize takes only the words, without spaces and punctuation, so the spaces and punctuation are lost. Now look at the existing code that splits the text into flags and non-flags: it takes the flag (which has lost its punctuation) and adds the delimiter at the end of the flag. The following code fixes that.

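for (const [i, flag] of flags.entries()) {
    // Keep a trailing period or comma inside the flag; otherwise close the flag first.
    textArray[flag.end] = textArray[flag.end] === '.' || textArray[flag.end] === ','
        ? textArray[flag.end] + "[!]||||"
        : "[!]||||" + textArray[flag.end];
    // Mark the start of the flag with its index.
    textArray[flag.start] = `[!]||${i}||` + textArray[flag.start];
}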

However, this is a prototype. I was thinking of generalizing it: instead of an OR comparison for each punctuation mark, we could check against a list or other data structure that holds them all. This solves the problem of the extra spaces between the flagged words and their punctuation. Finally, to solve the undefined problem for the last flag, we use the following code:

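textArray = textArray.join("").split("[!]");
// textArray[-1] is always undefined in JavaScript, so index from the length instead.
if (textArray[textArray.length - 1] === '||||undefined') {
    textArray.pop();
}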

This code checks whether the last element is undefined and removes it if so; otherwise it leaves the text as it is.
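
The generalization mentioned above, replacing the chained OR comparisons with a collection of punctuation marks, might look like the following sketch. The set contents and the closeFlag name are assumptions for illustration; the end marker is the one used in the loop above.

```javascript
// Assumed set of characters that should stay attached to a flagged word.
const TRAILING_PUNCTUATION = new Set([".", ",", ";", ":", "!", "?"]);

// Hypothetical helper: returns the replacement value for textArray[flag.end].
function closeFlag(charAtEnd) {
  const END_MARKER = "[!]||||";
  if (charAtEnd === undefined) {
    // The flag ends exactly at the end of the text: emit only the marker,
    // so the string "undefined" is never concatenated in.
    return END_MARKER;
  }
  return TRAILING_PUNCTUATION.has(charAtEnd)
    ? charAtEnd + END_MARKER // keep punctuation inside the flagged span
    : END_MARKER + charAtEnd; // otherwise close the flag before the character
}
```

Inside the loop this would be used as textArray[flag.end] = closeFlag(textArray[flag.end]), which would also make the later pop() unnecessary.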

These fixes work for the front end, but I strongly suggest modifying the tokenize function in the back end so that it includes the punctuation. I think tokenize was used to detect words, not to display the text as it is, so it falls short of what the website needs. Let me know what you think. @dylanjwu, can you make a branch with your fix for the flag at the beginning, so I can merge both?

@ProfJanetDavis
Contributor Author

ProfJanetDavis commented Sep 13, 2020 via email

@elsayeaa
Contributor

elsayeaa commented Sep 13, 2020

I thought about the String includes() method. I can show you a before-and-after illustration of my modifications. There is no error whether the flag is at the end of the sentence, in the middle, in the middle and at the end, at the beginning, at the beginning and middle and end, at the beginning and end, and so on through the different permutations; I made sure to account for these variations. Checking that the character is not a space or a newline is a beautiful idea and easier to implement than checking punctuation symbols. I think this fix solves the problem at the front-end level, but it makes the code fragile and prone to unanticipated errors. So although that requires a deeper understanding of the code (which I tried to gain over this weekend), I strongly recommend that we change the back end to accommodate the needs of the front end. In the end, it is a matter of choice. @ProfJanetDavis, should I make a pull request after merging @dylanjwu's solution to the first-flag issue into this fix?
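
The whitespace-based check discussed here could be sketched as below. It treats any character that is not whitespace as belonging with the flagged span, so no list of punctuation marks is needed. staysWithFlag is a hypothetical name, and the sketch assumes the character right after a flag is either punctuation, whitespace, or missing.

```javascript
// Hypothetical predicate: should the character right after a flag stay attached to it?
function staysWithFlag(charAtEnd) {
  // undefined means the flag ends at the end of the text.
  return charAtEnd !== undefined && !/\s/.test(charAtEnd);
}
```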

@ProfJanetDavis
Contributor Author

ProfJanetDavis commented Sep 13, 2020 via email
