Flags at the beginning and end of the text are not rendered correctly #51

ProfJanetDavis · 2020-08-11T22:26:11Z

Describe the bug
There are problems with rendering flags at the beginning and end of the text.

To Reproduce
Steps to reproduce the behavior:

Go to http://localhost:8080 or http://biascorrect.org
Enter text "She is willing. I am willing. She is willing."
Click Submit
See results shown below

Expected behavior
Both instances of "She is willing" should be flagged. There should not be "undefined" at the end.

Additional context
We discovered this while investigating #47 .

ProfJanetDavis · 2020-08-12T17:38:22Z

@nidhi2509's work on #47 is probably relevant here. She identified 482699b as the last good commit. The next commit after that is af9014e.

nidhi2509 · 2020-08-12T18:47:26Z

Actually, I am seeing this regression with the previous commits too.

ProfJanetDavis · 2020-08-12T20:15:39Z

So not a regression but a plain old bug!

ProfJanetDavis · 2020-08-12T20:16:12Z

A good next step would be to make a new branch and write some "red" tests for these particular bugs. Does that make sense?

ProfJanetDavis · 2020-09-03T20:12:08Z

@kaliloua7 has started writing test cases. He wrote a "green" test case for a flag in the middle of the text. We need "red" test cases for flags at the beginning and end, which will be "green" when we fix the bug(s).

@elsayeaa has a theory about why the "undefined" is appearing at the end. This bug would benefit from some pair programming.

elsayeaa · 2020-09-06T00:45:21Z

After working on the bug with @dylanjwu, we concluded that the bug arises from the getBlurbs function. What's happening is that the text is split into an array of elements with an undefined object at the end, since the splitting method is using "" (empty string) as a delimiter. Then, the text is modified, joined, and then split into another array of flags using the delimiter "[!]". There are two current fixes, with each have a disadvantage:

The first fix

simply adding an empty space at the end of the text before splitting it, which will still create an undefined object in the area (so, the elements would be the n elements included + empty space + undefined; instead of n elements included + undefined). However, this is considered as a Kludge since it's not actually fixing the problem.

The second fix

slicing the TextArray before joining it to remove the undefined element at the end and then splitting it again using the delimiter "[!]" to create the flags. Disadvantage, the last character in the entire text is not rendered, which is the "."

Suggestion:

Completely reworking the getBlrubs() method to avoid creating an unwanted undefined object. Let me know what you think @ProfJanetDavis and @j6k4m8, so whether to make this is as a new issue or implement one of the suggsted fixes.

ProfJanetDavis · 2020-09-06T21:18:55Z

Thanks for this analysis. The proposed fixes seem to deal with the "undefined" at the end of the text, but not the unrendered flag at the beginning of the text. Is that right or am I missing something?

Of the two suggested fixes, the second one sounds more appropriate. The undefined value should never be rendered as text. Do you know why the last character is not rendered?

Regarding your suggestion, do you have an idea for an alternate algorithm for getBlurbs()? Do you think this would let you account for both bugs (which I've been assuming are related) at the same time?

elsayeaa · 2020-09-06T22:37:26Z

@dylanjwu has proposed a solution for the flag at the beginning of the sentence, and it is currently functional. @dylanjwu Can you elaborate more?
I am currently working on figuring out why isn't the last character rendered. I think I will report to you by the end of this day. An alternate algorithm for getBlrubs() will depend on how do we want to display the text. I was figuring it out now and trying to console log different methods. I managed to get rid of all undefined elements, and when we console log the textArray into the text it shows that textArray[flag.start()] is translated into an undefined object at the end of the textArray.

But when I apply this code:

if (typeof displayed[-1] === "undefined"){
                displayed.pop(); 
            }

it shows like this:

So, the process removed the undefined element but also treated the last object as also undefined. My suggestion is that because "lady." is treated as one word but "lady" is included in the wordlist of gendered words, it separated them from each other, and then after we display the messages that are separated using ("||"), another undefined object is created. I think the question might be restructuring the entire process of separating and joining the array together.

dylanjwu · 2020-09-07T04:20:00Z

The bug that arose for the flag at the start of the text was because the "dummy flag" starting and ending at index 0 (to ensure that text at the start that does not contain a blurb is rendered) was overwriting any actual flags that also start at 0. I fixed this by checking if a flag is contained at the start of the text, and if so, setting the flags array to be empty. See sol'n here:

getFlags(issues){
             let flags = [
                {   
                    start: 0,
                    end: 0,
                    category: "",
                    problem: "", 
                    suggestions: "", 
                    bias: ""
                }
            ];

            let firstFlag = issues[0].flags;
            if(firstFlag.length > 0 && firstFlag[0][0] === 0){
                flags = [];
            }

ProfJanetDavis · 2020-09-07T21:00:24Z

That makes sense!

…

On Sun, Sep 6, 2020 at 9:20 PM Dylan Wu ***@***.***> wrote: The bug that arose for the flag at the start of the text was because the "dummy flag" starting and ending at index 0 (to ensure that text at the start that does not contain a blurb is rendered) was overwriting any actual flags that also start at 0. I fixed this by checking if a flag is contained at the start of the text, and if so, setting the flags array to be empty. See the code here: ` getFlags(issues){ let flags = [ { start: 0, end: 0, category: "", problem: "", suggestions: "", bias: "" } ]; let firstFlag = issues[0].flags; if(firstFlag.length > 0 && firstFlag[0][0] === 0){ flags = []; }` — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#51 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEXJIU3ZATKS2VBPRB72XT3SERNP3ANCNFSM4P3U737Q> .

-- Janet Davis (she/they) Department Chair, Computer Science Associate Professor and Microsoft Chair of Computer Science Whitman College, Walla Walla, WA 99362 206-383-8798 (cell), http://cs.whitman.edu/~davisj/ Schedule an appointment: http://meetme.so/JanetDavis

elsayeaa · 2020-09-08T03:55:56Z

So, after walking through the code again and giving it another look. I am suggesting that we should use another delimiters for separating and joining the text. That is, use another delimiter that's not existing at the end of the text. The following are some of the suggested fixes that came to my mind, and I am not quite sure which one to be better implemented.

We can edit the back-end process of creating the issues by keeping the additional punctuation after detecting it. That is, as you see in my previous comment, when the word "lady." is detected, it is generated as "lady" afterwards instead of "lady." So that's something we might need to modify -- if my understanding of the codes and outputs is correct.
We can use a different delimiter and keep the original algorithm. For example, we can change the way we present the start and the end of a flag, so instead of adding "||||" or "||[!]${i}$" we add a more convenient, bug-free delimiter for distinguishing between flags.
We can create objects at the spot instead of collecting them into an array and then modifying them again. (Least preferred, kinda messy and long).

What do you think?

ProfJanetDavis · 2020-09-08T18:38:04Z

@elsayeaa I'm not sure. I've been trying to understand the example you gave earlier, with pop(). The "before" and "after" examples seem unrelated, so I don't see the problem you are trying to explain. Is it that when you remove the undefined, you also remove punctuation from the end of the text? That would still be a bug, but less confusing for the user than seeing "undefined."

I think you might need to screenshare and walk me through it during our next meeting on Thursday.

elsayeaa · 2020-09-13T01:15:45Z

I was able to finally fix the problem!
So, after going through the code manifold times and the back-end, I discovered that the main issue was a backend issue since using nltk.tockenize only takes the word without spaces and punctuation, which causes the loss of punctuation and spaces. That said, if we look at previous code of splitting and separating the text into flags and non-flags. The separating method takes the flag and adds the delimiter to the end of the flag (the flag that lost its punctuation). So the following code does the fix.

for (const [i, flag] of flags.entries()) {
                textArray[flag.end] = textArray[flag.end]  === '.' || textArray[flag.end] === ',' 
                ? textArray[flag.end] + "[!]||||" 
                : "[!]||||" + textArray[flag.end];
                textArray[flag.start] = `[!]||${i}||` + textArray[flag.start];
            }

However, I am putting this as a prototype and was thinking to make it more generalized and instead of OR statements for each punctuation mark, we look into a list or a datatype that have them collectively. This solves the problem of the extra spaces between the flagged words and their punctuation. Finally, for the last flag, to solve the undefined problem, we do the following code:

textArray = textArray.join("").split("[!]"); 
            if (textArray[-1] === '||||undefined'){
                textArray.pop(); 
            }

this code basically checks if the last element is undefined and removes it if it exists, if not it leaves the text as it is.

These fixes are good for the front-end, but I strongly suggest modifying the tockenize function in the backend and make it include the punctuation. I think Tockenize was used to detect words, but not display the text as it is, so it fails to achieve what's required on the website scale. Let me know what you think. @dylanjwu can you make a branch of your fix to the flag at the beginning, so I can merge both?

ProfJanetDavis · 2020-09-13T22:55:44Z

However, I am putting this as a prototype and was thinking to make it more generalized and instead of OR statements for each punctuation mark, we look into a list or a datatype that have them collectively.

The built-in `String includes()` method may be useful here. You can make a string of punctuation characters and check if `textArray[flag.end]` is in that string. But wouldn't it be simpler to instead check if the next character is *not* a space or newline? Due to the tokenization, I don't think a flag will ever end mid-word. Also, wouldn't the suggested code error if `flag.end` is at the end of `textArray`? Finally, I would hesitate to recommend changes to the backend code without a much deeper understanding of that code.

…

On Sat, Sep 12, 2020 at 6:15 PM elsayeaa ***@***.***> wrote: I was able to finally fix the problem! So, after going through the code manifold times and the back-end, I discovered that the main issue was a backend issue since using nltk.tockenize only takes the word without spaces and punctuation, which causes the loss of punctuation and spaces. That said, if we look at previous code of splitting and separating the text into flags and non-flags. The separating method takes the flag and adds the delimiter to the end of the flag (the flag that lost its punctuation). So the following code does the fix. for (const [i, flag] of flags.entries()) { textArray[flag.end] = textArray[flag.end] === '.' || textArray[flag.end] === ',' ? textArray[flag.end] + "[!]||||" : "[!]||||" + textArray[flag.end]; textArray[flag.start] = `[!]||${i}||` + textArray[flag.start]; } However, I am putting this as a prototype and was thinking to make it more generalized and instead of OR statements for each punctuation mark, we look into a list or a datatype that have them collectively. This solves the problem of the extra spaces between the flagged words and their punctuation. Finally, for the last flag, to solve the undefined problem, we do the following code: textArray = textArray.join("").split("[!]"); if (textArray[-1] === '||||undefined'){ textArray.pop(); } this code basically checks if the last element is undefined and removes it if it exists, if not it leaves the text as it is. These fixes are good for the front-end, but I strongly suggest modifying the tockenize function in the backend and make it include the punctuation. I think Tockenize was used to detect words, but not display the text as it is, so it fails to achieve what's required on the website scale. Let me know what you think. @dylanjwu <https://github.com/dylanjwu> can you make a branch of your fix to the flag at the beginning, so I can merge both? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#51 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEXJIU52W5RPUUN62NR5EI3SFQMM5ANCNFSM4P3U737Q> .

-- Janet Davis (she/they) Department Chair, Computer Science Associate Professor and Microsoft Chair of Computer Science Whitman College, Walla Walla, WA 99362 206-383-8798 (cell), http://cs.whitman.edu/~davisj/ Schedule an appointment: http://meetme.so/JanetDavis

elsayeaa · 2020-09-13T23:04:09Z

I thought about the String Include method.. I can show you an illustration of before-and-after my modifications. There is no error if the flag is at the end of the sentence, middle of the sentence, middle and end at the same time, beginning, beginning and middle and end, beginning and end, and so on with different permutations. I made sure to account for these variations. Checking if it is not space or a new line is a beautiful idea and easier to implement than checking punctuation symbols. I think this fix solves it at the frontend level, but it makes the code fragile and prone to unaccounted errors, so although we need a deeper understanding of the code (which I tried to do over this weekend), I strongly recommend that we change it to accommodate the needs of the front end. In the end, it is a matter of choice. @ProfJanetDavis Should I make a pull request after merging @dylanjwu solution to the first-flag issue to this fix?

ProfJanetDavis · 2020-09-13T23:06:54Z

Yes, I think this will be ready for a pull request.

…

On Sun, Sep 13, 2020 at 4:04 PM elsayeaa ***@***.***> wrote: I thought about the String Include method but I was just asking for any suggestions here. I can show you an illustration of before-and-after my modifications. There is no error if the flag is at the end of the sentence, middle of the sentence, middle and end at the same time, beginning, beginning and middle and end, beginning and end, and so on with different permutations. I made sure to account for these variations. Checking if it is not space or a new line is a beautiful idea and easier to implement than checking punctuation symbols. I think this fix solves it at the backend, but it makes the code fragile and prone to unaccounted errors, so although we need a deeper understanding of the code (which I tried to do over this weekend), I strongly recommend that we change it to accommodate the needs of the front end. In the end, it is a matter of choice. @ProfJanetDavis <https://github.com/ProfJanetDavis> Should I make a pull request after merging @dylanjwu <https://github.com/dylanjwu> to this fix? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#51 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEXJIU75QZLIRWC76GVBC33SFVFXLANCNFSM4P3U737Q> .

-- Janet Davis (she/they) Department Chair, Computer Science Associate Professor and Microsoft Chair of Computer Science Whitman College, Walla Walla, WA 99362 206-383-8798 (cell), http://cs.whitman.edu/~davisj/ Schedule an appointment: http://meetme.so/JanetDavis

ProfJanetDavis added the bug Something isn't working label Aug 11, 2020

ProfJanetDavis assigned ProfJanetDavis, nidhi2509 and kaliloua7 Aug 11, 2020

ProfJanetDavis mentioned this issue Aug 11, 2020

Regression: No flags shown #47

Closed

ProfJanetDavis changed the title ~~Flags at beginning and end are not rendered correctly~~ Flags at the beginning and end of the text are not rendered correctly Aug 12, 2020

ProfJanetDavis assigned dylanjwu and elsayeaa and unassigned nidhi2509 Sep 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flags at the beginning and end of the text are not rendered correctly #51

Flags at the beginning and end of the text are not rendered correctly #51

ProfJanetDavis commented Aug 11, 2020

ProfJanetDavis commented Aug 12, 2020

nidhi2509 commented Aug 12, 2020

ProfJanetDavis commented Aug 12, 2020

ProfJanetDavis commented Aug 12, 2020 •

edited

Loading

ProfJanetDavis commented Sep 3, 2020

elsayeaa commented Sep 6, 2020 •

edited

Loading

ProfJanetDavis commented Sep 6, 2020

elsayeaa commented Sep 6, 2020

dylanjwu commented Sep 7, 2020 •

edited

Loading

ProfJanetDavis commented Sep 7, 2020 via email

elsayeaa commented Sep 8, 2020

ProfJanetDavis commented Sep 8, 2020

elsayeaa commented Sep 13, 2020

ProfJanetDavis commented Sep 13, 2020 via email

elsayeaa commented Sep 13, 2020 •

edited

Loading

ProfJanetDavis commented Sep 13, 2020 via email

Flags at the beginning and end of the text are not rendered correctly #51

Flags at the beginning and end of the text are not rendered correctly #51

Comments

ProfJanetDavis commented Aug 11, 2020

ProfJanetDavis commented Aug 12, 2020

nidhi2509 commented Aug 12, 2020

ProfJanetDavis commented Aug 12, 2020

ProfJanetDavis commented Aug 12, 2020 • edited Loading

ProfJanetDavis commented Sep 3, 2020

elsayeaa commented Sep 6, 2020 • edited Loading

The first fix

The second fix

Suggestion:

ProfJanetDavis commented Sep 6, 2020

elsayeaa commented Sep 6, 2020

dylanjwu commented Sep 7, 2020 • edited Loading

ProfJanetDavis commented Sep 7, 2020 via email

elsayeaa commented Sep 8, 2020

ProfJanetDavis commented Sep 8, 2020

elsayeaa commented Sep 13, 2020

ProfJanetDavis commented Sep 13, 2020 via email

elsayeaa commented Sep 13, 2020 • edited Loading

ProfJanetDavis commented Sep 13, 2020 via email

ProfJanetDavis commented Aug 12, 2020 •

edited

Loading

elsayeaa commented Sep 6, 2020 •

edited

Loading

dylanjwu commented Sep 7, 2020 •

edited

Loading

elsayeaa commented Sep 13, 2020 •

edited

Loading