
Transcriptions/classifications that do not match the task/subject #370



Closed
denslowm opened this issue Jul 27, 2015 · 10 comments

Comments

@denslowm

There are a number of transcriptions/classifications that do not match the task/subject. The pattern we found is that this issue appears whenever the start and finish times are equal.
For example, there are 10 cases from Jul 1–7 in which the transcriptions do not match the image at all. It is immediately suspicious that the exact same transcription, from the same user, appears for a number of different tasks/subjects.

For example, on Jul 1 you can find 6 identical transcriptions from foxx86 for the subjects SELU0010136, SELU0006547, SELU0004005, SELU0010669, SELU0006802, and SELU0008005:
https://static.zooniverse.org/www.notesfromnature.org/subjects/sernec/selu_images/5570795de3f6661fa5b2055f.jpg
https://static.zooniverse.org/www.notesfromnature.org/subjects/sernec/selu_images/55707958e3f6661fa5b1f77c.jpg
https://static.zooniverse.org/www.notesfromnature.org/subjects/sernec/selu_images/55707954e3f6661fa5b1ed95.jpg
https://static.zooniverse.org/www.notesfromnature.org/subjects/sernec/selu_images/5570795fe3f6661fa5b20773.jpg
https://static.zooniverse.org/www.notesfromnature.org/subjects/sernec/selu_images/55707959e3f6661fa5b1f878.jpg
https://static.zooniverse.org/www.notesfromnature.org/subjects/sernec/selu_images/5570795ae3f6661fa5b1fd29.jpg

This potentially affects 5.5% of the transcriptions, which is significant.

@chrissnyder
Contributor

Has this occurred since July 7? Does the transcription that was submitted look like a legit transcription or was it blank? Did this only happen to Herbarium records?

@robgur

robgur commented Jul 28, 2015

I think this has been a persistent problem by all accounts...


@chrissnyder
Contributor

Yes, but I'm trying to narrow down its root cause. My questions serve two purposes:

  • Was this caused by code added around the first of the month? If so, that's a much smaller area to search.
  • Was this seen in only one collection? If so, we can focus troubleshooting where that collection sends its data off to the API.

To note, I don't think it's an API issue, since other projects have not reported similar patterns of classifications from users. So it must be something within the NfN codebase itself that causes a user to submit identical transcriptions.

@denslowm
Author

I am looping in @ammatsun. She can help us answer.
We know it is the herbarium for sure at this point.

@ammatsun

This is a persistent problem. I selected July 1–7, 2015 just to show that it has occurred recently, but I have observed it in records going back to 2013 (so it is not due to a recent change). My best guess at this time, without knowing the code, is that there is some concurrency problem: state from different workers is getting mixed and/or producing this situation. In particular, it might happen when one worker skips a transcription task, but I could not locate a definitive pattern in the data.

I haven't looked at other collections closely, but glancing over the macrofungi collection, I found that transcription 5313467447bc7245280007be for subject 52545d9e5c2a110000000b7d (image http://www.notesfromnature.org/subjects/macrofungi/mich/52545d9e5c2a110000000b7d.jpg) mentions Canada even though the image has nothing about Canada in it, and the exact same transcription is also present for another subject, 525468915c2a11000000121e.

Differences of 1 or 2 seconds between start and finish times also point to cases where this issue appears.
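The timing signature described above lends itself to an automated check. As a rough sketch (assuming a CSV export with hypothetical column names `transcription_id`, `started_at`, and `finished_at` holding ISO-8601 timestamps; the real export schema may differ), one could flag every classification whose start and finish times are within a couple of seconds of each other:

```python
import csv
from datetime import datetime, timedelta

def flag_suspect_rows(path, max_delta=timedelta(seconds=2)):
    """Flag classification rows whose start and finish times are (nearly) equal.

    Assumes a CSV export with hypothetical columns 'transcription_id',
    'started_at', and 'finished_at' in ISO-8601 format.
    """
    suspects = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            start = datetime.fromisoformat(row["started_at"])
            finish = datetime.fromisoformat(row["finished_at"])
            # Zero-length (or near-zero) sessions match the reported pattern.
            if timedelta(0) <= finish - start <= max_delta:
                suspects.append(row["transcription_id"])
    return suspects
```

Flagged IDs could then be spot-checked by eye before being excluded from aggregation.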

@JoyceGross

I can confirm that this is happening with CalBug records too.

An example is subject 519e5c7eea30523400000457 (EMEC593148 Undetermined sp.jpg). There are 3 transcriptions with the correct locality information (Nevada), and a 4th one with completely different locality information (Minnesota).

The 4th record (transcription 54bbfdb9832cec520b0000c3) has the exact same data as transcription 54bbfdb929a6f6290f0000cb, which is for a different specimen. Both records were recorded at almost exactly the same time.

The date is January 2015 for the two above-mentioned transcriptions.

I remember seeing this problem a year or more ago with the CalBug data.

@denslowm
Author

Hey @chrissnyder,
I just wanted to check to see if you have any updates on this issue. What do we need to do to move this forward?

@denslowm
Author

I just wanted to report that I am seeing this issue in the BRIT (herbarium) dataset that came last night.

In addition to what has already been reported, I will note one other thing.

As noted earlier, the start and end times of the problematic records are the same (within each record and across all erroneous records), EXCEPT that one record has an earlier start time. Usually this is the last record in the set, but not always. The record with the earlier start time appears to be for the image that was actually transcribed and then applied to all the other records in the set.

@trouille

@denslowm Can you calculate how often you see this happening in the BRIT dataset? Since it appears as such a specific error, is it something you can code into your aggregation script to remove from the data? Have you already shared with the other research teams your approach for removing it, so they do the same with their data?

The reason to find out the rate is that this issue and #384 (which may be related) are proving very difficult to reproduce on demand for our devs to troubleshoot. I am wondering whether, if the rate is less than a few percent, we could do the following:

  1. message very clearly in Talk that we're aware of the problem, the rate at which it is happening, and how you're removing it from the results, so everyone knows it's not contaminating the research
  2. have people keep posting in Talk when they see it happen so we can tell whether there's suddenly an uptick in errors of this type
  3. focus dev effort on the new platform rather than spending significantly more time trying to find the solution to this problem that may have a limited enough impact that we're willing to remove it in post-processing
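The post-processing removal suggested in point 3 could be sketched roughly as follows (assuming records are available as dicts with hypothetical keys `user`, `text`, `subject_id`, and `started_at`; field names in a real export will differ). Within each group of identical (user, text) pairs, only the record with the earliest start time is kept, since per the pattern reported earlier that record appears to be the genuinely transcribed one:

```python
from collections import defaultdict

def drop_duplicated_transcriptions(records):
    """Remove duplicated transcriptions from a batch of classification records.

    `records` is a list of dicts with hypothetical keys 'user', 'text',
    'subject_id', and 'started_at' (comparable timestamps). Within a group
    of identical (user, text) pairs, only the record with the earliest start
    time is kept.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["user"], rec["text"])].append(rec)
    kept = []
    for group in groups.values():
        if len(group) == 1:
            kept.extend(group)
        else:
            # Keep the earliest-started record; the rest are presumed copies.
            kept.append(min(group, key=lambda r: r["started_at"]))
    return kept
```

This is only a first approximation: two users could legitimately submit identical short transcriptions, so combining this grouping with the equal start/finish timing check would reduce false positives.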

@parrish
Contributor

parrish commented Jun 27, 2016

Closing since the app has been relaunched. This was a weirdly intermittent bug in the front end that I never did manage to reproduce.
