functionalities related to individual questions #2

Open
3 of 5 tasks
wenjunsun opened this issue May 11, 2021 · 2 comments

Comments

@wenjunsun
Collaborator

wenjunsun commented May 11, 2021

Here are the desired functionalities that Thalie listed in this ODK thread:

Goal for this quarter:

  • identify questions for which answers are regularly modified by data collectors. From a data management perspective this may indicate that the phrasing of the question / answers is too ambiguous.

  • I think a useful approach here is a plot with the question name on the x-axis and the total number of times each question was modified on the y-axis. That removes the need to ask the user for a threshold on how many modifications count as "many", and a visualization is better than text that just says which questions are regularly modified.

  • How many modifications count as too many? We might need to let the user decide by passing in a number, so that which questions we flag as regularly modified depends on that input.

  • identify questions which take more time for data collectors to answer.

  • Obviously, a question that asks for a short written answer will take longer to answer than one that only requires checking a box. We want to identify the questions that take a very long time to answer compared to the other questions in the survey. Again, how long an answer time counts as "long enough"?

  • Again, I think the best option here is a bar graph with the question name on the x-axis and how long each question takes on the y-axis. From this, users can visually compare and see which question takes the longest.

  • Here we could output the question that takes the most time to answer (or the top 3), along with the amount of time it takes.

  • Or ask the user for a threshold for what counts as taking a "long" time, and output all the questions that take longer than that?

  • identify if an answer was modified a "very long time" after the first entry (Maybe next quarter)

  • Also provide a bar plot with the question on the x-axis and the time difference between the first and second entry for that question on the y-axis (0 if the question was only answered once).

  • Again, we need to ask the user what counts as a "very long time".

  • Specifically, if the time between the first and second answers to a question exceeds a threshold, we flag that answer as "modified a very long time after first entry".

  • To provide a uniform UI for all the analyses a user might want to run, I think it would be useful to have a dropdown menu of the available analyses (like the 3 above); when the user clicks on a specific analysis, we show the corresponding plot.
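
A minimal sketch of the first two analyses, to make the options above concrete. It assumes the audit log has already been parsed into rows with a question path, start and end timestamps in milliseconds, and the old and new values of the answer; it also assumes that a "modification" means a visit that changed an already-entered answer. The field names (`node`, `start`, `end`, `old_value`, `new_value`), the sample rows, and the function names are all hypothetical, not taken from the project.

```python
from collections import defaultdict

# Hypothetical audit rows: field names and values are assumptions for illustration.
# start/end are epoch milliseconds; old_value is empty on the first answer.
SAMPLE_ROWS = [
    {"node": "/data/name", "start": 0, "end": 4000, "old_value": "", "new_value": "Ana"},
    {"node": "/data/age", "start": 4000, "end": 6000, "old_value": "", "new_value": "30"},
    {"node": "/data/name", "start": 6000, "end": 9000, "old_value": "Ana", "new_value": "Anna"},
    {"node": "/data/notes", "start": 9000, "end": 30000, "old_value": "", "new_value": "long text"},
    {"node": "/data/name", "start": 30000, "end": 31000, "old_value": "Anna", "new_value": "Anna"},
]


def modification_counts(rows):
    """Count, per question, the visits that changed an existing answer."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["node"]] += 0  # make sure every question appears, even with 0 changes
        if row["old_value"] != "" and row["old_value"] != row["new_value"]:
            counts[row["node"]] += 1
    return dict(counts)


def total_seconds(rows):
    """Sum, per question, the time spent across all visits, in seconds."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["node"]] += (row["end"] - row["start"]) / 1000
    return dict(totals)


def top_n(values, n=3):
    """Return the n questions with the largest values, largest first."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)[:n]


def above_threshold(values, threshold):
    """Return the questions whose value is strictly greater than a user-supplied threshold."""
    return sorted(name for name, value in values.items() if value > threshold)
```

The dictionaries returned by `modification_counts` and `total_seconds` map directly onto the bar plots described above (question name on the x-axis, count or seconds on the y-axis). `top_n` and `above_threshold` correspond to the two flagging options: report the top few questions, or report everything over a threshold the user passes in.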

Maybe? Next year?:

  • identify if a question that should be answered at a specific location was not answered where it should have been (could also help identify if the survey was completed from recall vs. while interviewing a respondent, or if an observation was not direct)
  • This is more complicated. For this to be possible, we need the audit file to include geo information about where the survey actually takes place, and we need users to tell us, for each question or survey, where it should take place.
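
If location data does become available, the check could look roughly like the sketch below. It assumes each answer comes with a recorded latitude/longitude, and that the user supplies an expected location and a tolerance in metres; the function names and the tolerance parameter are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def answered_elsewhere(recorded, expected, tolerance_m):
    """True if the recorded (lat, lon) is farther than tolerance_m from the expected (lat, lon)."""
    return haversine_m(*recorded, *expected) > tolerance_m
```

The tolerance would need to be generous enough to absorb GPS inaccuracy, so it is left as a user-supplied value rather than a fixed constant.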
@noorassan
Collaborator

I think that the 3rd graph here might be beyond the scope of what we can do without submission IDs. Calculating any info about how much time occurs between modifications of a question would require us to be able to classify two "modification events" as having come from the same submission, which requires submission IDs.

@wenjunsun
Collaborator Author

> I think that the 3rd graph here might be beyond the scope of what we can do without submission IDs. Calculating any info about how much time occurs between modifications of a question would require us to be able to classify two "modification events" as having come from the same submission, which requires submission IDs.

You are definitely right, Naisan. This would require submission IDs. Let's focus on the first two graphs for now!
