TWE: IS22: RP006: Intern Intake Interview Recordings: Clean Up & De-identify: Intern 009 #704

Open
pandanista opened this issue Jan 22, 2025 · 0 comments


Dependencies

Overview

We need to clean up and de-identify RP006's Intern Intake interview transcripts so that we can move on to the data analysis and insights generation phase.

Resources/Instructions: This section is at the bottom of this issue (scroll to the bottom to view it now). You will be asked to add links to this section while completing the issue.

Tip: Use two windows side by side, one with the issue open and the other with the resource links displayed, to avoid switching back and forth. To prevent loss of work, refresh both windows after each edit.

Action Items

UX Lead adds the assignee to the Internship - PII's "My Drive" as a Content Manager so that they can access the recordings and transcripts with PII stored there

  • UX lead accesses the Internship - PII's "My Drive"
  • Choose "Manage members", located toward the top right of the browser window
  • Enter the assignee's email address and select the role as "Content Manager"
  • Confirm with the assignee that they have access to the Internship - PII's "My Drive"

Customize Resource Links

  • Customize Resource for Wiki Page Link

    • Go to the wiki page: Research Output Overview (Resources # 1.01)
    • Choose the link in the Research by Plan Number section
    • Locate the relevant wiki page for RP006
    • Copy the link for the wiki page
    • In Resources # 2.01, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Choose "Update comment" in GitHub and make sure all the checkboxes above have been checked
  • Customize Resource for this Research Plan's Google Drive Folder

    • Open the Google Drive's Research by Type Folder (Resources # 1.02)
    • Choose the Intern folder
    • Choose the RP006 folder
    • Copy the link of the RP006 folder
      1. Choose the three vertical dots on the right side of the RP006 folder
      2. Choose "Share"
      3. Choose "Copy Link"
    • In Resources # 2.02, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Choose "Update comment" in GitHub and make sure all the checkboxes above have been checked
  • Customize Resource for Interview Recordings and Transcriptions Tracking Sheet

    • Open the Research Plan's Google Drive folder in the Internship's shared drive (Resources # 2.02)
    • Locate the TWE: IS22: RP006: Intern Intake Interview Recordings and Transcriptions Tracking Sheet in the folder
    • Copy the link of TWE: IS22: RP006: Intern Intake Interview Recordings and Transcriptions Tracking Sheet
      1. Choose the three vertical dots on the top right side of the file
      2. Choose "Share"
      3. Choose "Copy Link"
    • In Resources # 2.03, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Choose "Update comment" in GitHub and make sure all the checkboxes above have been checked
  • Customize Resource for De-identified Participants List from Interviews (UXR Excluded) spreadsheet

    • Open the Research Plan's Google Drive folder in the Internship's shared drive (Resources # 2.02) if it is not open yet
    • Locate the TWE: IS22: RP006: De-identified Participants List from Interviews (UXR Excluded) in the folder
    • Copy the link of TWE: IS22: RP006: De-identified Participants List from Interviews (UXR Excluded)
      1. Choose the three vertical dots on the top right side of the file
      2. Choose "Share"
      3. Choose "Copy Link"
    • In Resources # 2.04, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Choose "Update comment" in GitHub and make sure all the checkboxes above have been checked
  • Customize Resource for the De-identified Transcripts Folder in the Shared Drive 

    • Go to the RP006 folder (Resources # 2.02)
    • Locate the RP006 De-identified Transcripts folder
    • Copy the link of the folder
      1. Choose the three vertical dots on the right side of the folder
      2. Choose "Share"
      3. Choose "Copy Link"
    • In Resources # 2.05, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Choose "Update comment" in GitHub and make sure all the checkboxes above have been checked
  • Customize Resource for Interview Recording Folder stored in the Internship - PII's My Drive

    • Log in to the Google account associated with the TWE project so that you can access the Internship - PII's "My Drive" in the next steps
    • Choose the TWE Interview Recordings by Plan # folder in the Internship - PII's "My Drive" (linked in Resources # 1.03)
    • Locate the video recording folder for RP006 inside the TWE Interview Recordings by Plan # folder
    • Copy the link of the video recording folder for RP006
      1. Choose the three vertical dots on the right side of the folder
      2. Choose "Share"
      3. Choose "Copy Link"
    • In Resources # 2.06, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Choose "Update comment" in GitHub and make sure all the checkboxes above have been checked
  • Customize Resource for Participants List from Interviews (UXR Excluded) spreadsheet

    • Open the RP006 Interview Recording Folder in the Internship - PII drive (Resources # 2.06)
    • Locate the TWE: IS22: RP006: Participants List from Interviews (UXR Excluded) in the folder
    • Copy the link of TWE: IS22: RP006: Participants List from Interviews (UXR Excluded)
      1. Choose the three vertical dots on the top right side of the file
      2. Choose "Share"
      3. Choose "Copy Link"
    • In Resources # 2.07, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Choose "Update comment" in GitHub and make sure all the checkboxes above have been checked
  • Customize Resources for Your Assigned Recording and Transcript

    • Check the title of this issue to identify the participant number, which comes after "De-identify:"
    • Open the TWE: IS22: RP006: Intern Intake Interview Recordings and Transcriptions Tracking Sheet (Resources # 2.03)
    • Locate the recording video in .mp4 format that matches the participant number in Column B of the tracking sheet
    • Copy the link of the recording that matches the participant number in Column B of the tracking sheet
    • In Resources # 2.08, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Locate the transcript in .txt format that matches the participant number in Column C of the tracking sheet
    • Copy the link of the .txt file that matches the participant number in Column C of the tracking sheet
    • In Resources # 2.09, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Prepare for cleaning up and de-identification by converting the .txt file into a Google Doc
      1. Click the link in Column C "Transcription Link" that matches the participant number (Resources # 2.09)
      2. Choose "Open with Google Docs"
      3. A Google Doc is generated in a new window with the same .txt file name
      4. The Google Doc is now saved into the same folder with the corresponding video and .txt file
    • Make a copy of the Google Doc that was just generated and rename it
      1. Choose "File" in the Google Doc you recently created
      2. Choose "Make a copy"
      3. In the "Name" text box, delete "Copy of" from the file name
      4. Copy "To be de-identified-"
      5. Paste what you copied at the beginning of the file name in the "Name" text box
        1. The new file name should be formatted like "To be de-identified-RP006-UX Researcher Abbreviation###-Intern Abbreviation###"
        2. An example: "To be de-identified-RP006-U007-I001"
      6. Choose "Make a copy"
    • Copy the link of the "To be de-identified" Google Doc transcript you just created
      1. Choose "Share"
      2. Choose "Copy link" and "Done"
    • In Resources # 2.10, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Choose "Update comment" in GitHub and make sure all the checkboxes above have been checked
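
Each "place the link you just copied between parentheses" step above builds a Markdown link, and each renaming step in this issue swaps a prefix on a Google Docs copy name. The Python sketch below (Python 3.9+) illustrates both string transformations under stated assumptions: `add_link` and `rename_copy` are hypothetical helpers, not part of any TWE tooling, the URL is a placeholder, and the .txt file name "RP006-U007-I001" is assumed to follow the naming pattern shown in this issue. The actual steps are still done by hand in GitHub and Google Drive.

```python
def add_link(resource_line: str, url: str) -> str:
    """Turn "2.01 [Label]" into "2.01 [Label](url)", a Markdown hyperlink.

    Hypothetical helper: assumes the resource line ends with its bracketed label.
    """
    line = resource_line.rstrip()
    if not line.endswith("]"):
        raise ValueError("resource line must end with a bracketed label")
    # No space between "]" and "(": with a space, Markdown does not treat it as an inline link.
    return f"{line}({url.strip()})"


def rename_copy(copy_name: str, remove: str, prefix: str) -> str:
    """Strip `remove` from the start of a copied file name and prepend `prefix`."""
    if not copy_name.startswith(remove):
        raise ValueError(f"expected the name to start with {remove!r}")
    return prefix + copy_name[len(remove):]


if __name__ == "__main__":
    print(add_link("2.01 [Wiki: Research Plan: RP006]", "https://example.com/rp006"))
    # Working copy: "Copy of <txt name>" -> "To be de-identified-<txt name>"
    print(rename_copy("Copy of RP006-U007-I001", "Copy of ", "To be de-identified-"))
    # Final copy: "Copy of To be de-identified-<name>" -> "De-identified Interview Transcript-<name>"
    print(rename_copy("Copy of To be de-identified-RP006-U007-I001",
                      "Copy of To be de-identified-",
                      "De-identified Interview Transcript-"))
```

Raising an error on an unexpected name, rather than silently prepending, mirrors the manual check of making sure "Copy of" was actually removed.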

Clean up and de-identify the transcript

  • Open the "To be de-identified" Google Doc transcript if it is not open yet (Resources # 2.10)

  • Listen to the recording to familiarize yourself with the transcript

  • Go to Resources # 1.04 to get a refresher on the cleaning up and de-identification process. This is particularly important if it is your first time cleaning up an interview transcript.

    • There is no need to correct grammatical mistakes in the transcript because it needs to stay authentic
    • Check Resources # 1.04.01 to understand the conventions for cleaning up and de-identifying the transcript so that you can apply them properly
    • Watch the videos in Resources # 1.04.02; they walk you through the basics of cleaning up and de-identifying the transcript so we can stay consistent in this process
    • Read Resources # 1.04.03 and 1.04.04 for more best practices on cleaning up and de-identifying an interview transcript
    • Follow these guidelines for transcript formatting:
      • Use the 'in-line' format as seen in RP012 Intern 008's transcript
      • Use the six-digit HH:MM:SS format for all timestamps; for example, 01:01:01 means one hour, one minute, and one second
      • Use "UXR ###" and "Intern ###" throughout the transcript to indicate the interviewer and interviewee, as in RP012 Intern 008's transcript.
        • If other people were recorded in the transcript, name each unknown person based on the interview setup or context, such as Program Manager, Notetaker, etc. If you are not sure what to name them, ask the project leads for clarification.
      • Transcribe numerical values other than the timestamps, UXR ###, and Intern ### using APA formatting so we can easily scan the transcript. For example, write out numbers below 10 as words ("one", "two") and use numerals for 10 and above ("10", "20"). For more on formatting numbers, see https://apastyle.apa.org/style-grammar-guidelines/numbers/numerals.
  • Listen to the interview recording again and correct inaccuracies in the transcript as you read along; automatic transcription is not fully accurate, so transcripts often contain mistakes

    • Separate the texts based on speakers and timestamps (timestamps should match the video file timestamps in case a researcher needs to go back and double-check the original video)
    • Add any words the transcription missed, or add words where needed to provide more context
    • Write down exactly what the speakers say, even if it is grammatically incorrect or about a topic you are unfamiliar with
    • Correct any typos or misidentified words based on what you hear in the original video, because the transcription software is not 100% accurate
    • Delete repeated words
    • Delete unimportant verbal fillers such as "um" and "uh". However, you may keep interjections like "hmm", "woah", "yeah", "ohh", and "mmm" because they often convey emotions, reactions, and meanings (e.g., "Mmmm" [no], "Mhmm" [yes])
      • For verbal filler words, see Resources # 1.04.04 for more on this topic.
      • When the verbal fillers are distractions and don't serve any purpose, you may remove them.
      • When the verbal fillers indicate the interviewee's emotions or thoughts, keep them and provide context to show those emotions or thoughts, e.g., (The participant hesitated for a while before coming up with the answer).
    • Where needed, add context so that a reader can understand what was happening without watching the video. E.g., he [the mentor] was helpful; or (steps off camera).
  • Read through the transcript again to track and de-identify any personally identifiable information (PII)

    • Search for the interviewee's name
    • Replace it with their participant number, e.g., Intern 001 becomes I001
    • Search for the interviewer's name and any unknown speakers in the transcript
    • Replace them with either the UXR number or their role on the project (e.g., Notetaker)
    • Look for any other names the interviewee mentions during the interview
    • If a person's name is mentioned, write down the name and relevant information of that individual in the Participants List from Interviews (UXR Excluded) spreadsheet (Resources # 2.07), and assign a participant number to them. You may need to open the spreadsheet in a new tab for easier access.
      • Their name(s) in the transcript might be spelled wrong, so pay close attention to the original interview video.
      • If you are not sure of the roles of any of the names mentioned, list the names, timestamps, and transcript links in the spreadsheet, and leave the role and participant number columns blank. Then ask the Research Lead or Project Lead for clarification.
      • Please check each participant type's abbreviation under the "Research Documents by Participant Type" section of the Research Output Overview Wiki Page. For example, mentor is M.
      • Participant numbers are assigned sequentially, in the order in which people are entered and mentioned. For example, if two Hack for LA website team members are already listed in the spreadsheet, the next Hack for LA website team member will be HfLAWTM003.
      • If the name is already in the sheet because other interviewees mentioned the person, still list their name and relevant information, and use the same role and participant number already assigned to them so we can keep track of the people mentioned across all interviews associated with one research plan.
      • If an interviewee mentions an individual repeatedly throughout the interview, list all the timestamps at which the individual is mentioned.
      • There is no need to include the interviewee's or the UXR's names since we track them in the roll call and session table.
    • After confirming the names, roles, and other relevant information of the other individuals mentioned in the interviews with the Research Lead or Project Lead, use the "Find and replace" function in Google Docs to replace their name(s) in the transcript with the participant numbers assigned to them in the spreadsheet
    • Open the De-identified Participants List from Interviews (UXR Excluded) spreadsheet (Resources # 2.04) and enter the de-identified information from the Participants List from Interviews (UXR Excluded) spreadsheet (Resources # 2.07), so we have de-identified information in the research plan folder in the shared Internship drive
    • Read through the transcript to search for any other identifiable information in the interview, such as entities, places, etc.
    • Replace any personally identifiable information (PII) with non-identifiable terms. For example, if the interviewee's specific school is mentioned, replace it with either [high school] or [college].
    • If the interviewee mentions specific issues they worked on, remove the issue numbers and rephrase the issue descriptions
  • Read through the edited transcript again

    • Focus on punctuation, readability, and formatting
    • One recommendation is to install the LanguageTool Chrome Extension (see Resources # 1.05) to clean up punctuation and verbal tics in the transcript. However, there is no need to correct grammatical mistakes made by the interviewer or interviewee.
  • When you're satisfied that the transcript is completely de-identified and cleaned up, create a new copy and save it into the shared Research folder

    • Make a new copy of the transcript by selecting "File" and "Make a copy". This is to make sure that the new version of the Google Doc does not include any editing history or PII.
    • A "Copy document" window pops up
    • In the "Name" text box, delete the "Copy of To be de-identified-" text at the beginning of the file name
    • Copy "De-identified Interview Transcript-"
    • Paste the text you just copied at the beginning of the file name in the "Name" text box
      • The new file name should be formatted like "De-identified Interview Transcript-RP006-UX Researcher Abbreviation###-Intern Abbreviation###"
      • An example: "De-identified Interview Transcript-RP006-U007-I001"
    • Follow the steps below to save the renamed de-identified Google Doc transcript into the RP006 De-identified Transcripts folder on the shared Google Drive. This ensures that the new copy, which does not include any identifiable information, is saved in the shared Google Drive.
      1. Choose the folder icon under "Folder"
      2. Choose "All locations"
      3. Choose Shared drives > Internship > Internships > Research > Research by Participant Type > Intern > folder for RP006 > RP006 De-identified Transcripts
      4. Choose "Select"
      5. Select "Make a copy"
      6. The copy of the file opens in a new browser tab
  • Update the de-identified transcript URL in the Recording and Transcription tracking sheet (Resources # 2.03) and Resources # 2.11

    • Copy the link of the de-identified transcript in the shared drive
      1. Choose "Share"
      2. Choose "Copy Link"
    • Open the tracking spreadsheet link in Resources # 2.03 in a new tab
    • Paste what you copied into the matching participant's cell in Column D "Transcription Link - de-identified" of the tracking sheet (Resources # 2.03)
    • In Resources # 2.11, place the link you just copied between parentheses at the end of the line with no space in between the right bracket ] and the left parenthesis (, so it turns into a hyperlink
    • Choose "Update comment" in GitHub and make sure all the checkboxes above have been checked

  • Review with UX Lead
  • UX Lead assigns another team member to conduct a peer review using a saved reply template if needed
  • Product sign-off
  • UXR Lead or PM removes the assignee from the Internship - PII's My Drive when closing the issue if the assignee no longer needs to access the PII drive.
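
The transcript formatting rules in this issue (six-digit HH:MM:SS timestamps and APA-style numbers) are mechanical enough to sketch in code. The Python below is a hypothetical helper, not an existing TWE tool: `normalize_timestamp` assumes auto-generated timestamps look like "M:SS", "MM:SS", or "H:MM:SS", and `apa_number` covers only APA's general rule (words below 10, numerals from 10 up), not its exceptions such as units of measurement or percentages.

```python
NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def normalize_timestamp(raw: str) -> str:
    """Convert "5:03", "05:03", or "1:01:01" into six-digit HH:MM:SS form."""
    parts = [int(p) for p in raw.strip().split(":")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError(f"unrecognized timestamp: {raw!r}")
    # Left-pad missing fields with zeros: [5, 3] -> [0, 5, 3].
    hours, minutes, seconds = [0] * (3 - len(parts)) + parts
    # Re-derive the fields from total seconds so overflowing values carry over.
    total = hours * 3600 + minutes * 60 + seconds
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def apa_number(n: int) -> str:
    """APA general rule: spell out zero through nine, use numerals for 10 and above."""
    if 0 <= n < 10:
        return NUMBER_WORDS[n]
    return str(n)


if __name__ == "__main__":
    print(normalize_timestamp("1:01:01"), normalize_timestamp("5:03"))
    print(apa_number(2), apa_number(20))
```

Keeping timestamps zero-padded to a fixed width also means they sort correctly as text, which helps when matching them against the video file.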
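
The numbering and replacement rules for other people mentioned in the interviews can also be sketched. In the Python below, `ParticipantRegistry` is a hypothetical in-memory stand-in for the Participants List from Interviews (UXR Excluded) spreadsheet, and the names are invented. It mirrors two rules from this issue: an ID is the participant-type abbreviation plus a sequential three-digit number (so the third Hack for LA website team member is HfLAWTM003), and a name that is already listed keeps its existing ID. `replace_names` is a scripted analogue of the "Find and replace" step using whole-word, case-insensitive matching; it cannot catch names the auto-transcription misspelled, so the manual read-through is still needed.

```python
import re


class ParticipantRegistry:
    """Hypothetical in-memory stand-in for the Participants List spreadsheet."""

    def __init__(self) -> None:
        self.ids = {}          # case-folded name -> assigned participant ID
        self.last_number = {}  # participant-type abbreviation -> highest number used

    def assign(self, name: str, abbreviation: str) -> str:
        """Return the existing ID for `name`, or the next sequential ID for its type."""
        key = name.casefold()
        if key in self.ids:
            # Already mentioned in another interview: reuse the same ID.
            return self.ids[key]
        number = self.last_number.get(abbreviation, 0) + 1
        self.last_number[abbreviation] = number
        participant_id = f"{abbreviation}{number:03d}"
        self.ids[key] = participant_id
        return participant_id


def replace_names(text: str, replacements: dict) -> str:
    """Replace each name (whole word, any letter case) with its participant ID."""
    # Longest names first, so "Sam Lee" is replaced before a bare "Sam" could match.
    for name in sorted(replacements, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
        # A function replacement inserts the ID literally (no backslash escapes).
        text = pattern.sub(lambda _match, pid=replacements[name]: pid, text)
    return text


if __name__ == "__main__":
    registry = ParticipantRegistry()
    ids = {name: registry.assign(name, "HfLAWTM") for name in ["Alex", "Sam Lee"]}
    print(ids)
    print(replace_names("Alex paired with sam lee on the issue.", ids))
```

Whole-word matching avoids partial replacements such as turning "Alexander" into "HfLAWTM001ander", which a plain substring search could do.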

Resources/Instructions

Resources for creating this issue

1.01 Wiki: Research Output Overview
1.02 Google Drive Folder: Research by Type
1.03* TWE Interview Recordings by Plan #
1.04 Guidelines to Interviews Page 8
1.04.01 Conventions for Transcribing Interviews
1.04.02 Video folder
1.04.03 Cleaning Up Zoom Transcriptions for Qualitative Research
1.04.04 Determining Best Practice for Filler Words in Captions and Transcripts
1.05 LanguageTool Chrome Extension

Resources gathered during the completion of this issue

2.01 [Wiki: Research Plan: RP006]
2.02 [Google Drive Folder: RP006]
2.03 [TWE: IS22: RP006: Intern Intake Interview Recordings and Transcriptions Tracking Sheet]
2.04 [TWE: IS22: RP006: De-identified Participants List from Interviews (UXR Excluded)]
2.05 [RP006 De-identified Transcripts Folder]
2.06* [RP006 Interview Recording Folder]
2.07* [TWE: IS22: RP006: Participants List from Interviews (UXR Excluded)]
2.08* [Intern 009 Recording Link]
2.09* [Intern 009 Transcript .txt File]
2.10* [Intern 009 To be De-identified Transcript Google Doc]
2.11 [Intern 009 De-identified Transcript Google Doc]

*These items can only be accessed from the Internship - PII's "My Drive"

@github-project-automation github-project-automation bot moved this to New Issue Approval in P: TWE: Project Board Jan 22, 2025
@pandanista pandanista moved this from New Issue Approval to Prioritized Backlog in P: TWE: Project Board Jan 22, 2025
@pandanista pandanista added this to the 07.01.01 Research Analysis milestone Jan 22, 2025
@pandanista pandanista changed the title TWE: IS[Replace YY]: RP[Replace 000]: [Replace TYPE OF PARTICIPANT] [Replace TYPE OF INTERVIEW] Interview Recordings: Clean Up & De-identify: [Replace TYPE OF PARTICIPANT] [Replace PARTICIPANT NUMBER] TWE: IS22: RP006: Intern Intake Interview Recordings: Clean Up & De-identify: Intern 009 Jan 22, 2025