@bendichter
Contributor

@bendichter bendichter commented Oct 25, 2025

Add comprehensive support for audio and video recordings in behavioral
experiments:

- Add audio file extensions (mp3, wav) and video file extensions
  (mp4, mkv, avi) with corresponding _audio and _video suffixes
- Document usage of audio/video recordings in beh directory for
  capturing vocalizations, speech, facial expressions, and body movements
- Add metadata schema for audio/video device information and stream
  properties
- Include privacy warnings about personally identifiable information
  in human subject recordings
- Update behavioral experiments title to remove "with no neural
  recordings" restriction, clarifying data can be stored with or
  without neural recordings
- Add examples for file organization including multi-angle recordings
  and split files
- Define optional entities: task, acquisition, run, recording, split
@yarikoptic yarikoptic changed the title from "SCHEMA: Add audio video" to "SCHEMA: Add audio video behavioral data support" Oct 25, 2025
@yarikoptic yarikoptic added the schema label (Issues related to the YAML schema representation of the specification. Patch version release.) Oct 25, 2025
…ee macros

- Change section title from 'Behavioral experiments' to 'Behavioral recordings'
- Convert file tree examples to use MACROS___make_filetree_example for consistent rendering
- Address review comments from @yarikoptic in PR #2231
Collaborator

@effigies effigies left a comment

Overall this makes sense to me. It would be good to get some feedback from contributors to related BEPs, such as eye-tracking (20), motion (29), stimuli (44) and physio (45). Even if this PR doesn't propose adding this as an associated file to those data types, the potential is there and it's worth getting opinions and identifying potential conflicts.

cc @bids-standard/bep029 @bids-standard/bep044
cc @mszinte @julia-pfarr @oesteban (BEP020)
cc @m-miedema @smoia @SouravKulkarni (?) (BEP045)

@effigies effigies changed the title from "SCHEMA: Add audio video behavioral data support" to "[ENH] Add audio/video recordings to behavioral experiments" Oct 28, 2025
@codecov

codecov bot commented Oct 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.81%. Comparing base (d97bcf9) to head (cc41d49).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2231   +/-   ##
=======================================
  Coverage   82.81%   82.81%           
=======================================
  Files          22       22           
  Lines        1693     1693           
=======================================
  Hits         1402     1402           
  Misses        291      291           

@bendichter
Contributor Author

bendichter commented Dec 15, 2025

@neuromechanist

OK, I have submitted a PR to the website to make this an official BEP here: bids-standard/bids-website#759

@bendichter
Contributor Author

bendichter commented Jan 6, 2026

Comparison: this PR vs. PR #2022

Summary

  • This PR: Adds audio/video recordings of subjects behaving to the beh/ datatype
  • PR #2022 (BEP044 - Stim-BIDS): Adds organized stimulus files to the root-level /stimuli directory

Key Differences

| Aspect | This PR | PR #2022 (Stim-BIDS) |
|---|---|---|
| Location | `sub-XX/beh/` (subject-scoped) | `/stimuli/` (root-level, shared) |
| Suffixes | `_audio`, `_video` | `_audio`, `_video`, `_audiovideo`, `_image` |
| Audio formats | `.flac`, `.mp3`, `.ogg`, `.wav` | `.wav`, `.mp3`, `.aac`, `.ogg` |
| Video formats | `.mp4`, `.mkv`, `.avi` | `.mp4`, `.avi`, `.mkv`, `.webm` |
| Unique formats | `.flac` (audio) | `.aac` (audio), `.webm` (video), image formats (`.jpg`, `.png`, `.svg`, `.webp`) |
| Key entity | `recording-<label>` (for multiple angles) | `stim-<label>` (stimulus identifier) |
| Catalog files | None (uses `scans.tsv` for timing) | `stimuli.tsv`, `annotations.tsv` |
| Events linking | `_events.tsv` alongside recordings | `stim_id` column in `events.tsv` |
| Metadata focus | Technical (AudioSampleRate, FrameRate, Height, Width, Duration, CameraPosition, AudioBitDepth) | Attribution (License, Copyright, URL, Description) |
| Privacy concern | Explicit PII warning for human subjects | Not emphasized (stimuli, not subjects) |
| Part splitting | `split-<index>` for continuous recordings | `part-<label>` for stimulus segments |
| Annotation system | Standard `_events.tsv` | Dedicated `_annot-<label>_events.tsv` |
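
The "Unique formats" row can be reproduced with plain set differences. The following is only a sketch: the extension sets are copied from the audio and video rows of the table, image formats are left out, and the helper name `unique_formats` is made up for illustration.

```python
# Extension sets copied from the audio/video rows of the comparison table.
THIS_PR = {
    "audio": {".flac", ".mp3", ".ogg", ".wav"},
    "video": {".mp4", ".mkv", ".avi"},
}
STIM_BIDS = {
    "audio": {".wav", ".mp3", ".aac", ".ogg"},
    "video": {".mp4", ".avi", ".mkv", ".webm"},
}


def unique_formats(ours, theirs):
    """Return, per media kind, the extensions in `ours` that `theirs` lacks."""
    return {kind: sorted(ours[kind] - theirs[kind]) for kind in ours}


only_this_pr = unique_formats(THIS_PR, STIM_BIDS)
only_stim_bids = unique_formats(STIM_BIDS, THIS_PR)
print("only in this PR:", only_this_pr)
print("only in PR #2022:", only_stim_bids)
```

If the two proposals converge on a shared extension list, both results should end up empty for audio and video.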

Notable Distinctions

  1. _audiovideo suffix: Only in PR #2022 (BEP044), explicitly distinguishes files with both audio and video streams from video-only files

  2. _image suffix: Only in PR #2022 (BEP044), adds support for static visual stimuli

  3. Reusability: PR #2022 emphasizes stimulus reuse across subjects/studies (centralized in /stimuli), while PR #2231 ties recordings to specific subjects

  4. Annotation richness: PR #2022 has a more elaborate annotation system with annotations.tsv and annot-<label> entity for multiple annotation sets per stimulus

  5. New columns: PR #2022 adds stim_id column to events.tsv; PR #2231 doesn't add new event columns

  6. Timing alignment: PR #2231 references scans.tsv for synchronization with other modalities; PR #2022 focuses on stimulus onset/duration in events files

These PRs are complementary—one captures what the subject does (behavioral recordings), the other captures what's shown to the subject (stimuli).

@neuromechanist , thoughts?

description: |
  Width of the video in pixels (for example, `1920`).
type: integer
minimum: 1
Collaborator

I think it would be quite valuable to expose at least some details on the underlying codec(s) used for audio/video within the files, so that we could assess, for example, whether a browser would play them.
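
As a rough illustration of where such codec details could come from, here is a sketch that summarizes codec names from the JSON that `ffprobe -v error -show_streams -of json <file>` prints. The sample JSON is illustrative rather than taken from a real recording, `summarize_codecs` is a hypothetical helper, and nothing here implies specific sidecar field names for this PR.

```python
import json

# Trimmed, illustrative sample of `ffprobe -v error -show_streams -of json <file>`
# output; the values are made up and not taken from a real recording.
SAMPLE_FFPROBE_JSON = """
{"streams": [
  {"index": 0, "codec_type": "video", "codec_name": "h264",
   "width": 1920, "height": 1080},
  {"index": 1, "codec_type": "audio", "codec_name": "aac",
   "sample_rate": "48000"}
]}
"""


def summarize_codecs(ffprobe_json):
    """Map each stream type (e.g. "audio", "video") to the codec names found."""
    summary = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        kind = stream.get("codec_type", "unknown")
        summary.setdefault(kind, []).append(stream.get("codec_name", "unknown"))
    return summary


codecs = summarize_codecs(SAMPLE_FFPROBE_JSON)
print(codecs)
```

A converter could record this summary at conversion time, so that downstream tools need not open the media file to judge playability.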

@satra
Collaborator

satra commented Jan 7, 2026

thanks @bendichter - you have described the issue. if beh contains different types of acquisitions about subject behavior then func should not just be BOLD and ASL, it should include EEG/MEG/iEEG, etc. i think some clarity on how the organizational principles are different in behavior versus others would be good to add. the last part of your proposed sentence doesn't match up with what i wrote about func above. bids clearly says modality specific files in its documentation.

@oesteban
Collaborator

oesteban commented Jan 7, 2026

if beh contains different types of acquisitions about subject behavior then func should not just be BOLD and ASL, it should include EEG/MEG/iEEG, etc.

+1000

Indeed, add PET to the list. Looking beyond func/, BIDS seems to be choosing to define differences between modalities so that data type folders and suffixes end up encoding the same thing, and most BEPs choose this path to move forward.

@bendichter
Contributor Author

@oesteban, @satra, Thanks for engaging on this. I'm having trouble understanding what changes you're suggesting to this PR though. Are you asking for specific modifications to the proposed specification text, or is this a broader concern about BIDS organizational principles?

If there's something actionable I can do here, I'm happy to consider it. But if the concern is about how BIDS has historically organized modalities, that seems like a separate discussion from whether audio/video/events should live together in beh/.

@oesteban
Collaborator

oesteban commented Jan 7, 2026

is this a broader concern about BIDS organizational principles?

Yes it is in my case. I haven't been able to follow on this particular PR so please don't take my message for an objection.

@satra
Collaborator

satra commented Jan 8, 2026

is this a broader concern about BIDS organizational principles?

same here. and it is not an objection, but that people coming to this will see two conflicting grouping mechanisms. a note added to the PR to the general section (or in the beh section) in relation to these two grouping mechanisms would help people understand the difference and perhaps help addressing (by some group) in the future.

@yarikoptic
Collaborator

yarikoptic commented Jan 8, 2026

Here are the issues relating to potentially changing the folder organization within BIDS, I think it is better to discuss it among those

and leave this BEP in alignment with current state of allowing func/ or other potentially "functional" data modalities ('eeg/' etc) to contain associated behavioral and then beh/ to absorb behavioral data in case of absent instrumental data or where it would make more sense to keep it separate.

P.S. Also, in general, let's prefer commenting on the diff instead of using the main thread here, since we cannot easily group related comments within the main thread.

@oesteban
Collaborator

oesteban commented Jan 8, 2026

Here are the issues relating to potentially changing the folder organization within BIDS, I think it is better to discuss it among those

IMHO, this is a problem of today. It'd be great if BIDS 2.0 had an elegant/more consistent response. However, BIDS needs to address this for future BEPs (cc/ @ericearl)

@yarikoptic
Collaborator

Let's continue on that in

@bendichter
Contributor Author

@neuromechanist

  1. On the audio video vs. audiovideo labeling, we went back and forth a bit in the issue. For us it doesn't make a huge difference, since you would be able to parse that from the metadata about the streams in the JSON sidecar anyway. If you feel strongly about audiovideo, I would be fine with changing it in the interest of consistency.

  2. _image. I suppose one could take a picture of a subject performing a task. I don't know if that's recording behavior per se, but I'd be fine with adding it if you think we should.

  3. I think one of our biggest differences is with the metadata in the sidecar files. Yours is attribution (License, Copyright, URL, Description) and this one is technical (AudioSampleRate, FrameRate, Height, Width, Duration, CameraPosition, AudioBitDepth). I don't think adding attribution to ours makes much sense. It will generally share the license of the rest of the dataset. However, I do think it might make sense for you to adopt our technical attributes. Maybe not CameraPosition, but it might be nice to be able to get AudioSampleRate, FrameRate, Height, Width, Duration without reading the data file.

  4. Our splitting is different. I think split is more consistent with existing usage. The only mention I see of part in the existing schema is:

part
Full name: Part

Format: part-<label>

Allowed values: mag, phase, real, imag

Definition: This entity is used to indicate which component of the complex representation of the MRI signal is represented in voxel data. The part-<label> entity is associated with the DICOM Tag 0008, 9208. Allowed label values for this entity are phase, mag, real and imag, which are typically used in part-mag/part-phase or part-real/part-imag pairs of files.

Phase images MAY be in radians or in arbitrary units. The sidecar JSON file MUST include the "Units" of the phase image. The possible options are "rad" or "arbitrary".

When there is only a magnitude image of a given type, the part entity MAY be omitted.

whereas split already has to do with splitting large files:

split
Full name: Split

Format: split-<index>

Definition: In the case of long data recordings that exceed a file size of 2Gb, .fif files are conventionally split into multiple parts. Each of these files has an internal pointer to the next file. This is important when renaming these split recordings to the BIDS convention.

Instead of a simple renaming, files should be read in and saved under their new names with dedicated tools like MNE-Python, which will ensure that not only the filenames, but also the internal file pointers, will be updated.

It is RECOMMENDED that .fif files with multiple parts use the split-<index> entity to indicate each part. If there are multiple parts of a recording and the optional scans.tsv is provided, all files MUST be listed separately in scans.tsv and the entries for the acq_time column in scans.tsv MUST all be identical, as described in Scans file.

though I can see in your case why you might want to use part, if you are splitting the stimulus up into logical components, like chapters of an audiobook. I don't mind terribly if we use different approaches for this.
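
To make the scans.tsv rule quoted above concrete, here is a minimal sketch that flags recordings whose split files do not share one acq_time. It assumes the same rule would carry over to split audio/video files in beh/, which the quoted definition (written for .fif files) does not itself state. The filenames, times, and the helper `inconsistent_splits` are made up for illustration.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical scans.tsv content; filenames and times are illustrative only.
SCANS_TSV = (
    "filename\tacq_time\n"
    "beh/sub-01_task-speech_split-1_video.mp4\t2025-10-25T10:00:00\n"
    "beh/sub-01_task-speech_split-2_video.mp4\t2025-10-25T10:00:00\n"
    "beh/sub-01_task-rest_split-1_video.mp4\t2025-10-25T11:00:00\n"
    "beh/sub-01_task-rest_split-2_video.mp4\t2025-10-25T11:30:00\n"
)


def inconsistent_splits(scans_tsv_text):
    """Return recordings whose split files do not share a single acq_time."""
    times = defaultdict(set)
    reader = csv.DictReader(io.StringIO(scans_tsv_text), delimiter="\t")
    for row in reader:
        if "_split-" not in row["filename"]:
            continue
        # Drop the split entity so all parts of one recording share a key.
        key = re.sub(r"_split-[0-9]+", "", row["filename"])
        times[key].add(row["acq_time"])
    return sorted(key for key, seen in times.items() if len(seen) > 1)


problems = inconsistent_splits(SCANS_TSV)
print(problems)
```

In this made-up table only the task-rest recording is reported, because its two split files carry different acq_time values.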

@yarikoptic
Collaborator

On the audio video vs. audiovideo labeling... since you would be able to parse that from the metadata

which is pretty much the case with every _suffix: metadata and type of data would allow one to "figure it out", but the point is to help humans and tools quickly grasp the overall content of the file. In the scope of, e.g., "stimuli" in our https://github.com/ReproNim/reprostim/ project (so not capture of beh, but rather of stimuli), it would help to tell apart audio-video stimuli from pure video capture, and thus potentially make it easier to identify inconsistencies across sessions, etc.

2. _image. I suppose one could take a picture of a subject performing a task. I don't know if that's recording behavior per se, but I'd be fine with adding it if you think we should.

fwiw, I will not miss an opportunity to promote my https://github.com/mykrok where I capture my photos during behavior tasks ;) on a more serious note, these could be selected frames from a video for feeding into DeepLabCut etc., photos taken by location-specific cameras upon the subject approaching that location (e.g. in a maze), etc.

  3. ... I don't think adding attribution to ours makes much sense. It will generally share the license of the rest of the dataset....

FWIW - could be video recording of real people from e.g. YouTube thus having different terms etc.

This aspect is IMHO an interesting demonstration of the duality of such data (which thus requires coherent annotation): for some, "captured behavior" could be a source of analytics (expressed emotions etc., as was done for the https://studyforrest.org "stimuli", the Forrest Gump movie); for others, it would be used as stimuli (IIRC @mvdoc had that in his fMRI experiment); and for others still, both, bringing a BBQS flavor here of bringing behavioral qualities into analytics over neural data.

  4. ... split vs part

I feel also that _part is more for separating out qualitatively different parts of the larger beast (e.g. "_part-head" vs "_part-feet" if we take a video of a full body and decide to produce such "parts"), whereas _split is for sequential (in time) splitting of a larger recording.

@bendichter
Contributor Author

bendichter commented Jan 10, 2026

@yarikoptic

I do understand why these decisions were made on the stimulus side. My question is specifically about whether we want to make changes to homogenize.

  1. Looking back at the discussion, I don't think there was a strong argument against including audiovideo for this BEP. I'll make the change.

  2. OK, yes, training frames for pose estimation does make sense here. I'll add _image.

  3. Regarding the copyright scenario: it sounds like you're describing a situation where a task recording includes a copyrighted video the subject is watching, and our recording might inherit some of those terms. I think this could happen, but in my judgment it's outside the 80/20 scope. Users can always add custom metadata to indicate this kind of thing if they want.

  4. On _part vs _split:

I feel also that _part is more for separating out qualitatively different parts of the larger beast (e.g. "_part-head" vs "_part-feet" if we take a video of a full body and decide to produce such "parts"), whereas _split is for sequential (in time) splitting of a larger recording.

I'd rather extend from the existing definitions of entities in the BIDS schema than from their English meanings. In BIDS, part is specifically for complex signal components: "Allowed label values for this entity are phase, mag, real and imag, which are typically used in part-mag/part-phase or part-real/part-imag pairs of files." That is quite different from body parts or logical segments. As it currently stands, this PR would handle different cameras recording different body parts with the recording entity, which I think fits more closely with the existing definition of recording:

This entity is commonly applied when continuous recordings are from different acquisition instruments, or have different sampling frequencies or start times. For example, physiological recordings with different sampling frequencies may be distinguished using labels like recording-100Hz and recording-500Hz.

I don't see a strong argument for changing this BEP so I'd like to leave it as:

  • task - OPTIONAL for audio and video recordings
  • acq - OPTIONAL, can distinguish different recording setups
  • run - OPTIONAL, for multiple recordings with identical parameters
  • recording - OPTIONAL, to differentiate simultaneous recordings from different angles, locations, or devices
  • split - OPTIONAL, for continuous recordings split into multiple files

and to not use part. Logically different segments in time, like training vs. testing, can be captured using task.
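
To illustrate how recording and split would work together, here is a sketch that groups files by every entity except split and orders each group by split index. The filenames are invented, the entity order inside them simply follows the list above and is not checked against the schema's canonical ordering, and `parse_name` and `group_splits` are hypothetical helpers rather than part of any BIDS tool.

```python
import re
from collections import defaultdict

ENTITY = re.compile(r"([a-zA-Z]+)-([a-zA-Z0-9]+)")


def parse_name(filename):
    """Split a BIDS-style filename into (entities, suffix, extension)."""
    stem, _, ext = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        match = ENTITY.fullmatch(pair)
        if match is None:
            raise ValueError(f"not a key-value entity: {pair!r}")
        entities[match.group(1)] = match.group(2)
    return entities, suffix, "." + ext


def group_splits(filenames):
    """Group files by all entities except `split`, ordered by split index."""
    groups = defaultdict(list)
    for name in filenames:
        entities, suffix, ext = parse_name(name)
        split_index = int(entities.pop("split", 0))
        key = (tuple(sorted(entities.items())), suffix, ext)
        groups[key].append((split_index, name))
    return [[name for _, name in sorted(files)]
            for _, files in sorted(groups.items())]


# Made-up example: two camera angles, each stored as two sequential files.
files = [
    "sub-01_task-reach_recording-side_split-2_video.mp4",
    "sub-01_task-reach_recording-front_split-1_video.mp4",
    "sub-01_task-reach_recording-side_split-1_video.mp4",
    "sub-01_task-reach_recording-front_split-2_video.mp4",
]
grouped = group_splits(files)
for group in grouped:
    print(group)
```

Here recording separates the simultaneous camera angles, while split only orders the sequential pieces within each angle.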

@neuromechanist
Member

  1. On the audio video vs. audiovideo labeling
  2. _image. I suppose one could take a picture of a subject performing a task.

I echo @yarikoptic's points. Also for image, no one mentioned it could be only one image ;). Another question here is whether we should allow subdirs to group multiple images or multi-part videos. I think BEP044 allows that (I should double-check and make an example).

  3. I think one of our biggest differences is with the metadata in the sidecar files. Yours is attribution (License, Copyright, URL, Description) and this one is technical (AudioSampleRate, FrameRate, Height, Width, Duration, CameraPosition, AudioBitDepth).

Yes, will do. Same for adding .flac audio type. I believe the file extensions should be the same (or very close) across the two BEPs.

  4. split vs part

BEP044 extends the definition of the part entity:

current def...
For stimulus files, part-<label> can be used to distinguish different parts of a single stimulus, such as
chapters in an audiobook or segments of a long movie (for example, part-1, part-2, part-epilog,
part-chapter1).

My bias comes from the literal meaning of the word, as split (according to old Google) means to break or cause to break forcibly into parts, especially into halves or along the grain. Therefore, part is "more general" in common sense, and does not carry the bias of "especially into halves." Even following the current BIDS definitions, it is assumed that splits are files of the same size. This assumption might be quite salient for behavioral files, as size could be the deciding factor for splitting, but for stimuli there are several other ways to create parts, importantly, based on content. For example, stim-zootopia2_part-epilog_audiovideo.mp4 is more meaningful than stim-zootopia2_split-epilog_audiovideo.mp4.

Regarding the copyright scenario:

Please consider that videos containing participants and their responses could often have more restrictions (and therefore, licenses) compared to the main anonymized dataset.

- Add new `_audiovideo` suffix for files containing both audio and video streams
- Update documentation to distinguish between audio-only, video-only, and combined recordings
- Split AudioVideoStreams sidecar table into separate AudioStreams and VideoStreams tables
- Add example files and JSON sidecars for audiovideo recordings
- Update schema suffixes to include audiovideo definition
@bendichter
Contributor Author

@neuromechanist can we please try to keep the discussion here to this PR? We can discuss whether split or part (or both) is more appropriate for stimuli in your thread, but I'd prefer to keep this to what should be allowed here. I would rather not support part for this BEP. Is that OK with you?

Another question here is whether we should allow subdirs to group multiple images, multi-part videos?

I originally had this in and @effigies pushed back, saying that would make this PR substantially more complex. I agree. I'd rather move forward with what we have now. We can add subdirectories later if we need to, since that would be a purely additive change to the schema.

Regarding copyright on participant recordings: that's a fair point. I'll add an optional License field to the sidecar metadata.

…iments

Add `_image` suffix for storing still images captured during behavioral
experiments in the `beh` directory. Changes include:

- Add `.jpg` and `.png` as supported image file extensions
- Document use cases: pose estimation training frames, behavioral setup
  snapshots, and extracted video frames
- Update privacy/PII warnings to include images alongside audio/video
- Add ImageProperties sidecar table and example files
- Update AudioVideoDevice macro to AudioVideoImageDevice
- Add License field to AudioVideoImageDevice sidecar schema
- Update documentation to include images in audio/video section headings
- Add note explaining licensing considerations for recordings containing
  identifiable participant data
@oesteban
Collaborator

oesteban commented Jan 11, 2026

+1 for clarity. But, I am afraid the argument is not as strong, especially as BIDS has strived for clear definitions, reproducible metadata, and community-led extensibility. Following this argument, one could argue since the stimuli/ directory does not have a mandated structure, why other directories should have a structure.

May I ask this long response be moved to #2296, which @yarikoptic opened for that purpose? Others will likely refuse to answer here not to pollute this PR with the tangent discussion, which may lead a confused reader to think that this message from a maintainer is the last word on the issue.

@bendichter
Contributor Author

Would *_image.{png,jpeg,json} work? I know "image" can have many different meanings, but it would be the most convenient term for us to cover photos, screenshots, depth camera photos, etc.

Labels

schema Issues related to the YAML schema representation of the specification. Patch version release.

Development

Successfully merging this pull request may close these issues.

BEP for audio/video capture of behaving subjects

9 participants