Overview

The X-AnyLabeling Visual Question Answering (VQA) tool is designed for annotating multimodal image question-answering datasets. It supports creating image-based question-answer pairs, integrates AI assistance, and offers several configurable input components. Because it can be adapted to different annotation tasks, it can produce training data for supervised fine-tuning, reinforcement learning post-training, and similar tasks.

Launching the Tool

To open the VQA tool, first load an image directory in the main window. Then either click the VQA icon in the main window's left toolbar or use the following keyboard shortcut:

Windows/Linux: Ctrl + 2
macOS: ⌘ + 2

On startup, the system automatically loads the default configuration from the following path. You may modify it as needed:

~/xanylabeling_data/vqa/components.json
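
Before editing the configuration by hand, it can help to inspect it and keep a backup. The following is a minimal sketch that assumes only that the file is valid JSON. It makes no assumption about the keys inside it, since the schema is not described here. The helper names `load_config` and `backup_config` are hypothetical and not part of X-AnyLabeling.

```python
import json
import shutil
from pathlib import Path

# Default location given above; adjust if your installation differs.
CONFIG_PATH = Path.home() / "xanylabeling_data" / "vqa" / "components.json"


def load_config(path: Path = CONFIG_PATH):
    """Parse the configuration file and return whatever JSON value it holds."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def backup_config(path: Path = CONFIG_PATH) -> Path:
    """Copy the file next to itself with a .bak suffix and return the copy's path."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    return backup


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        print("backup written to", backup_config())
        print(json.dumps(load_config(), indent=2, ensure_ascii=False))
    else:
        print("no configuration found at", CONFIG_PATH)
```

Running the script prints the current configuration, so you can see its structure before changing anything.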

Tutorial

The VQA tool adopts a dual-panel layout: the left panel displays image previews, while the right one provides annotation controls.

Left Panel – Image Preview

Filename and Progress Indicator: Shows the current image filename and its position within the dataset (e.g., 000000000154.jpg (33/128)).
Image Preview Area: Displays the image centered on the panel with adaptive zoom.
Panel Toggle: Use the sidebar icon to expand or collapse the left panel.

Right Panel – Annotation Controls

Toolbar Buttons:

Button	Description
Export Labels	Export annotations in JSONL format
Clear All	Remove all annotations for the current image
Add Component	Add a new annotation component
Del Component	Delete an existing component

Annotation Components:

Component	Type	Description
Text Input	QLineEdit	For open-ended QA, such as image descriptions or detailed answers
Radio Buttons	QRadioButton	For single-choice tasks, such as task type selection or dataset split
Checkboxes	QCheckBox	For multi-choice tasks, such as image tagging or attribute labeling
Dropdown Menu	QComboBox	For single-choice tasks with many options; supports custom lists

For text input components, the tool provides AI assistance to speed up annotation. To enable it, follow the configuration instructions in the Chatbot section.

Once configured, you can open the AI assistant dialog by clicking the magic wand (🪄) icon in the title bar.

Prompts can be text-only or multimodal, and may include the following reference tokens:

Basic References

@image: References the current image for AI analysis
@text: References the current text input field content

Cross-Widget References

@widget.component_name: References the value of another QLineEdit component, e.g., @widget.question references the "question" component

Label Data References

@label.shapes: References all annotation shapes in the current image
@label.imagePath: References the image file path
@label.imageHeight: References the image height
@label.imageWidth: References the image width
@label.flags: References annotation flags

Usage Examples

Describe objects in the image: @image
Analyze with existing annotations: @image Analyze based on shapes @label.shapes
Reference other components: Generate answer based on question "@widget.question"
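
To make the token semantics concrete, here is a rough sketch of how such references could be expanded into a final prompt. It is an illustration only, not the tool's actual implementation. The function name `expand_prompt`, the choice to serialize label values as JSON, and the handling of `@image` as a flag rather than inline text are all assumptions.

```python
import json
import re

# Matches @image, @text, @widget.<name> and @label.<field>.
TOKEN = re.compile(r"@(image\b|text\b|widget\.(\w+)|label\.(\w+))")


def expand_prompt(prompt, text="", widgets=None, label=None):
    """Replace reference tokens with their values.

    Returns (expanded_prompt, uses_image). @image is removed from the text and
    reported through the flag, on the assumption that the image is passed to a
    multimodal model separately from the prompt string.
    """
    widgets = widgets or {}
    label = label or {}
    uses_image = False

    def substitute(match):
        nonlocal uses_image
        token = match.group(1)
        if token == "image":
            uses_image = True
            return ""
        if token == "text":
            return text
        if token.startswith("widget."):
            # Unknown widget names expand to an empty string.
            return str(widgets.get(match.group(2), ""))
        # Label fields are serialized as JSON; a missing field becomes "null".
        return json.dumps(label.get(match.group(3)), ensure_ascii=False)

    return TOKEN.sub(substitute, prompt).strip(), uses_image


if __name__ == "__main__":
    print(expand_prompt(
        'Generate answer based on question "@widget.question"',
        widgets={"question": "How many zebras are there here?"},
    ))
```

Separating the image flag from the text mirrors the distinction between text-only and multimodal prompts: a prompt without `@image` can go to a text-only model unchanged.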

The tool also includes a prompt template gallery. Predefined templates cover common use cases, and you can add, edit, or delete custom templates. Reusing templates keeps prompts consistent across images and speeds up annotation.

Hovering over a template displays the full content in a tooltip for quick preview. For custom templates, double-clicking a template field allows you to edit the title and content.

Data Management

X-AnyLabeling autosaves annotations so that work is not lost. Each annotation is saved as a JSON file in the same directory as the corresponding image. For VQA tasks, all annotation data is stored under the vqaData field, which holds the values collected through the configured components:

{
  "version": "3.2.1",
  "flags": {},
  "shapes": [],
  ...
  "vqaData": {
    "question": "How many zebras are there here?",
    "answer": 3,
    "split": "train",
    "task": "Counting",
    "tags": [
      "natural"
    ]
  },
  "imagePath": "0000000000154.jpg",
  "imageHeight": 640,
  "imageWidth": 480
}
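
If you want to inspect the saved annotations outside the tool, the label files can be read with the standard library. The sketch below relies only on what is described above: label files are JSON, they sit in the image directory, and VQA data lives under `vqaData`. The function names are hypothetical, and the code assumes every `.json` file in the directory is a label file.

```python
import json
from pathlib import Path


def read_vqa_data(label_path):
    """Return the vqaData mapping of one label file, or {} if it has none."""
    with Path(label_path).open(encoding="utf-8") as f:
        label = json.load(f)
    return label.get("vqaData", {})


def collect_vqa_data(image_dir):
    """Map each label file name in image_dir to its vqaData mapping."""
    return {
        path.name: read_vqa_data(path)
        for path in sorted(Path(image_dir).glob("*.json"))
    }
```

For example, `collect_vqa_data("path/to/images")` returns a dictionary keyed by label file name, which makes it easy to spot images whose `vqaData` is still empty.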

After completing annotation, click the Export Labels button to export the data. The export dialog lets you choose which fields to include:

Basic Fields: Image filename, width, and height
Custom Component Fields: All configured components and their corresponding data

Exported data is saved in JSONL format, with one record per line. Example output:

{"image": "0000000000154.jpg", "width": 480, "height": 640, "question": "How many zebras are there here?", "answer": 3, "split": "train"}
{"image": "0000000000155.jpg", "width": 640, "height": 480, "question": "What is the cat doing?", "answer": "sleeping", "split": "val"}
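
Because each line is an independent JSON object, an exported file can be checked with a few lines of Python. This is a minimal sketch: the file name `vqa_export.jsonl` is a placeholder, and the `split` field is present only if you configured and exported such a component.

```python
import json
from collections import Counter


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_split(records):
    """Count records per value of the "split" field (None if the field is absent)."""
    return Counter(record.get("split") for record in records)


if __name__ == "__main__":
    import sys

    # Usage: python check_export.py vqa_export.jsonl
    if len(sys.argv) > 1:
        print(count_by_split(read_jsonl(sys.argv[1])))
```

A quick count per split is a cheap sanity check before passing the file to a training pipeline.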
