
Conversation

XinyuLiu1999 commented Jan 6, 2026

Summary
Fixes severe performance degradation when running operators backed by CLIP-based models (e.g., image_aesthetics_filter, image_text_similarity_filter) with num_proc > 1 on CPU-only machines, by limiting PyTorch threads in worker processes.

Problem: When multiple worker processes are spawned, each worker defaults to using all CPU cores for PyTorch operations. This causes thread over-subscription (e.g., 3 workers × 8 threads = 24 threads competing for 8 cores), leading to massive context switching overhead and cache thrashing.

Solution: Call torch.set_num_threads(1) and torch.set_num_interop_threads(1) in worker processes when loading models, ensuring each worker uses only 1 thread.

Changes:

  • Add setup_worker_threads() utility function in process_utils.py (sketched below)
  • Call it from get_model() in model_utils.py for non-main processes
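
For reference, a minimal sketch of what the utility presumably looks like, assembled from the snippet under review further down this page; the _threads_configured guard name is an assumption added to match the "idempotent" note in the commit message, not the verbatim diff:

# process_utils.py (sketch; _threads_configured is an assumed guard name)
from loguru import logger

_threads_configured = False

def setup_worker_threads(num_threads: int = 1):
    """Limit PyTorch thread pools in a worker process (idempotent)."""
    global _threads_configured
    if _threads_configured:
        return
    try:
        import torch
        torch.set_num_threads(num_threads)
        torch.set_num_interop_threads(num_threads)
        logger.debug(f"Set torch threads to {num_threads}")
    except ImportError:
        pass
    except RuntimeError as e:
        # torch.set_num_interop_threads can only be called once per process
        logger.debug(f"Could not set torch interop threads: {e}")
    _threads_configured = True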

Test script (with num_proc=3 it should complete in seconds instead of minutes on an 8-core CPU):

from data_juicer.core.data import NestedDataset as Dataset
from data_juicer.ops.filter.image_aesthetics_filter import ImageAestheticsFilter
from data_juicer.utils.constant import Fields

# Create dataset with stats column
dataset = Dataset.from_list([{"images": ["tests/ops/data/img1.png"]}] * 10)
dataset = dataset.add_column(name=Fields.stats, column=[{}] * dataset.num_rows)

# Run filter
op = ImageAestheticsFilter()
dataset = dataset.map(op.compute_stats, num_proc=3)
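
As a quick sanity check (not part of the PR), the effective thread count inside each worker can be printed from the same map machinery; report_threads below is illustrative and continues from the script above:

import os
import torch

def report_threads(sample):
    # After the fix, every worker process should report 1
    print(f"pid={os.getpid()} torch_threads={torch.get_num_threads()}")
    return sample

dataset.map(report_threads, num_proc=3)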

…subscription

When running data processing with multiple worker processes (num_proc > 1),
each worker using multiple threads causes severe performance degradation
due to thread contention. For example, 3 workers each using N threads
creates 3N threads competing for N cores.

Changes:
- Add setup_worker_threads() utility in process_utils.py that:
  - Calls torch.set_num_threads(1) and torch.set_num_interop_threads(1)
  - Only configures once per process (idempotent)

- Call setup_worker_threads() in get_model() for worker processes
  to limit threads before model loading/inference

This fix significantly improves multi-process CPU performance by preventing
thread over-subscription.
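
The get_model() hook described in the commit message presumably looks something like the following sketch; the multiprocessing-based main-process check and the simplified signature are assumptions, not the actual diff:

# model_utils.py (sketch)
import multiprocessing as mp

from data_juicer.utils.process_utils import setup_worker_threads

def get_model(model_key, *args, **kwargs):
    # Workers run in fresh processes; cap torch threads before any model work
    if mp.current_process().name != "MainProcess":
        setup_worker_threads(1)
    ...  # existing model loading / caching logic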
gemini-code-assist (Contributor) commented

Summary of Changes

Hello @XinyuLiu1999, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the performance of CPU-only multi-process workloads, particularly when using models like those based on CLIP. By preventing thread over-subscription in worker processes, it resolves a critical bottleneck that previously led to massive context switching overhead and cache thrashing, transforming execution times from minutes to seconds.

Highlights

  • Performance Fix for Multi-Process CPU Workloads: Addresses severe performance degradation in CPU-only multi-process environments (e.g., num_proc > 1) when using CLIP-based models, caused by PyTorch thread over-subscription.
  • Thread Limiting Utility: Introduces a new utility function, setup_worker_threads, in process_utils.py to explicitly limit PyTorch's internal thread usage (via torch.set_num_threads and torch.set_num_interop_threads) to 1 per worker process.
  • Integration into Model Loading: The setup_worker_threads function is now invoked within get_model() in model_utils.py for all non-main processes, ensuring that models loaded in worker processes adhere to the single-thread policy.


gemini-code-assist bot left a comment

Code Review

This pull request addresses a performance degradation issue when running multi-process workloads on CPU by limiting PyTorch threads in worker processes. The solution involves adding a setup_worker_threads utility to configure thread limits and calling it during model initialization in worker processes. The changes are logical and directly address the problem described. My review includes a couple of suggestions to improve code style and error handling robustness.

Comment on lines 42 to 52
try:
    import torch
    torch.set_num_threads(num_threads)
    torch.set_num_interop_threads(num_threads)
    logger.debug(f"Set torch threads to {num_threads}")
except ImportError:
    pass
except RuntimeError as e:
    # torch.set_num_interop_threads can only be called once
    logger.debug(f"Could not set torch interop threads: {e}")
medium

The error handling in this block can be improved for better robustness and debugging experience:

  1. The except ImportError block currently uses pass, which silently ignores the case where torch is not installed. Adding a logger.debug message would be helpful for tracing.
  2. The except RuntimeError message is specific to set_num_interop_threads, but a RuntimeError could potentially be raised by other calls in the try block. A more generic error message would be more accurate.
Suggested change:

try:
    import torch
    torch.set_num_threads(num_threads)
    torch.set_num_interop_threads(num_threads)
    logger.debug(f"Set torch threads to {num_threads}")
except ImportError:
    logger.debug("torch not found, skipping thread setup.")
except RuntimeError as e:
    # This can happen if threads are already configured (e.g., by another library)
    logger.debug(f"Could not set torch threads: {e}")
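
For context on why the RuntimeError guard is needed: in current PyTorch releases, torch.set_num_interop_threads may only be called once per process, and only before any inter-op parallel work has started; a later call raises. A minimal demonstration (not part of the PR):

import torch

torch.set_num_interop_threads(1)      # first call succeeds
try:
    torch.set_num_interop_threads(1)  # second call raises RuntimeError
except RuntimeError as e:
    print(f"as expected: {e}")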

yxdyc requested review from HYLcool and Qirui-jiao on January 6, 2026 at 07:44