fix: add thread limiting for multi-process workloads to prevent over-subscription #877

XinyuLiu1999 · 2026-01-06T07:39:58Z

Summary
Fixes severe performance degradation when running operators that use CLIP-based models (e.g., image_aesthetics_filter, image_text_similarity_filter) on CPU only with num_proc > 1, by limiting PyTorch threads in worker processes.

Problem: When multiple worker processes are spawned, each worker defaults to using all CPU cores for PyTorch operations. This causes thread over-subscription (e.g., 3 workers × 8 threads = 24 threads competing for 8 cores), leading to massive context switching overhead and cache thrashing.
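The arithmetic in the example above generalises to a simple ratio of compute threads to cores. The sketch below is illustrative only: the helper name `oversubscription` is hypothetical, and it assumes each worker sizes its PyTorch intra-op pool to `os.cpu_count()` (PyTorch's actual default may differ, e.g. by counting physical rather than logical cores).

```python
import os


def oversubscription(num_workers: int, threads_per_worker: int, num_cores: int) -> float:
    """Ratio of compute threads to cores; values above 1.0 mean threads compete for cores."""
    total_threads = num_workers * threads_per_worker
    return total_threads / num_cores


cores = os.cpu_count() or 1
# Assumed default: every worker uses one thread per core.
print(oversubscription(3, cores, cores))  # 3.0
# With the fix, each worker uses a single thread.
print(oversubscription(3, 1, cores))
```

With 3 workers on an 8-core machine this gives 3.0 before the change (24 threads on 8 cores) and 0.375 after it.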

Solution: Call torch.set_num_threads(1) and torch.set_num_interop_threads(1) in worker processes when loading models, ensuring each worker uses only 1 thread.
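A minimal sketch of the same idea outside Data-Juicer, assuming a stdlib `multiprocessing.Pool` and an optional `torch` install. The helper names `limit_torch_threads` and `worker_thread_count` are hypothetical; `torch.set_num_threads`, `torch.set_num_interop_threads` and `torch.get_num_threads` are the real PyTorch calls. The limit is applied through the pool's `initializer`, so it runs once in every worker before any task executes.

```python
import multiprocessing as mp
from typing import Optional


def limit_torch_threads(num_threads: int = 1) -> bool:
    """Pool initializer: cap PyTorch threads in this process. False if torch is absent."""
    try:
        import torch
    except ImportError:
        return False
    torch.set_num_threads(num_threads)  # intra-op pool (e.g. matmul, conv kernels)
    try:
        torch.set_num_interop_threads(num_threads)  # inter-op pool
    except RuntimeError:
        # Only allowed once, before any inter-op parallel work has started.
        pass
    return True


def worker_thread_count(_: int) -> Optional[int]:
    """Task run inside a worker: report its intra-op thread count (None without torch)."""
    try:
        import torch
    except ImportError:
        return None
    return torch.get_num_threads()


if __name__ == "__main__":
    # 3 workers x 1 thread each, instead of 3 workers x all cores.
    with mp.Pool(processes=3, initializer=limit_torch_threads) as pool:
        print(pool.map(worker_thread_count, range(3)))
```

Each worker should report 1 (or None when torch is not installed). Applying the limit in the initializer rather than inside each task keeps the call out of the hot path and ensures it runs before any inter-op parallel work, after which `set_num_interop_threads` raises `RuntimeError`.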

Changes:

- Add setup_worker_threads() utility function in process_utils.py
- Call it from get_model() in model_utils.py for non-main processes
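The description does not show the full body of setup_worker_threads() or the check used in get_model(), so the following is only a hypothetical sketch of the two changes listed above. It assumes a module-level guard keyed on `os.getpid()` for the "configure once per process" behaviour (a forked child inherits the parent's globals, so a plain boolean flag set in the parent would wrongly skip the child), and a stdlib `multiprocessing.current_process()` name check for "non-main process", which may not recognise workers started by other process-pool libraries. `get_model_sketch` and `load_fn` are illustrative names, not Data-Juicer APIs.

```python
import multiprocessing
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical guard: PID of the process that already ran the setup. Keying on the PID
# (instead of a bool) means a forked child, which inherits this global, still configures itself.
_configured_pid: Optional[int] = None


def setup_worker_threads(num_threads: int = 1) -> bool:
    """Limit torch threads once per process; return True only on the call that did the work."""
    global _configured_pid
    if _configured_pid == os.getpid():
        return False  # already handled in this process
    _configured_pid = os.getpid()
    try:
        import torch
    except ImportError:
        return True  # nothing to limit, but do not retry on every model load
    torch.set_num_threads(num_threads)
    try:
        torch.set_num_interop_threads(num_threads)
    except RuntimeError:
        pass  # inter-op pool already sized or in use in this process
    return True


def is_worker_process() -> bool:
    # Assumption: workers are started by the stdlib multiprocessing module.
    return multiprocessing.current_process().name != "MainProcess"


def get_model_sketch(load_fn: Callable[[], T]) -> T:
    # Limit threads before the model is loaded, and only inside worker processes.
    if is_worker_process():
        setup_worker_threads()
    return load_fn()
```

Calling `setup_worker_threads()` repeatedly is cheap after the first call, so it can sit on every model-loading path without re-running the torch setters.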

# Test with num_proc=3: should complete in seconds instead of minutes on an 8-core CPU
from data_juicer.core.data import NestedDataset as Dataset
from data_juicer.ops.filter.image_aesthetics_filter import ImageAestheticsFilter
from data_juicer.utils.constant import Fields

# Create dataset with stats column
dataset = Dataset.from_list([{"images": ["tests/ops/data/img1.png"]}] * 10)
dataset = dataset.add_column(name=Fields.stats, column=[{}] * dataset.num_rows)

# Run filter
op = ImageAestheticsFilter()
dataset = dataset.map(op.compute_stats, num_proc=3)

…subscription

When running data processing with multiple worker processes (num_proc > 1), each worker using multiple threads causes severe performance degradation due to thread contention. For example, 3 workers each using N threads create 3N threads competing for N cores.

Changes:
- Add setup_worker_threads() utility in process_utils.py that:
  - Calls torch.set_num_threads(1) and torch.set_num_interop_threads(1)
  - Only configures once per process (idempotent)
- Call setup_worker_threads() in get_model() for worker processes to limit threads before model loading/inference

This fix significantly improves multi-process CPU performance by preventing thread over-subscription.

gemini-code-assist · 2026-01-06T07:40:15Z

Summary of Changes


This pull request significantly improves the performance of CPU-only multi-process workloads, particularly when using models like those based on CLIP. By preventing thread over-subscription in worker processes, it resolves a critical bottleneck that previously led to massive context switching overhead and cache thrashing, transforming execution times from minutes to seconds.

Highlights

- Performance Fix for Multi-Process CPU Workloads: Addresses severe performance degradation in CPU-only multi-process environments (e.g., num_proc > 1) when using CLIP-based models, caused by PyTorch thread over-subscription.
- Thread Limiting Utility: Introduces a new utility function, setup_worker_threads, in process_utils.py to explicitly limit PyTorch's internal thread usage (via torch.set_num_threads and torch.set_num_interop_threads) to 1 per worker process.
- Integration into Model Loading: The setup_worker_threads function is now invoked within get_model() in model_utils.py for all non-main processes, ensuring that models loaded in worker processes adhere to the single-thread policy.


gemini-code-assist

Code Review

This pull request addresses a performance degradation issue when running multi-process workloads on CPU by limiting PyTorch threads in worker processes. The solution involves adding a setup_worker_threads utility to configure thread limits and calling it during model initialization in worker processes. The changes are logical and directly address the problem described. My review includes a couple of suggestions to improve code style and error handling robustness.

data_juicer/utils/model_utils.py

gemini-code-assist · 2026-01-06T07:41:20Z

data_juicer/utils/process_utils.py

+    try:
+        import torch
+        torch.set_num_threads(num_threads)
+        torch.set_num_interop_threads(num_threads)
+        logger.debug(f"Set torch threads to {num_threads}")
+    except ImportError:
+        pass
+    except RuntimeError as e:
+        # torch.set_num_interop_threads can only be called once
+        logger.debug(f"Could not set torch interop threads: {e}")


The error handling in this block can be improved for better robustness and debugging experience:

- The except ImportError block currently uses pass, which silently ignores the case where torch is not installed. Adding a logger.debug message would be helpful for tracing.
- The except RuntimeError message is specific to set_num_interop_threads, but a RuntimeError could potentially be raised by other calls in the try block. A more generic error message would be more accurate.

Suggested change

     try:
         import torch
         torch.set_num_threads(num_threads)
         torch.set_num_interop_threads(num_threads)
         logger.debug(f"Set torch threads to {num_threads}")
     except ImportError:
-        pass
+        logger.debug("torch not found, skipping thread setup.")
     except RuntimeError as e:
-        # torch.set_num_interop_threads can only be called once
-        logger.debug(f"Could not set torch interop threads: {e}")
+        # This can happen if threads are already configured (e.g., by another library)
+        logger.debug(f"Could not set torch threads: {e}")

XinyuLiu1999 requested a deployment to Testing January 6, 2026 07:40 — with GitHub Actions Waiting

gemini-code-assist bot reviewed Jan 6, 2026


yxdyc requested review from HYLcool and Qirui-jiao January 6, 2026 07:44

Merge branch 'datajuicer:main' into claude/thread-limiting-fix-r3tIM

192ba93

XinyuLiu1999 force-pushed the claude/thread-limiting-fix-r3tIM branch from cb7e9ef to 192ba93 Compare January 6, 2026 08:25

XinyuLiu1999 requested a deployment to Testing January 6, 2026 08:25 — with GitHub Actions Waiting

XinyuLiu1999 requested a deployment to Testing January 7, 2026 06:35 — with GitHub Actions Waiting

XinyuLiu1999 force-pushed the claude/thread-limiting-fix-r3tIM branch from 060a174 to 192ba93 Compare January 7, 2026 06:36

XinyuLiu1999 temporarily deployed to Testing January 7, 2026 06:36 — with GitHub Actions Inactive
