
fix(providers): add Zhipu AI GLM-4V vision models to ZHIPU_MODELS #3276

Closed
octo-patch wants to merge 2 commits into agentscope-ai:main from octo-patch:fix/issue-3259-zhipu-glm4v-vision-models

Conversation

@octo-patch
Contributor

Fixes #3259

Problem

Zhipu AI's GLM-4V series vision models (glm-4v, glm-4v-plus, glm-4v-flash, glm-4.6v-flash) were missing from the ZHIPU_MODELS list and the capability baseline registry. Because these models had no pre-defined capability entry, CoPaw treated them as text-only models (supports_multimodal=false), preventing users from using image input features with these vision-capable models.

Solution

  • provider_manager.py: Add glm-4v, glm-4v-plus, glm-4v-flash, and glm-4.6v-flash to ZHIPU_MODELS with supports_image=True and supports_video=False, matching the official Zhipu AI documentation.
  • capability_baseline.py: Register the same four models with expected_image=True for all four Zhipu provider IDs (zhipu-cn, zhipu-cn-codingplan, zhipu-intl, zhipu-intl-codingplan) so that the capability baseline prober can validate them correctly.
  • test_provider_manager.py: Update the expected model list assertion in test_builtin_zhipu_providers_registered to include the four new models.
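The registrations described above can be sketched as follows. The model IDs, capability flags (supports_image, supports_video, expected_image), and provider IDs all come from this PR description, but the surrounding data layout is an assumption — the actual structures in provider_manager.py and capability_baseline.py may differ.

```python
# Hedged sketch of the new entries; dict layout is assumed, field names
# and values follow the PR description.
GLM_4V_ENTRIES = {
    "glm-4v": {"supports_image": True, "supports_video": False},
    "glm-4v-plus": {"supports_image": True, "supports_video": False},
    "glm-4v-flash": {"supports_image": True, "supports_video": False},
    "glm-4.6v-flash": {"supports_image": True, "supports_video": False},
}

# The capability baseline registers the same four models for each of the
# four Zhipu provider IDs so the prober can validate image support.
ZHIPU_PROVIDER_IDS = (
    "zhipu-cn",
    "zhipu-cn-codingplan",
    "zhipu-intl",
    "zhipu-intl-codingplan",
)

def baseline_entries():
    """Yield (provider_id, model_id, expected_image) triples."""
    for provider_id in ZHIPU_PROVIDER_IDS:
        for model_id in GLM_4V_ENTRIES:
            yield provider_id, model_id, True
```

Four models across four provider IDs gives sixteen baseline entries in total.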

Testing

  • Updated unit test test_builtin_zhipu_providers_registered verifies that all Zhipu provider instances expose the new vision model IDs in their model list.
  • Python ast.parse confirms both modified source files have valid syntax.
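The syntax check mentioned above is just the stdlib ast.parse call, which raises SyntaxError for invalid Python; a minimal sketch of how it can be wrapped (the helper name is illustrative, not from the PR):

```python
import ast

def has_valid_syntax(source: str) -> bool:
    """Return True if the given source text parses as valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```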

octo-patch added 2 commits April 11, 2026 09:49
Some model providers (e.g. certain OpenAI-compatible APIs) return
reasoning content wrapped in <thought>...</thought> tags instead of
the <think>...</think> tags handled previously. This caused the raw
tags to be shown inline in the response rather than being collapsed.

Changes:
- tag_parser.py: add THOUGHT_START/THOUGHT_END constants and
  _THOUGHT_RE regex; update text_contains_think_tag() to match
  either tag variant; update extract_thinking_from_text() to try
  both <think> and <thought> patterns (complete and unclosed/streaming)
- openai_chat_model_compat.py: import and wire up extract_thinking_from_text
  and text_contains_think_tag so that <think>/<thought> blocks embedded
  in text content blocks are extracted into proper thinking blocks before
  downstream rendering

Fixes agentscope-ai#3206
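The dual-tag matching described in this commit can be sketched as below. The function name extract_thinking_from_text comes from the commit message, but the regex constants and return shape are assumptions, and this sketch handles only the complete-tag case, not the unclosed/streaming variants the commit also covers.

```python
import re

# Assumed patterns for the two reasoning-tag variants; the real
# tag_parser.py constants (THOUGHT_START/THOUGHT_END, _THOUGHT_RE)
# may be structured differently.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
THOUGHT_RE = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)

def extract_thinking_from_text(text):
    """Return (thinking, remaining_text), trying both tag variants."""
    for pattern in (THINK_RE, THOUGHT_RE):
        match = pattern.search(text)
        if match:
            thinking = match.group(1).strip()
            remaining = pattern.sub("", text, count=1).strip()
            return thinking, remaining
    return None, text
```

With this shape, text_contains_think_tag reduces to checking whether either pattern finds a match.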
(agentscope-ai#3259)

The GLM-4V series (glm-4v, glm-4v-plus, glm-4v-flash, glm-4.6v-flash)
are vision-capable models from Zhipu AI but were missing from the
ZHIPU_MODELS list and capability_baseline registry. This caused CoPaw
to treat them as text-only models (supports_multimodal=false), preventing
users from using image input features.

Changes:
- provider_manager.py: add glm-4v, glm-4v-plus, glm-4v-flash, and
  glm-4.6v-flash to ZHIPU_MODELS with supports_image=True
- capability_baseline.py: register the same four models with
  expected_image=True for all four Zhipu provider IDs (zhipu-cn,
  zhipu-cn-codingplan, zhipu-intl, zhipu-intl-codingplan)
- test_provider_manager.py: update expected model list assertion
@github-actions

Welcome to CoPaw! 🐾

Hi @octo-patch, this is your 16th Pull Request.

📋 About PR Template

To help maintainers review your PR faster, please make sure to include:

  • Description - What this PR does and why
  • Type of Change - Bug fix / Feature / Breaking change / Documentation / Refactoring
  • Component(s) Affected - Core / Console / Channels / Skills / CLI / Documentation / Tests / CI/CD / Scripts
  • Checklist:
    • Run and pass pre-commit run --all-files
    • Run and pass relevant tests (pytest or as applicable)
    • Update documentation if needed
  • Testing - How to test these changes
  • Local Verification Evidence:
    pre-commit run --all-files
    # paste summary result
    
    pytest
    # paste summary result

Complete PR information helps speed up the review process. You can edit the PR description to add these details.

🙌 Join Developer Community

Thanks so much for your contribution! We'd love to invite you to join the official CoPaw developer group! You can find the Discord and DingTalk group links under the "Developer Community" section on our docs page:
https://copaw.agentscope.io/docs/community

We truly appreciate your enthusiasm—and look forward to your future contributions! 😊

We'll review your PR soon.


Tip

⭐ If you find CoPaw useful, please give us a Star!


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for <thought> tags in reasoning extraction and registers four new GLM-4V vision models across the Zhipu providers. It also updates the OpenAI stream response parsing to extract and prepend thinking blocks from text. A review comment identifies a logic issue where prepending thinking blocks could bypass the filtering of empty text blocks, and suggests reordering these operations.

Comment on lines +323 to 332
if injected_thinking_blocks:
    # Prepend extracted thinking blocks before existing content.
    parsed.content = injected_thinking_blocks + list(
        parsed.content,
    )
    # Rebuild new_content index offsets after prepending.
    new_content = None

if new_content is not None:
    parsed.content = [b for b in new_content if b is not None]


Severity: medium

The current logic for prepending injected_thinking_blocks resets new_content to None (line 329), which effectively bypasses the filtering of empty text blocks that were marked for removal earlier in the loop. If a text block becomes empty after extracting thinking or tool-call tags, it should be removed from the final content list.

The filtering should be applied to parsed.content before prepending the new blocks to ensure the response remains clean.

Suggested change (filter first, then prepend):

Before:
if injected_thinking_blocks:
    # Prepend extracted thinking blocks before existing content.
    parsed.content = injected_thinking_blocks + list(
        parsed.content,
    )
    # Rebuild new_content index offsets after prepending.
    new_content = None
if new_content is not None:
    parsed.content = [b for b in new_content if b is not None]

After:
if new_content is not None:
    parsed.content = [b for b in new_content if b is not None]
if injected_thinking_blocks:
    # Prepend extracted thinking blocks before existing content.
    parsed.content = injected_thinking_blocks + list(parsed.content)

@xieyxclack
Member

Closed as not planned. Users can add these models themselves.

@xieyxclack xieyxclack closed this Apr 16, 2026
@github-project-automation github-project-automation bot moved this from Todo to Done in QwenPaw Apr 16, 2026

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug] Model detection incorrectly classifies Zhipu AI GLM-4.6V-Flash as pure text model instead of multimodal vision model

2 participants