fix(providers): add Zhipu AI GLM-4V vision models to ZHIPU_MODELS #3276

octo-patch wants to merge 2 commits into agentscope-ai:main
Conversation
Some model providers (e.g. certain OpenAI-compatible APIs) return reasoning content wrapped in `<thought>...</thought>` tags instead of the `<think>...</think>` tags handled previously. This caused the raw tags to be shown inline in the response rather than being collapsed.

Changes:

- `tag_parser.py`: add `THOUGHT_START`/`THOUGHT_END` constants and `_THOUGHT_RE` regex; update `text_contains_think_tag()` to match either tag variant; update `extract_thinking_from_text()` to try both `<think>` and `<thought>` patterns (complete and unclosed/streaming)
- `openai_chat_model_compat.py`: import and wire up `extract_thinking_from_text` and `text_contains_think_tag` so that `<think>`/`<thought>` blocks embedded in text content blocks are extracted into proper thinking blocks before downstream rendering

Fixes agentscope-ai#3206
(agentscope-ai#3259)

The GLM-4V series (`glm-4v`, `glm-4v-plus`, `glm-4v-flash`, `glm-4.6v-flash`) are vision-capable models from Zhipu AI but were missing from the `ZHIPU_MODELS` list and capability baseline registry. This caused CoPaw to treat them as text-only models (`supports_multimodal=false`), preventing users from using image input features.

Changes:

- `provider_manager.py`: add `glm-4v`, `glm-4v-plus`, `glm-4v-flash`, and `glm-4.6v-flash` to `ZHIPU_MODELS` with `supports_image=True`
- `capability_baseline.py`: register the same four models with `expected_image=True` for all four Zhipu provider IDs (`zhipu-cn`, `zhipu-cn-codingplan`, `zhipu-intl`, `zhipu-intl-codingplan`)
- `test_provider_manager.py`: update expected model list assertion
Hi @octo-patch, this is your 16th Pull Request.

📋 About PR Template

To help maintainers review your PR faster, please make sure to include: Complete PR information helps speed up the review process. You can edit the PR description to add these details.

🙌 Join Developer Community

Thanks so much for your contribution! We'd love to invite you to join the official CoPaw developer group! You can find the Discord and DingTalk group links under the "Developer Community" section on our docs page. We truly appreciate your enthusiasm—and look forward to your future contributions! 😊 We'll review your PR soon.
Code Review
This pull request adds support for `<thought>` tags in reasoning extraction and registers several new GLM vision models across Zhipu providers. It also updates the OpenAI stream response parsing to extract and prepend thinking blocks from text. A review comment identifies a logic issue where prepending thinking blocks could bypass the filtering of empty text blocks, suggesting a reordering of these operations.
```python
if injected_thinking_blocks:
    # Prepend extracted thinking blocks before existing content.
    parsed.content = injected_thinking_blocks + list(
        parsed.content,
    )
    # Rebuild new_content index offsets after prepending.
    new_content = None

if new_content is not None:
    parsed.content = [b for b in new_content if b is not None]
```
The current logic for prepending injected_thinking_blocks resets new_content to None (line 329), which effectively bypasses the filtering of empty text blocks that were marked for removal earlier in the loop. If a text block becomes empty after extracting thinking or tool-call tags, it should be removed from the final content list.
The filtering should be applied to parsed.content before prepending the new blocks to ensure the response remains clean.
Suggested change:

```diff
-if injected_thinking_blocks:
-    # Prepend extracted thinking blocks before existing content.
-    parsed.content = injected_thinking_blocks + list(
-        parsed.content,
-    )
-    # Rebuild new_content index offsets after prepending.
-    new_content = None
-if new_content is not None:
-    parsed.content = [b for b in new_content if b is not None]
+if new_content is not None:
+    parsed.content = [b for b in new_content if b is not None]
+if injected_thinking_blocks:
+    # Prepend extracted thinking blocks before existing content.
+    parsed.content = injected_thinking_blocks + list(parsed.content)
```
Closed as not planned. Users can add these models themselves.

Fixes #3259
Problem

Zhipu AI's GLM-4V series vision models (`glm-4v`, `glm-4v-plus`, `glm-4v-flash`, `glm-4.6v-flash`) were missing from the `ZHIPU_MODELS` list and the capability baseline registry. Because these models had no pre-defined capability entry, CoPaw treated them as text-only models (`supports_multimodal=false`), preventing users from using image input features with these vision-capable models.

Solution

- `provider_manager.py`: Add `glm-4v`, `glm-4v-plus`, `glm-4v-flash`, and `glm-4.6v-flash` to `ZHIPU_MODELS` with `supports_image=True` and `supports_video=False`, matching the official Zhipu AI documentation.
- `capability_baseline.py`: Register the same four models with `expected_image=True` for all four Zhipu provider IDs (`zhipu-cn`, `zhipu-cn-codingplan`, `zhipu-intl`, `zhipu-intl-codingplan`) so that the capability baseline prober can validate them correctly.
- `test_provider_manager.py`: Update the expected model list assertion in `test_builtin_zhipu_providers_registered` to include the four new models.

Testing

- `test_builtin_zhipu_providers_registered` verifies that all Zhipu provider instances expose the new vision model IDs in their model list.
- `ast.parse` confirms both modified source files have valid syntax.
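The registry additions described in the Solution can be sketched as follows. The dict shapes and the `build_capability_baseline` helper are hypothetical illustrations for this PR description, not the actual structures in `provider_manager.py` or `capability_baseline.py`.

```python
# Hypothetical sketch of the four GLM-4V entries and their baseline
# registration across all four Zhipu provider IDs.
GLM_4V_VISION_MODELS = {
    "glm-4v": {"supports_image": True, "supports_video": False},
    "glm-4v-plus": {"supports_image": True, "supports_video": False},
    "glm-4v-flash": {"supports_image": True, "supports_video": False},
    "glm-4.6v-flash": {"supports_image": True, "supports_video": False},
}

ZHIPU_PROVIDER_IDS = [
    "zhipu-cn",
    "zhipu-cn-codingplan",
    "zhipu-intl",
    "zhipu-intl-codingplan",
]


def build_capability_baseline() -> dict[tuple[str, str], dict[str, bool]]:
    """Register expected_image=True for every (provider, model) pair."""
    return {
        (provider, model): {"expected_image": True}
        for provider in ZHIPU_PROVIDER_IDS
        for model in GLM_4V_VISION_MODELS
    }
```

Registering the baseline per provider ID (rather than once globally) mirrors the PR's description that all four Zhipu provider variants share the same expected image capability for these models.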