

Codegeex4 ERROR: ChatGLM4Tokenizer._pad() got an unexpected keyword argument 'padding_side' #2757

Open
Oaklight opened this issue Jan 12, 2025 · 3 comments


System Info

Managed server, account without sudo privileges. Singularity is available:

$ singularity --version
singularity-ce version 4.1.2-focal

OS version

NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

v1.1.0

The command used to start Xinference

singularity exec --fakeroot \
  --env XINFERENCE_MODEL_SRC=huggingface \
  --bind xinference/.xinference:/root/.xinference \
  --nv \
  --bind /tmp/.X11-unix:/tmp/.X11-unix \
  xinference/xinference_v1.1.0.sif \
  xinference-local -H 0.0.0.0 --log-level debug

Reproduction

  • Model Engine: Transformers (cached)
  • Model Format: pytorch (cached)
  • Model Size: 9 (cached)
  • Quantization: 8-bit (cached)
  • N-GPU: 1
  • Replica: 1
  • Additional parameters passed to the inference engine (Transformers):
    • key: dtype
    • value: half

Access via localhost:port/v1/chat/completions:

Error handling webview message: {
  "msg": {
    "messageId": "b88dd8cb-da49-4938-83d2-429313093b9c",
    "messageType": "llm/streamChat",
    "data": {
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "hi"
            }
          ]
        },
        {
          "role": "assistant",
          "content": ""
        }
      ],
      "title": "CodeGeeX4",
      "completionOptions": {}
    }
  }
}

Error: Malformed JSON sent from server: {"error": "[address=0.0.0.0:34705, pid=3795840] ChatGLM4Tokenizer._pad() got an unexpected keyword argument 'padding_side'"}

This error is returned in response to the request above.
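For context (a hedged reading, not stated in the thread): newer transformers releases forward a `padding_side` keyword to the tokenizer's `_pad()` method, so a custom tokenizer whose `_pad()` override predates that keyword raises exactly this TypeError. A minimal sketch of the signature mismatch, using stand-in classes rather than the real ChatGLM4Tokenizer:

```python
# Minimal sketch of the signature mismatch (stand-in classes, not the real
# transformers/ChatGLM4 code): the caller now passes padding_side=..., but
# the legacy override does not accept that keyword.

class LegacyPad:
    # mirrors an old-style _pad() override with no padding_side parameter
    def _pad(self, encoded_inputs, max_length=None, padding_strategy=None,
             pad_to_multiple_of=None, return_attention_mask=None):
        return encoded_inputs

def pad_like_new_transformers(tokenizer):
    # newer transformers forwards padding_side through to _pad()
    try:
        tokenizer._pad({"input_ids": [1, 2, 3]}, max_length=8,
                       padding_side="left")
        return "ok"
    except TypeError as err:
        return f"TypeError: {err}"

print(pad_like_new_transformers(LegacyPad()))
# prints a TypeError mentioning the unexpected 'padding_side' keyword
```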

Expected behavior

The request should work normally; the llama.cpp version of the model handles the exact same query correctly.

See THUDM/GLM-4#578 for the related discussion in the ChatGLM4 repo.

@XprobeBot XprobeBot added the gpu label Jan 12, 2025
@XprobeBot XprobeBot added this to the v1.x milestone Jan 12, 2025
qinxuye (Contributor) commented Jan 14, 2025

@codingl2k1 Can you help with this?

codingl2k1 (Contributor) commented:

The model tokenizer requires an update, related issue: https://huggingface.co/THUDM/codegeex4-all-9b/discussions/20

Just like this fix on LongWriter-glm4-9b: https://huggingface.co/THUDM/LongWriter-glm4-9b/commit/778b5712634889f5123d6c463ca383bc6dd5c621
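The fix pattern in the linked LongWriter-glm4-9b commit amounts to accepting `padding_side` in the custom `_pad()` override and forwarding it to the base implementation. A hedged sketch of that pattern, with a stand-in base class in place of transformers' `PreTrainedTokenizerBase`:

```python
# Hedged sketch of the fix pattern (stand-in classes, not the actual commit):
# accept padding_side in the custom _pad() override and pass it along,
# instead of rejecting it with a TypeError.

class BasePad:
    # stand-in for the library base class, which already accepts padding_side
    def _pad(self, encoded_inputs, max_length=None, padding_strategy=None,
             pad_to_multiple_of=None, padding_side=None,
             return_attention_mask=None):
        encoded_inputs["padding_side_used"] = padding_side
        return encoded_inputs

class PatchedTokenizer(BasePad):
    def _pad(self, encoded_inputs, max_length=None, padding_strategy=None,
             pad_to_multiple_of=None, padding_side=None,
             return_attention_mask=None):
        # custom pre/post-processing would go here; the key change is that
        # padding_side is now accepted and forwarded rather than rejected
        return super()._pad(encoded_inputs, max_length=max_length,
                            padding_strategy=padding_strategy,
                            pad_to_multiple_of=pad_to_multiple_of,
                            padding_side=padding_side,
                            return_attention_mask=return_attention_mask)

result = PatchedTokenizer()._pad({"input_ids": [1, 2]}, padding_side="left")
print(result["padding_side_used"])  # left
```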


This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Jan 21, 2025