[Bug] Issue with reward model API #1943

dmakhervaks · 2024-11-07T01:24:05Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion (https://github.com/sgl-project/sglang/discussions/new/choose) instead. Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

We are trying to follow the reward calling API described here:

sglang/examples/runtime/reward_model.py

Line 4 in a5e0def

import requests

However, we found that the latest released version (0.3.5 Docker image) still exposes the endpoint as /judge rather than the path used in that example.

Even after changing the request path to /judge, we still got:

{'error': {'message': 'Either text or input_ids should be provided.'}}
Traceback (most recent call last):
  File "/home/dave.makhervaks/code/zlm/scratches/davemakhervaks/model_hosting_service/reward_model.py", line 32, in <module>
    print("scores:", [x["embedding"] for x in response])
  File "/home/dave.makhervaks/code/zlm/scratches/davemakhervaks/model_hosting_service/reward_model.py", line 32, in <listcomp>
    print("scores:", [x["embedding"] for x in response])
TypeError: string indices must be integers
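
The TypeError in the traceback is a side effect of the error reply: the server returned a dict, and iterating over a dict yields its string keys. Below is a minimal sketch of a guard that surfaces the server message instead. It assumes successful replies are a list of objects with an "embedding" field, as in our script; extract_scores is a hypothetical helper name, not part of sglang.

```python
def extract_scores(response):
    """Return the per-conversation scores, or raise if the server sent an error."""
    # An error reply looks like {"error": {"message": "..."}}. Iterating over it
    # yields the key "error" (a str), so x["embedding"] raises a TypeError.
    if isinstance(response, dict) and "error" in response:
        err = response["error"]
        message = err.get("message", "unknown error") if isinstance(err, dict) else str(err)
        raise RuntimeError(f"server returned an error: {message}")
    return [item["embedding"] for item in response]
```

With this guard, the run above would stop with the message "Either text or input_ids should be provided." rather than the unrelated TypeError.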

After performing the tokenization ourselves, we were able to get a valid response that matches the result of running the model locally with transformers, but the request differs from the example you provided. Please see the Reproduction section below.

In addition, we also get a response, but an incorrect one (it does not match the result of running locally with transformers), when we make only the following modification to the reproduction below:

conv_template = []
for row in data["conv"]:
    # Each row is one full conversation (a list of message dicts).
    conv_tokenized = rm_tokenizer.apply_chat_template(row, tokenize=False)
    conv_template.append(conv_tokenized)

json_data = {
    "text": conv_template
}
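
To make the two request variants easier to compare side by side, here is a small sketch that builds both payloads from the same conversations. build_payloads is a hypothetical helper. It assumes a tokenizer whose apply_chat_template accepts a list of message dicts and a tokenize flag, as the Hugging Face tokenizers do.

```python
def build_payloads(tokenizer, conversations):
    """Build the "text" and "input_ids" request bodies for the same conversations."""
    text_payload = {
        # Rendered chat-template strings, sent to the server untokenized.
        "text": [
            tokenizer.apply_chat_template(conv, tokenize=False)
            for conv in conversations
        ]
    }
    ids_payload = {
        # Token ids produced on the client side with the same chat template.
        "input_ids": [
            tokenizer.apply_chat_template(conv, tokenize=True)
            for conv in conversations
        ]
    }
    return text_payload, ids_payload
```

Posting each payload to /judge in turn and comparing the returned scores isolates the difference to the way the input is encoded.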

Reproduction

# launch server
# python -m sglang.launch_server --model Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 --is-embedding

import requests
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

url = "http://10.20.44.233:9942"

PROMPT = (
    "What is the range of the numeric output of a sigmoid node in a neural network?"
)
RESPONSE1 = "The output of a sigmoid node is bounded between -1 and 1."
RESPONSE2 = "The output of a sigmoid node is bounded between 0 and 1."

data = {
    "conv": [
        [
            {"role": "user", "content": PROMPT},
            {"role": "assistant", "content": RESPONSE1},
        ],
        [
            {"role": "user", "content": PROMPT},
            {"role": "assistant", "content": RESPONSE2},
        ],
    ],
}


model_name = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)


conv_template = []
for row in data["conv"]:
    # Each row is one full conversation (a list of message dicts).
    conv_tokenized = rm_tokenizer.apply_chat_template(row, tokenize=True)
    conv_template.append(conv_tokenized)

json_data = {
    "input_ids": conv_template
}
response = requests.post(
    url + "/judge",
    json=json_data,
).json()

print(response)
print("scores:", [x["embedding"] for x in response])
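
For the comparison against the local transformers run, a tolerance-based check can help, since floating-point scores rarely match bit for bit across backends. This is a sketch under assumptions: scores_match is a hypothetical helper, both inputs are flat lists of floats, and the default tolerance is an arbitrary placeholder.

```python
import math


def scores_match(server_scores, local_scores, abs_tol=1e-2):
    """Return True if both score lists have equal length and agree within abs_tol."""
    if len(server_scores) != len(local_scores):
        return False
    return all(
        math.isclose(server, local, abs_tol=abs_tol)
        for server, local in zip(server_scores, local_scores)
    )
```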

Environment

Python: 3.10.15 (main, Sep 7 2024, 18:35:33) [GCC 9.4.0]
CUDA available: True
GPU 0: NVIDIA H100 80GB HBM3
GPU 0 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 535.129.03
PyTorch: 2.4.0+cu121
sglang: 0.3.5
flashinfer: 0.1.6+cu124torch2.4
triton: 3.0.0
transformers: 4.46.1
requests: 2.32.3
tqdm: 4.66.6
numpy: 1.26.4
aiohttp: 3.10.10
fastapi: 0.115.4
hf_transfer: 0.1.8
huggingface_hub: 0.26.2
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.1.0
pydantic: 2.9.2
uvicorn: 0.32.0
uvloop: 0.21.0
zmq: 26.2.0
vllm: 0.6.3.post1
multipart: 0.0.17
openai: 1.53.0
anthropic: 0.38.0
NVIDIA Topology:
        GPU0    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0    X       SYS     SYS     0-55,112-167    0               N/A
NIC0    SYS     X       PIX
NIC1    SYS     PIX     X

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

NIC Legend:

NIC0: rocep86s0f0
NIC1: rocep86s0f1

ulimit soft: 1024
