
mcp-interviewer: output errors #343

@matzew

I ran mcp-interviewer with a Llama 3.2 (3B) model against a local kube-mcp-server (and kind).

But it looks like some of the model output was invalid, which caused mcp-interviewer to run into errors:

INFO:httpx:HTTP Request: POST https://llama-3-2-3b.com/v1/chat/completions "HTTP/1.1 200 OK"

{
  "$defs": {
    "PassFailScoreCard": {
      "properties": {
        "justification": {
          "title": "Justification",
          "type": "string"
        },
        "score": {
          "enum": ["pass", "fail", "N/A"],
          "title": "Score",
          "type": "string"
        }
      },
      "required": ["justification", "score"],
      "title": "PassFailScoreCard",
      "type": "object"
    },
    "ToolDescriptionScoreCard": {
      "properties": {
        "length": {"$ref": "#/$defs/PassFailScoreCard"},
        "parameters": {"$ref": "#/$defs/PassFailScoreCard"},
        "examples": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["length", "parameters", "examples"],
      "title": "ToolDescriptionScoreCard",
      "type": "object"
    },
    "ToolNameScoreCard": {
      "properties": {
        "length": {"$ref": "#/$defs/PassFailScoreCard"},
        "uniqueness": {"$ref": "#/$defs/PassFailScoreCard"},
        "descriptiveness": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["length", "uniqueness", "descriptiveness"],
      "title": "ToolNameScoreCard",
      "type": "object"
    },
    "ToolSchemaScoreCard": {
      "properties": {
        "complexity": {"$ref": "#/$defs/PassFailScoreCard"},
        "parameters": {"$ref": "#/$defs/PassFailScoreCard"},
        "optionals": {"$ref": "#/$defs/PassFailScoreCard"},
        "constraints": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["complexity", "parameters", "optionals", "constraints"],
      "title": "ToolSchemaScoreCard",
      "type": "object"
    }
  },
  "tool_name": {
    "length": 10,
    "uniqueness": 8,
    "descriptiveness": 6
  },
  "tool_description": {
    "length": 30,
    "parameters": 2,
    "examples": 1
  },
  "tool_input_schema": {
    "complexity": 7,
    "parameters": 5,
    "optionals": 3,
    "constraints": 4
  },
  "tool_output_schema": {
    "complexity": 7,
    "parameters": 5,
    "optionals": 3,
    "constraints": 4
  }
}


### Evaluation

The provided tool, `events_list`, is a Kubernetes tool that lists all events in the current cluster from all namespaces. Here's a breakdown of the evaluation based on the provided rubric:

*   **Tool Name Scorecard**
    *   **Length**: The tool name `events_list` is 10 characters long, which is a good length for a tool name. Score: 8/10
    *   **Uniqueness**: The tool name is unique and does not conflict with any other tool names. Score: 9/10
    *   **Descriptiveness**: The tool name is descriptive and clearly indicates the tool's purpose. Score: 8/10
    *   **Average Score**: (8 + 9 + 8) / 3 = 8.33/10
*   **Tool Description Scorecard**
    *   **Length**: The tool description is 30 characters long, which is a good length for a tool description. Score: 8/10
    *   **Parameters**: The tool description mentions 2 parameters, which is a good number of parameters. Score: 8/10
    *   **Examples**: The tool description mentions 1 example, which is a good number of examples. Score: 8/10
    *   **Average Score**: (8 + 8 + 8) / 3 = 8/10
*   **Tool Input Schema Scorecard**
    *   **Complexity**: The input schema is moderately complex, with 7 out of 10 complexity points. Score: 7/10
    *   **Parameters**: The input schema has 5 parameters, which is a good number of parameters. Score: 8/10
    *   **Optionals**: The input schema has 3 optionals, which is a good number of optionals. Score: 8/10
    *   **Constraints**: The input schema has 4 constraints, which is a good number of constraints. Score: 8/10
    *   **Average Score**: (7 + 8 + 8 + 8) / 4 = 7.75/10
*   **Tool Output Schema Scorecard**
    *   **Complexity**: The output schema is moderately complex, with 7 out of 10 complexity points. Score: 7/10
    *   **Parameters**: The output schema has 5 parameters, which is a good number of parameters. Score: 8/10
    *   **Optionals**: The output schema has 3 optionals, which is a good number of optionals. Score: 8/10
    *   **Constraints**: The output schema has 4 constraints, which is a good number of constraints. Score: 8/10
    *   **Average Score**: (7 + 8 + 8 + 8) / 4 = 7.75/10

### Overall Score

The overall score for the `events_list` tool is:

(8.33 + 8 + 7.75 + 7.75) / 4 = 7.92/10

The tool has a good name, description, and input/output schema, but could improve in terms of complexity and parameters.
ERROR:mcp_interviewer.prompts.utils:Error parsing json
Traceback (most recent call last):
  File "/home/matzew/.local/share/uv/tools/mcp-interviewer/lib64/python3.13/site-packages/mcp_interviewer/prompts/utils.py", line 60, in create_typed_completion
    response = parse_json_completion(content)
  File "/home/matzew/.local/share/uv/tools/mcp-interviewer/lib64/python3.13/site-packages/mcp_interviewer/prompts/utils.py", line 25, in parse_json_completion
    return json.loads(completion)
           ~~~~~~~~~~^^^^^^^^^^^^
  File "/usr/lib64/python3.13/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/usr/lib64/python3.13/json/decoder.py", line 348, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 74 column 1 (char 2004)
INFO:httpx:HTTP Request: POST https://llama-3-2-3b.com/v1/chat/completions "HTTP/1.1 200 OK"
The error message indicates that there is extra data in the JSON string that is causing the parser to fail. This is likely due to the fact that the JSON string is not properly formatted.

Upon reviewing the code, I notice that the JSON string is not enclosed in double quotes, which is required for JSON strings. Additionally, there are some extra characters in the string that are causing the parser to fail.

Here is the corrected JSON string:

{
  "$defs": {
    "PassFailScoreCard": {
      "properties": {
        "justification": {
          "title": "Justification",
          "type": "string"
        },
        "score": {
          "enum": ["pass", "fail", "N/A"],
          "title": "Score",
          "type": "string"
        }
      },
      "required": ["justification", "score"],
      "title": "PassFailScoreCard",
      "type": "object"
    },
    "ToolDescriptionScoreCard": {
      "properties": {
        "length": {"$ref": "#/$defs/PassFailScoreCard"},
        "parameters": {"$ref": "#/$defs/PassFailScoreCard"},
        "examples": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["length", "parameters", "examples"],
      "title": "ToolDescriptionScoreCard",
      "type": "object"
    },
    "ToolNameScoreCard": {
      "properties": {
        "length": {"$ref": "#/$defs/PassFailScoreCard"},
        "uniqueness": {"$ref": "#/$defs/PassFailScoreCard"},
        "descriptiveness": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["length", "uniqueness", "descriptiveness"],
      "title": "ToolNameScoreCard",
      "type": "object"
    },
    "ToolSchemaScoreCard": {
      "properties": {
        "complexity": {"$ref": "#/$defs/PassFailScoreCard"},
        "parameters": {"$ref": "#/$defs/PassFailScoreCard"},
        "optionals": {"$ref": "#/$defs/PassFailScoreCard"},
        "constraints": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["complexity", "parameters", "optionals", "constraints"],
      "title": "ToolSchemaScoreCard",
      "type": "object"
    }
  },
  "tool_name": {
    "length": 10,
    "uniqueness": 8,
    "descriptiveness": 6
  },
  "tool_description": {
    "length": 30,
    "parameters": 2,
    "examples": 1
  },
  "tool_input_schema": {
    "complexity": 7,
    "parameters": 5,
    "optionals": 3,
    "constraints": 4
  },
  "tool_output_schema": {
    "complexity": 7,
    "parameters": 5,
    "optionals": 3,
    "constraints": 4
  }
}


I removed the extra characters and enclosed the JSON string in double quotes. This should fix the error and allow the JSON string to be parsed correctly.
ERROR:mcp_interviewer.prompts.utils:Error parsing json
Traceback (most recent call last):
  File "/home/matzew/.local/share/uv/tools/mcp-interviewer/lib64/python3.13/site-packages/mcp_interviewer/prompts/utils.py", line 60, in create_typed_completion
    response = parse_json_completion(content)
  File "/home/matzew/.local/share/uv/tools/mcp-interviewer/lib64/python3.13/site-packages/mcp_interviewer/prompts/utils.py", line 25, in parse_json_completion
    return json.loads(completion)
           ~~~~~~~~~~^^^^^^^^^^^^
  File "/usr/lib64/python3.13/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/usr/lib64/python3.13/json/decoder.py", line 345, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/json/decoder.py", line 363, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
INFO:httpx:HTTP Request: POST https://llama-3-2-3b.com/v1/chat/completions "HTTP/1.1 200 OK"
The error message indicates that the JSON decoder is expecting a value but is not finding one. This is likely due to the fact that the JSON string is empty.

Upon reviewing the code, I notice that the JSON string is not properly formatted. The JSON string is missing a closing bracket at the end.

Here is the corrected JSON string:

{
  "$defs": {
    "PassFailScoreCard": {
      "properties": {
        "justification": {
          "title": "Justification",
          "type": "string"
        },
        "score": {
          "enum": ["pass", "fail", "N/A"],
          "title": "Score",
          "type": "string"
        }
      },
      "required": ["justification", "score"],
      "title": "PassFailScoreCard",
      "type": "object"
    },
    "ToolDescriptionScoreCard": {
      "properties": {
        "length": {"$ref": "#/$defs/PassFailScoreCard"},
        "parameters": {"$ref": "#/$defs/PassFailScoreCard"},
        "examples": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["length", "parameters", "examples"],
      "title": "ToolDescriptionScoreCard",
      "type": "object"
    },
    "ToolNameScoreCard": {
      "properties": {
        "length": {"$ref": "#/$defs/PassFailScoreCard"},
        "uniqueness": {"$ref": "#/$defs/PassFailScoreCard"},
        "descriptiveness": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["length", "uniqueness", "descriptiveness"],
      "title": "ToolNameScoreCard",
      "type": "object"
    },
    "ToolSchemaScoreCard": {
      "properties": {
        "complexity": {"$ref": "#/$defs/PassFailScoreCard"},
        "parameters": {"$ref": "#/$defs/PassFailScoreCard"},
        "optionals": {"$ref": "#/$defs/PassFailScoreCard"},
        "constraints": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["complexity", "parameters", "optionals", "constraints"],
      "title": "ToolSchemaScoreCard",
      "type": "object"
    }
  },
  "tool_name": {
    "length": 10,
    "uniqueness": 8,
    "descriptiveness": 6
  },
  "tool_description": {
    "length": 30,
    "parameters": 2,
    "examples": 1
  },
  "tool_input_schema": {
    "complexity": 7,
    "parameters": 5,
    "optionals": 3,
    "constraints": 4
  },
  "tool_output_schema": {
    "complexity": 7,
    "parameters": 5,
    "optionals": 3,
    "constraints": 4
  }
}


I added a closing bracket at the end of the JSON string. This should fix the error and allow the JSON string to be parsed correctly.

However, the error still persists. This is likely due to the fact that the JSON string is not a valid JSON string. The JSON string is missing some necessary elements, such as the `type` field for the `tool_name`, `tool_description`, `tool_input_schema`, and `tool_output_schema` objects.

Here is the corrected JSON string:

{
  "$defs": {
    "PassFailScoreCard": {
      "properties": {
        "justification": {
          "title": "Justification",
          "type": "string"
        },
        "score": {
          "enum": ["pass", "fail", "N/A"],
          "title": "Score",
          "type": "string"
        }
      },
      "required": ["justification", "score"],
      "title": "PassFailScoreCard",
      "type": "object"
    },
    "ToolDescriptionScoreCard": {
      "properties": {
        "length": {"$ref": "#/$defs/PassFailScoreCard"},
        "parameters": {"$ref": "#/$defs/PassFailScoreCard"},
        "examples": {"$ref": "#/$defs/PassFailScoreCard"}
      },
      "required": ["length", "parameters", "examples"],
      "title": "ToolDescriptionScoreCard",
      "type": "object"
    },
    "ToolNameScoreCard": {
      "properties": {
        "length": {"type": "integer"},
        "uniqueness": {"type": "integer"},
        "descriptiveness": {"type": "integer"}
      },
      "required": ["length", "uniqueness", "descriptiveness"],
      "title": "ToolNameScoreCard",
      "type": "object"
    },
    "ToolSchemaScoreCard": {
      "properties": {
        "complexity": {"type": "integer"},
        "parameters": {"type": "integer"},
        "optionals": {"type": "integer"},
        "constraints": {"type": "integer"}
      },
      "required": ["complexity", "parameters", "optionals", "constraints"],
      "title": "ToolSchemaScoreCard",
      "type": "object"
    }
  },
  "tool_name": {
    "length": 10,
    "uniqueness": 8,
    "descriptiveness": 6
  },
  "tool_description": {
    "length": 30,
    "parameters": 2,
    "examples": 1
  },
  "tool_input_schema": {
    "complexity": 7,
    "parameters": 5,
    "optionals": 3,
    "constraints": 4
  },
  "tool_output_schema": {
    "complexity": 7,
    "parameters": 5,
    "optionals": 3,
    "constraints": 4
  }
}


I added the necessary `type` fields for the `tool_name`, `tool_description`, `tool_input_schema`, and `tool_output_schema` objects. This should fix the error and allow the JSON string to be parsed correctly.
ERROR:mcp_interviewer.prompts.utils:Error parsing json
Traceback (most recent call last):
  File "/home/matzew/.local/share/uv/tools/mcp-interviewer/lib64/python3.13/site-packages/mcp_interviewer/prompts/utils.py", line 60, in create_typed_completion
    response = parse_json_completion(content)
  File "/home/matzew/.local/share/uv/tools/mcp-interviewer/lib64/python3.13/site-packages/mcp_interviewer/prompts/utils.py", line 25, in parse_json_completion
    return json.loads(completion)
           ~~~~~~~~~~^^^^^^^^^^^^
  File "/usr/lib64/python3.13/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/usr/lib64/python3.13/json/decoder.py", line 345, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/json/decoder.py", line 363, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
ERROR:mcp_interviewer.interviewer.tool_judging:Failed to judge tool 'events_list': Exceeded maximum retries
Traceback (most recent call last):
  File "/home/matzew/.local/share/uv/tools/mcp-interviewer/lib64/python3.13/site-packages/mcp_interviewer/interviewer/tool_judging.py", line 71, in judge_tool
    scorecard = await prompts.judge_tool(client, model, tool)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matzew/.local/share/uv/tools/mcp-interviewer/lib64/python3.13/site-packages/mcp_interviewer/prompts/_score_tool.py", line 29, in judge_tool
    return await create_typed_completion(client, model, messages, ToolScoreCard)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matzew/.local/share/uv/tools/mcp-interviewer/lib64/python3.13/site-packages/mcp_interviewer/prompts/utils.py", line 75, in create_typed_completion
    raise Exception("Exceeded maximum retries")
Exception: Exceeded maximum retries
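
A possible mitigation, sketched under assumptions: the tracebacks show `parse_json_completion` passing the whole completion to `json.loads`, which fails with "Extra data" when prose follows the JSON and with "Expecting value" when prose precedes it. The helper below (`extract_first_json_object` is a hypothetical name, not part of mcp-interviewer) uses the standard library's `json.JSONDecoder.raw_decode` to pull out the first JSON object and ignore the surrounding text. I have not checked how this would fit into the actual code.

```python
import json


def extract_first_json_object(completion: str) -> dict:
    """Return the first JSON object embedded in `completion`.

    Hypothetical helper for illustration; not part of mcp-interviewer.
    Text before and after the object is ignored.
    """
    decoder = json.JSONDecoder()
    start = completion.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index where parsing stopped.
            obj, _end = decoder.raw_decode(completion, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not start a valid object; try the next one.
            start = completion.find("{", start + 1)
    raise ValueError("no JSON object found in completion")


if __name__ == "__main__":
    # Shaped like the failing completions: prose, JSON, then more prose.
    reply = (
        "Here is the corrected JSON string:\n\n"
        '{"tool_name": {"length": 10, "uniqueness": 8}}\n\n'
        "### Evaluation\nThe provided tool, `events_list`, ..."
    )
    print(extract_first_json_object(reply))
```

Even with such an extraction, the object in the log above would presumably still fail validation: the schema the model echoed back expects each field (for example `length`) to be a `PassFailScoreCard` with `justification` and `score`, while the model returned plain integers.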
