Problem
The V1 and V2 scorers share approximately 80% of their code, violating DRY principles.
Files affected:
src/judge/scoring/v1/scorer.py
src/judge/scoring/v2/scorer.py
Duplicated Methods
The following methods are identical or near-identical between V1 and V2:
| Method | Status |
|---|---|
| _call_llm() | Identical |
| _parse_single_response() | Identical |
| _parse_batch_response() | Identical |
Proposed Solution
Extract shared code into a BaseScorer class:
class BaseScorer:
    def __init__(self, model: str = "openai/gpt-4o"):
        self.model = model
        self.char_loader = CharacteristicLoader()
        self.prompt_loader = PromptLoader(characteristic_loader=self.char_loader)

    def _call_llm(self, prompt: str) -> str:
        ...

    def _parse_single_response(self, response: str, characteristic_id: str) -> Score:
        ...

    def _parse_batch_response(self, response: str) -> list[Score]:
        ...
Then have V1Scorer and V2Scorer extend it, only implementing:
score_all()
score_single()
_build_context()
_build_batch_prompt()
_build_single_prompt()
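As a rough illustration of the split, the sketch below shows how the subclasses might plug into the base class. It is not the actual implementation: the `Score` dataclass, the `score_single(text, characteristic_id)` signature, the prompt wording, and the stubbed `_call_llm()` reply are all assumptions made so the example runs on its own. Only the single-score path is shown; `score_all()`, `_build_context()`, and `_build_batch_prompt()` would follow the same pattern.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Score:
    # Hypothetical stand-in for the project's Score type.
    characteristic_id: str
    value: float


class BaseScorer(ABC):
    """Shared plumbing; subclasses supply only version-specific logic."""

    def __init__(self, model: str = "openai/gpt-4o"):
        self.model = model

    def _call_llm(self, prompt: str) -> str:
        # Stub for illustration: the real method would call the model.
        return "0.5"

    def _parse_single_response(self, response: str, characteristic_id: str) -> Score:
        # Assumed response format: a bare number.
        return Score(characteristic_id, float(response.strip()))

    @abstractmethod
    def _build_single_prompt(self, text: str, characteristic_id: str) -> str:
        """Version-specific prompt construction."""

    @abstractmethod
    def score_single(self, text: str, characteristic_id: str) -> Score:
        """Version-specific scoring entry point (signature is assumed)."""


class V1Scorer(BaseScorer):
    def _build_single_prompt(self, text: str, characteristic_id: str) -> str:
        return f"[v1] Rate '{characteristic_id}' for: {text}"

    def score_single(self, text: str, characteristic_id: str) -> Score:
        prompt = self._build_single_prompt(text, characteristic_id)
        # _call_llm and _parse_single_response are inherited, not duplicated.
        return self._parse_single_response(self._call_llm(prompt), characteristic_id)


class V2Scorer(BaseScorer):
    def _build_single_prompt(self, text: str, characteristic_id: str) -> str:
        return f"[v2] Score '{characteristic_id}' with context for: {text}"

    def score_single(self, text: str, characteristic_id: str) -> Score:
        prompt = self._build_single_prompt(text, characteristic_id)
        return self._parse_single_response(self._call_llm(prompt), characteristic_id)
```

Marking the version-specific methods as abstract means a subclass that forgets to implement one fails at instantiation rather than at call time, which may be worth the small amount of extra ceremony.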