Conversation


@iamsims iamsims commented Nov 19, 2025

Summary

  • Implements an LLM-based reranker that evaluates search results using configurable criteria, weights, and fields.
  • In addition to sorting, provides a label for each criterion.
  • Two implementations:
    - Single LLM call per result, evaluating all criteria simultaneously (commit 02eb523)
    - Multiple LLM calls per result, one per criterion (commit 166f475)
  • Supports weighted multi-criteria scoring with customizable categories (a hypothetical usage sketch follows below).
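As a rough illustration of the intended API, here is a hypothetical usage sketch assembled from names visible in this PR (ScoringCriterion, LLMRerankerToolConfig, create_reranker, RerankerType); the import path, the RerankerType.LLM member, and the field names are assumptions, not the final interface.

from akd.rerankers import (  # assumed module path
    LLMRerankerToolConfig,
    RerankerType,
    ScoringCriterion,
    create_reranker,
)

# Hypothetical weighted multi-criteria config: each criterion carries a
# weight, and the LLM assigns a category/label per criterion per result.
config = LLMRerankerToolConfig(
    criteria=[
        ScoringCriterion(
            name="Relevance",
            description="How directly does this result address the query?",
            weight=0.7,
        ),
        ScoringCriterion(
            name="Recency",
            description="How up to date is this result?",
            weight=0.3,
        ),
    ],
)
reranker = create_reranker(RerankerType.LLM, config=config)  # assumed enum member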

@github-actions

❌ Tests failed (exit code: 1)

📊 Test Results

  • Passed: 548
  • Failed: 5
  • Skipped: 7
  • Warnings: 164
  • Coverage: 79%

Branch: feature/llm-reranker
PR: #278
Commit: 0386629

📋 Full coverage report and logs are available in the workflow run.

@NISH1001 NISH1001 self-requested a review November 20, 2025 21:04
@NISH1001

@iamsims add a detailed # Usage section to the PR description as well.

See some of the older PRs we have done, e.g. #279.


@NISH1001 NISH1001 left a comment

Initial review

Comment on lines +212 to +219
agent_system_prompt: str = Field(
    default=(
        "You are an expert at evaluating search results. "
        "Analyze the provided result for the query against all given criteria and "
        "select the most appropriate category for each. Provide clear reasoning."
    ),
    description="System prompt for the internal scoring agent",
)

have this read from os.getenv as well so that we can globally modify it whenever we want...and default to this value you have. Something like AKD_LLM_RERANKER_SYSTEM_PROMPT
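A minimal sketch of that suggestion, assuming the field lives on a pydantic config model; the env var name is from the comment above, the rest mirrors the snippet being reviewed:

import os

from pydantic import BaseModel, Field

_DEFAULT_SYSTEM_PROMPT = (
    "You are an expert at evaluating search results. "
    "Analyze the provided result for the query against all given criteria and "
    "select the most appropriate category for each. Provide clear reasoning."
)

class LLMRerankerToolConfig(BaseModel):  # actual base class may differ
    # default_factory re-reads the env var at instantiation time and falls
    # back to the hard-coded default when AKD_LLM_RERANKER_SYSTEM_PROMPT
    # is unset.
    agent_system_prompt: str = Field(
        default_factory=lambda: os.getenv(
            "AKD_LLM_RERANKER_SYSTEM_PROMPT",
            _DEFAULT_SYSTEM_PROMPT,
        ),
        description="System prompt for the internal scoring agent",
    )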

Comment on lines +236 to +240
ScoringCriterion(
    name="Processing Level",
    description="How well does this result match the required processing level?",
    weight=0.5,
),

This seems too specific to the data search agent. We need to find some general criteria that can be globally applied to any use case...
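Purely for illustration, domain-agnostic defaults could look something like the following; the names, descriptions, and weights are placeholders, not a proposal from this thread:

# Illustrative only: generic criteria not tied to the data search agent,
# using the ScoringCriterion class from this PR.
GENERIC_CRITERIA = [
    ScoringCriterion(
        name="Relevance",
        description="How directly does this result address the query?",
        weight=0.6,
    ),
    ScoringCriterion(
        name="Completeness",
        description="How fully does this result cover what the query asks for?",
        weight=0.4,
    ),
]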

            debug: Enable debug logging
        """
        super().__init__(config=config, debug=debug)
        self.config: LLMRerankerToolConfig = self.config  # type hint

we don't need this...it will automatically be done through super()...


class ScoringAgent(LiteLLMInstructorBaseAgent):
    input_schema = DummyInput
    output_schema = dynamic_scoring_model

move this to another internal method like _setup_scoring_agent or something

and we can just do `self.scoring_agent = self._setup_scoring_agent(model, ...)`
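A rough sketch of that refactor; the helper name comes from the comment, while the parameter and constructor details are assumptions:

# Sketch only: build the per-instance scoring agent in a helper instead of
# inlining the class definition.
def _setup_scoring_agent(self, scoring_model):
    class ScoringAgent(LiteLLMInstructorBaseAgent):
        input_schema = DummyInput
        output_schema = scoring_model

    return ScoringAgent()  # constructor args (model, config, ...) assumed

# then, in __init__:
# self.scoring_agent = self._setup_scoring_agent(dynamic_scoring_model)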

        Similar to relevancy-ranker.py approach - creates explicit named fields
        for each criterion so LLM can see them in the JSON schema.
        """
        from pydantic import create_model

move to top-level import

        )

        if self.debug:
            print(formatted_prompt)

logger.debug(...)

            },
        ]

        print(messages)

remove print statement...or add if self.debug: logger.debug(...)

 def create_reranker(
     reranker_type: RerankerType,
-    config: RerankerToolConfig | None = None,
+    config: RerankerToolConfig | LLMRerankerToolConfig | None = None,

Technically we don't need LLMRerankerToolConfig here, because it's also a type of RerankerToolConfig. Redundant type hint.

Comment on lines +471 to +477

        response = await self.scoring_agent.get_response_async(
            messages=messages,
        )

        results = {}
        response_dict = response.model_dump()

what if we do this at the agent.arun level? since it's a one-level-higher abstraction. is it possible to do it with the formatted_prompt? you have to convert it to a pydantic input schema
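As a sketch of that suggestion, assuming a hypothetical input schema and an arun(...) method that accepts it:

from pydantic import BaseModel

class ScoringAgentInput(BaseModel):  # hypothetical schema name
    prompt: str

# Wrap the formatted prompt in a pydantic input schema so the call can go
# through the higher-level agent.arun(...) instead of the lower-level
# get_response_async(messages=...).
params = ScoringAgentInput(prompt=formatted_prompt)
response = await self.scoring_agent.arun(params)  # arun signature assumed
results = response.model_dump()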

github-actions bot commented Dec 2, 2025

❌ Tests failed (exit code: 1)

📊 Test Results

  • Passed: 548
  • Failed: 5
  • Skipped: 7
  • Warnings: 169
  • Coverage: 79%

Branch: feature/llm-reranker
PR: #278
Commit: b59272c

📋 Full coverage report and logs are available in the workflow run.

github-actions bot commented Dec 2, 2025

❌ Tests failed (exit code: 1)

📊 Test Results

  • Passed: 548
  • Failed: 5
  • Skipped: 7
  • Warnings: 167
  • Coverage: 79%

Branch: feature/llm-reranker
PR: #278
Commit: ce52969

📋 Full coverage report and logs are available in the workflow run.
