Langchain #216
base: main
Conversation
…into langchain Merging evaluation pipeline together
…langchain Updating the LLM setup, as well as new tests and protocols
…langchain Conflicts: pyproject.toml
Hi @batwood-1, I have left some comments. I feel some changes are needed here. You can start addressing my comments, and we can meet in early January to discuss this merge in person. What do you think?
@batwood-1 thank you for addressing some of my comments. We will check this deeply tomorrow in person.
Fixed all comments; check out the latest changes under 'fixed evaluations'.
@@ -52,7 +52,12 @@ numpy = "~=1.26"
umap-learn = "~=0.5"
scikit-learn = "~=1.5"
nltk = "~=3.9"
sacrebleu = "^2.4.3"
pytest-testmon = "^2.1.1"
I suspect you don't want pytest-testmon in production, but only among the dev dependencies. And why do you want this?
@@ -52,7 +52,12 @@ numpy = "~=1.26"
umap-learn = "~=0.5"
scikit-learn = "~=1.5"
nltk = "~=3.9"
sacrebleu = "^2.4.3"
For consistency and easier dependency solving, can you use the tilde as for the other dependencies? ~=2.4
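A minimal sketch of the suggested change in pyproject.toml (assuming Poetry-style dependency tables, as the surrounding diff suggests):

```toml
[tool.poetry.dependencies]
# PEP 440 compatible-release operator: ~=2.4 is equivalent to >=2.4, ==2.*,
# matching the style of the other pinned packages above.
sacrebleu = "~=2.4"
```

Note the difference from the caret constraint: `^2.4.3` (a Poetry-specific operator) allows any 2.x release at or above 2.4.3, while `~=2.4` allows any 2.x release at or above 2.4.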
vocos = "~=0.1"
deepeval = "~=2.2.6"
rouge-score = "~=0.1.2"
textstat = "^0.7.4"
same as above
@@ -129,10 +134,7 @@ target-version = "py310"

[tool.ruff.lint]
select = ["ANN", "D", "E", "F", "I"]
ignore = [
    "ANN101", # self should not be annotated.
    "ANN102"  # cls should not be annotated.
Why this? I suspect ANN101 and ANN102 should stay, since we never annotate self.
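Restoring the previous behavior would mean keeping the ignore block in pyproject.toml (reconstructed from the removed lines in the diff; the closing bracket is assumed):

```toml
[tool.ruff.lint]
select = ["ANN", "D", "E", "F", "I"]
ignore = [
    "ANN101",  # self should not be annotated.
    "ANN102",  # cls should not be annotated.
]
```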
this is actually empty
class LLM:
    """Wrapper for invoking various LLMs.

    This class provides a unified interface for interacting with LLMs,
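To make the API discussion below concrete, here is a minimal sketch of what a unified LLM wrapper could look like. `LLMBackend`, `EchoBackend`, and `invoke` are illustrative names, not the PR's actual API; the point is that the wrapper delegates to interchangeable backends rather than binding callers to one server.

```python
from abc import ABC, abstractmethod


class LLMBackend(ABC):
    """Hypothetical backend interface; each concrete backend wraps one provider."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's completion for the given prompt."""


class EchoBackend(LLMBackend):
    """Stand-in backend so the sketch runs without a model server."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class LLM:
    """Wrapper for invoking various LLMs through one interface."""

    def __init__(self, backend: LLMBackend) -> None:
        self.backend = backend

    def invoke(self, prompt: str) -> str:
        # All provider-specific logic lives in the backend, not here.
        return self.backend.generate(prompt)


llm = LLM(EchoBackend())
print(llm.invoke("hello"))  # prints: echo: hello
```

A design like this would let project-specific code (e.g. for ALI) supply its own backend while the wrapper itself stays reusable.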
Is the external server the easiest way to interact with these models? It sounds like extra effort that we are asking of our users.
I feel this is more code for the ALI project than code that should be part of a reusable API.
Why can't the llm response be a scriptline?
How much of this and transcript_output is ALI-specific versus reusable by others? My general comment is to try to split the code that is specific to ALI from the code that you want to make available to everyone in the community.
I have left comments here and there. I feel that we are going in the right direction, but you may want to think about a more generalizable API for interacting with the LLMs (not just in the context of the ALI project).
Description
Related Issue(s)
Motivation and Context
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: