Fully hackable LLMOps. Build custom interfaces for: logging, evals, guardrails, labelling, tracing, agents, human-in-the-loop, hyperparam sweeps, and anything else you can think of.
Just `unify.log` your data (see the minimal sketch after this list), and add an interface using the four building blocks:
- tables
- views
- plots
- editor (coming soon)
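A minimal sketch of that logging step, reusing only the calls that appear in the fuller example further down (the field values here are just placeholders); each call logs one entry, and the keyword names are the fields you can then pull into tables, views, and plots:

```python
import unify

# send logs to a project (same call as in the eval example below)
unify.activate("Maths Assistant")

# one logged entry; "question", "response" and "score" become the
# fields that the interface building blocks can surface
unify.log(question="2 + 2", response="4", score=1.0)
```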
Every LLM product has unique and changing requirements, as do its users. Your infra should reflect this!
We've tried to make Unify as (a) simple, (b) modular, and (c) hackable as possible, so you can quickly probe, analyze, and iterate on the data that's important for you, your product, and your users.
Sign up, `pip install unifyai`, run your first eval (below), and then check out the logs in your first interface.
import unify
from random import randint, choice

# initialize project
unify.activate("Maths Assistant")

# build agent
client = unify.Unify("o3-mini@openai", traced=True)
client.set_system_message(
    "You are a helpful maths assistant, "
    "tasked with adding and subtracting integers."
)

# add test cases
qs = [
    f"{randint(0, 100)} {choice(['+', '-'])} {randint(0, 100)}"
    for _ in range(10)
]

# define evaluator
@unify.traced
def evaluate_response(question: str, response: str) -> float:
    correct_answer = eval(question)
    try:
        response_int = int(
            "".join(
                [
                    c for c in response.split(" ")[-1]
                    if c.isdigit()
                ]
            )
        )
        return float(correct_answer == response_int)
    except ValueError:
        return 0.0

# define evaluation
@unify.traced
def evaluate(q: str):
    response = client.generate(q)
    score = evaluate_response(q, response)
    unify.log(
        question=q,
        response=response,
        score=score,
    )

# execute + log your evaluation
with unify.Experiment():
    unify.map(evaluate, qs)
Despite all of the hype, abstractions, and jargon, the process for building quality LLM apps is pretty simple.
create simplest possible agent
while True:
    create/expand unit tests (evals)
    while run(tests) failing:
        analyze failures, understand the root cause
        vary system prompt, in-context examples, tools etc. to rectify
    beta test with users, find more failures
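Concretely, one pass through the inner loop for the maths assistant above might look like the sketch below. The new system message is just an illustrative tweak, and the re-run reuses the `evaluate` function and test cases defined earlier:

```python
# vary the agent (here: just the system prompt) after analyzing failures
client.set_system_message(
    "You are a helpful maths assistant, tasked with adding and "
    "subtracting integers. Work through the sum step by step, and "
    "finish with the final integer on its own at the end."
)

# re-run the same unit tests (evals); logs land under a fresh experiment
with unify.Experiment():
    unify.map(evaluate, qs)
```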
We've tried to strip away all of the excessive LLM jargon, so you can focus on your product, your users, and the data you care about (and nothing else).
Unify takes inspiration from:
- PostHog / Grafana / LogFire for powerful observability
- LangSmith / BrainTrust / Weave for LLM abstractions
- Notion / Airtable for composability and versatility
Whether you're technical or non-technical, we hope Unify can help you to rapidly build top-notch LLM apps, and to remain fully focused on your product (not the LLM).
Check out our docs, and if you have any questions, feel free to reach out to us on Discord.
Unify is under active development, and feedback in all shapes and sizes is very welcome!
Happy prompting!