This project tests LLM responses for pizza order processing using promptfoo. The LLMs are prompted to convert voice transcripts of pizza orders into structured JSON data.
- Install promptfoo: `npm install -g promptfoo`
- Make sure you have the necessary API keys set up for the LLM providers you're using.
The project uses the following configuration:
- `promptfooconfig.yaml`: Main configuration file for promptfoo
- `prompts.py`: Contains the prompt templates for different LLM approaches
- `pizza_orders_tests.json`: Test cases with transcripts and expected JSON outputs
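promptfoo can load prompts from Python functions, which is how a file like `prompts.py` typically works. A minimal sketch of one such template, assuming promptfoo's convention of passing test variables in `context["vars"]` (the function name and the JSON field names here are illustrative, not the project's actual code):

```python
def pizza_order_prompt(context: dict) -> str:
    """Build the prompt for one test case.

    promptfoo passes the test's variables in context["vars"].
    The output schema named here (pizza_type, size, extra_toppings)
    is a hypothetical example.
    """
    transcript = context["vars"]["transcript"]
    return (
        "Convert this pizza order transcript into JSON with the keys "
        '"pizza_type", "size", and "extra_toppings".\n'
        f"Transcript: {transcript}\n"
        "Respond with JSON only."
    )
```

Each function in the file can represent a different prompting approach, and the config can reference them individually.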
To run the tests, run `promptfoo eval`. Point the config file to `single-test.json` if you want to test a single case.
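For reference, a `promptfooconfig.yaml` along these lines would wire the pieces together (the provider IDs and the prompt function name are illustrative assumptions, not the project's actual settings):

```yaml
# Sketch of a promptfoo config; adjust providers and prompt
# references to match the actual project files.
prompts:
  - file://prompts.py:pizza_order_prompt

providers:
  - openai:gpt-4o-mini
  - openai:gpt-4o

tests: file://pizza_orders_tests.json
```

Swapping `tests` to `file://single-test.json` is how you restrict a run to a single case.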
This will:
- Process each transcript in the test file
- Send it to each LLM provider with each prompt template
- Compare the JSON output with the expected output
- Save results to `results.json`
The test file (pizza_orders_tests.json) contains an array of test cases, each with:
- `description`: A clear description of what the test case is checking
- `vars`: Contains the input variables (`transcript`)
- `assert`: Contains assertions to validate the output
Each test case includes an equality assertion that compares the LLM output with the expected JSON structure.
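A single test case might look like the following (the field names inside the expected JSON value are illustrative; check the actual test file for the exact schema):

```json
{
  "description": "Simple pepperoni order with one extra topping",
  "vars": {
    "transcript": "Hi, can I get a medium pepperoni pizza with extra mushrooms?"
  },
  "assert": [
    {
      "type": "equals",
      "value": {
        "pizza_type": "pepperoni",
        "size": "medium",
        "extra_toppings": ["mushrooms"],
        "processed": true
      }
    }
  ]
}
```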
The configuration uses the `equals` assertion type to validate that the LLM output exactly matches the expected JSON structure. This includes checking:
- Pizza type (cheese, pepperoni, vegetarian)
- Size (small, medium, extra large)
- Extra toppings
- Processing status
Results are saved to results.json and include:
- The original prompt
- The LLM output
- Test variables
- Assertion results
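As a quick sanity check you can also post-process `results.json` yourself. A sketch, assuming a simplified, flattened version of the fields listed above (the real promptfoo output nests these under additional keys):

```python
import json

# Simplified stand-in for results.json; real promptfoo output
# nests prompt/output/vars/assertion data more deeply.
sample_results = """
[
  {"prompt": "Convert this order...", "output": "{\\"pizza_type\\": \\"cheese\\"}",
   "vars": {"transcript": "One small cheese pizza please"}, "success": true},
  {"prompt": "Convert this order...", "output": "not valid json",
   "vars": {"transcript": "Two pizzas, hold the cheese"}, "success": false}
]
"""

results = json.loads(sample_results)
passed = sum(1 for r in results if r["success"])
print(f"{passed}/{len(results)} test cases passed")  # prints "1/2 test cases passed"
```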
Run `promptfoo view` to visualize the results in the promptfoo interface.
To add new tests:
- Add a new object to the array in `pizza_orders_tests.json`
- Include a clear `description` of what the test is checking
- Add the transcript in the `vars` object
- Add an equality assertion with the expected output
- Run the tests again to validate
Try these exercises to deepen your understanding of prompt engineering and LLM behavior:
- **Prompt Optimization**
  - Can you improve the prompt to achieve a higher success rate with a cheaper model?
  - Try making the prompt shorter while maintaining or improving the success rate
  - Experiment with different prompt structures (few-shot examples, step-by-step reasoning, etc.)
  - Identify patterns in where models succeed or fail
- **Prompt Engineering Techniques**
  - Try implementing chain-of-thought prompting
  - Experiment with different formats for the expected JSON output
  - Test the impact of including/excluding certain context in the prompt
- **Performance Optimization**
  - Measure and compare response times across different prompt versions
  - Try to optimize for both accuracy and speed
  - Experiment with temperature settings and their impact on consistency
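For the temperature exercise, promptfoo lets you set sampling options per provider in the config. A sketch (the provider ID is illustrative):

```yaml
providers:
  - id: openai:gpt-4o-mini
    config:
      temperature: 0   # near-deterministic; try 0.7 and compare consistency
```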
Share your findings and improvements with the community!