layout	title	permalink	redirect_from
sp25	Advanced Large Language Model Agents	/sp25	/

Prospective Students

To sign up for the course, please fill in this form.
For course discussion and questions, please join our LLM Agents Discord.
This course is built upon the fundamentals from the Fall 2024 LLM Agents MOOC.

Course Staff

Instructor	(Guest) Co-instructor	(Guest) Co-instructor
Dawn Song	Xinyun Chen	Kaiyu Yang
Professor, UC Berkeley	Research Scientist, Google DeepMind	Research Scientist, Meta FAIR

Guest Speakers


Jason Weston	Yu Su	Hanna Hajishirzi
Charles Sutton	Ruslan Salakhutdinov	Caiming Xiong
Thomas Hubert	Sean Welleck	Swarat Chaudhuri

Course Description

Large language model (LLM) agents are an important frontier in AI; however, they still fall short on critical skills, such as complex reasoning and planning, that are needed to solve hard problems and enable end-to-end applications in real-world scenarios. Building on our previous course, this course dives deeper into advanced topics in LLM agents, focusing on reasoning, AI for mathematics, code generation, and program verification. We begin by introducing advanced inference and post-training techniques for building LLM agents that can search and plan. Then, we focus on two application domains: mathematics and programming. We study how LLMs can be used to prove mathematical theorems, as well as to generate and reason about computer programs. Specifically, we will cover the following topics:

Inference-time techniques for reasoning
Post-training methods for reasoning
Search and planning
Agentic workflows, tool use, and function calling
LLMs for code generation and verification
LLMs for mathematics: data curation, continual pretraining, and finetuning
LLM agents for theorem proving and autoformalization

Syllabus

Date	Guest Lecture (4:00PM-6:00PM PT)	Supplemental Readings
Jan 27th	Inference-Time Techniques for LLM Reasoning Xinyun Chen, Google DeepMind Livestream Intro Slides Quiz 1	- Large Language Models as Optimizers - Large Language Models Cannot Self-Correct Reasoning Yet - Teaching Large Language Models to Self-Debug
Feb 3rd	Learning to reason with LLMs Jason Weston, Meta Livestream Slides Quiz 2	- Direct Preference Optimization: Your Language Model is Secretly a Reward Model - Iterative Reasoning Preference Optimization - Chain-of-Verification Reduces Hallucination in Large Language Models
Feb 10th	On Reasoning, Memory, and Planning of Language Agents Yu Su, Ohio State University Livestream Slides Quiz 3	- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization - HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models - Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Feb 17th	No Class - Presidents' Day
Feb 24th	Open Training Recipes for Reasoning in Language Models Hanna Hajishirzi, University of Washington Livestream Slides Quiz 4	- Tulu 3: Pushing Frontiers in Open Language Model Post-Training - Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback - OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Mar 3rd	Coding Agents and AI for Vulnerability Detection Charles Sutton, Google DeepMind Livestream Slides Quiz 5	- Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities - From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
Mar 10th	Multimodal Autonomous AI Agents Ruslan Salakhutdinov, CMU/Meta Livestream Slides	- Mind2Web: Towards a Generalist Agent for the Web - WebArena: A Realistic Web Environment for Building Autonomous Agents - VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks - Tree Search for Language Model Agents
Mar 17th	Multimodal Agents – From Perception to Action Caiming Xiong, Salesforce AI Research	- OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments - AGUVIS: Unified Pure Vision Agents For Autonomous GUI Interaction
Mar 24th	No Class - Spring Recess
Mar 31st	AlphaProof Thomas Hubert, Google DeepMind 10am-noon PT
Apr 7th	Language models for autoformalization and theorem proving Kaiyu Yang, Meta FAIR
Apr 14th	Advanced Topics in Neural Theorem Proving Sean Welleck, CMU
Apr 21st	Program verification & generating verified code Swarat Chaudhuri, UT Austin 10am-noon PT
Apr 28th	Agent safety & security Dawn Song, UC Berkeley

Completion Certificate

Coming Soon!
