Advanced Large Language Model Agents

Prospective Students

Course Staff

Instructor: Dawn Song, Professor, UC Berkeley
(Guest) Co-instructor: Xinyun Chen, Research Scientist, Google DeepMind
(Guest) Co-instructor: Kaiyu Yang, Research Scientist, Meta FAIR

Guest Speakers

  • Jason Weston
  • Yu Su
  • Hanna Hajishirzi
  • Charles Sutton
  • Ruslan Salakhutdinov
  • Caiming Xiong
  • Thomas Hubert
  • Sean Welleck
  • Swarat Chaudhuri

Course Description

Large language model (LLM) agents are an important frontier in AI; however, they still fall short on critical skills, such as complex reasoning and planning, that are needed to solve hard problems and enable end-to-end applications in real-world scenarios. Building on our previous course, this course dives deeper into advanced topics in LLM agents, focusing on reasoning, AI for mathematics, code generation, and program verification. We begin by introducing advanced inference and post-training techniques for building LLM agents that can search and plan. Then, we focus on two application domains: mathematics and programming. We study how LLMs can be used to prove mathematical theorems, as well as to generate and reason about computer programs. Specifically, we will cover the following topics:

  • Inference-time techniques for reasoning
  • Post-training methods for reasoning
  • Search and planning
  • Agentic workflows, tool use, and function calling
  • LLMs for code generation and verification
  • LLMs for mathematics: data curation, continual pretraining, and finetuning
  • LLM agents for theorem proving and autoformalization

Syllabus

Date | Guest Lecture (4:00PM-6:00PM PT) | Supplemental Readings
Jan 27th Inference-Time Techniques for LLM Reasoning
Xinyun Chen, Google DeepMind
Livestream Intro Slides Quiz 1
- Large Language Models as Optimizers
- Large Language Models Cannot Self-Correct Reasoning Yet
- Teaching Large Language Models to Self-Debug
Feb 3rd Learning to reason with LLMs
Jason Weston, Meta
Livestream Slides Quiz 2
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Iterative Reasoning Preference Optimization
- Chain-of-Verification Reduces Hallucination in Large Language Models
Feb 10th On Reasoning, Memory, and Planning of Language Agents
Yu Su, Ohio State University
Livestream Slides Quiz 3
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
- Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Feb 17th No Class - Presidents' Day
Feb 24th Open Training Recipes for Reasoning in Language Models
Hanna Hajishirzi, University of Washington
Livestream Slides Quiz 4
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training
- Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
- OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Mar 3rd Coding Agents and AI for Vulnerability Detection
Charles Sutton, Google DeepMind
Livestream Slides Quiz 5
- Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
- From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
Mar 10th Multimodal Autonomous AI Agents
Ruslan Salakhutdinov, CMU/Meta
Livestream Slides
- Mind2Web: Towards a Generalist Agent for the Web
- WebArena: A Realistic Web Environment for Building Autonomous Agents
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
- Tree Search for Language Model Agents
Mar 17th Multimodal Agents – From Perception to Action
Caiming Xiong, Salesforce AI Research
- OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- AGUVIS: Unified Pure Vision Agents For Autonomous GUI Interaction
Mar 24th No Class - Spring Recess
Mar 31st AlphaProof
Thomas Hubert, Google DeepMind
Lecture time: 10am-noon PT
Apr 7th Language models for autoformalization and theorem proving
Kaiyu Yang, Meta FAIR
Apr 14th Advanced Topics in Neural Theorem Proving
Sean Welleck, CMU
Apr 21st Program verification & generating verified code
Swarat Chaudhuri, UT Austin
Lecture time: 10am-noon PT
Apr 28th Agent safety & security
Dawn Song, UC Berkeley

Completion Certificate

Coming Soon!