
This is the official codebase for our 2022 CogSci paper:

Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks

Katherine M. Collins*, Lionel Wong*, Jiahai Feng, Megan Wei, and Joshua B. Tenenbaum

*Contributed equally

Paper: https://arxiv.org/pdf/2205.05718.pdf

Project Page: https://sites.google.com/view/structured-flexible-and-robust/home

Spreadsheet of Collected Human and GPT-3 Generations: https://docs.google.com/spreadsheets/d/1K7Jspk4Yb-fDl2bucejxgg9mT-1-CPgAfPkv2nOZss8/edit?usp=sharing

Abstract

Human language offers a powerful window into our thoughts -- we tell stories, give explanations, and express our beliefs and goals through words. Abundant evidence also suggests that language plays a developmental role in structuring our learning. Here, we ask: how much of human-like thinking can be captured by learning statistical patterns in language alone? We first contribute a new challenge benchmark for comparing humans and distributional large language models (LLMs). Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization to new, out-of-distribution problems expressed in language. We find that humans are far more robust than LLMs on this benchmark. Next, we propose a hybrid Parse-and-Solve model, which augments distributional LLMs with a structured symbolic reasoning module. We find that this model shows more robust adaptation to out-of-distribution planning problems, demonstrating the promise of hybrid AI models for more human-like reasoning.

Repository Details

Code for "Part I: Linguistic reasoning benchmark for humans and language models" can be found in the Part_I directory.

Human and LLM generations, as well as all stimuli, can be found in the nested data directories for each domain (plans and explanations, respectively). Please keep in mind that our LLM generations were elicited from the GPT-3 variant (davinci-003) prior to any instruction fine-tuning.
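For a rough sense of how one might load these generations for analysis, here is a minimal sketch. The paths, file names, and column layout below are hypothetical placeholders (they are not taken from this repository); check the actual directory names and file formats under the Part_I data directories before running.

# Minimal sketch (assumptions): DATA_DIR and the CSV file names below are
# hypothetical -- substitute the actual files shipped in the nested data
# directories for the plans and explanations domains.
import pandas as pd
from pathlib import Path

DATA_DIR = Path("Part_I/plans/data")  # hypothetical location of the plans-domain data

def load_generations(csv_name: str) -> pd.DataFrame:
    """Load a CSV of human or GPT-3 generations into a DataFrame."""
    return pd.read_csv(DATA_DIR / csv_name)

if __name__ == "__main__":
    human = load_generations("human_generations.csv")  # hypothetical file name
    gpt3 = load_generations("gpt3_generations.csv")    # hypothetical file name
    print(human.head())
    print(gpt3.head())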

Code for "Part II: Integrating language with structured reasoning models" can be found in the Part_II directory.

Citing

If you cite this work, please consider using the following BibTeX entry:

@misc{collinsWong2022,
  doi       = {10.48550/ARXIV.2205.05718},
  url       = {https://arxiv.org/abs/2205.05718},
  author    = {Collins, Katherine M. and Wong, Catherine and Feng, Jiahai and Wei, Megan and Tenenbaum, Joshua B.},
  keywords  = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG), Symbolic Computation (cs.SC), FOS: Computer and information sciences},
  title     = {Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks},
  publisher = {CogSci},
  year      = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
