Projects

Automatic Curriculum Design for Human-AI Coordination

Motivation

Training AI agents to coordinate with humans in cooperative tasks is difficult because humans are unpredictable and environments vary. Most zero-shot coordination methods focus on partner diversity but ignore environment diversity. This project asked: can unsupervised environment design (UED), originally developed for competitive games, be adapted to improve zero-shot human-AI coordination in cooperative settings?

Issues

  • Existing multi-agent UED (MAESTRO) uses a regret-based utility designed for competitive zero-sum games, which is inappropriate for cooperative tasks.
  • Standard self-play co-player sampling ignores joint environment/co-player difficulty.
  • Evaluating against real humans (not just proxies) is essential but rarely done.

Method

Automatic Curriculum Design (ACD):

  1. Return-based utility: replaces regret with cumulative return as a measure of learning potential in cooperative settings.
  2. Prioritized co-player sampling: selects the co-player whose worst-performing environment yields the lowest return, thereby maximizing a lower bound on performance over all environment/co-player pairs.
  3. Replay distribution: blends rank-based coordination score priority with staleness-based freshness.
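The co-player selection rule in step 2 can be sketched as a simple maximin lookup. This is an illustrative reconstruction, not the paper's implementation; the return table and identifiers below are hypothetical.

```python
def select_co_player(returns):
    """Pick the co-player whose worst environment return is lowest.

    returns: dict mapping co-player id -> dict of env id -> mean return.
    Training against this co-player targets the weakest pair, which is
    what raises the lower bound over all environment/co-player pairs.
    """
    def worst_env_return(co_player):
        # The co-player's score is its return on its hardest environment.
        return min(returns[co_player].values())
    return min(returns, key=worst_env_return)

table = {
    "partner_a": {"env_1": 12.0, "env_2": 3.5},
    "partner_b": {"env_1": 8.0, "env_2": 9.0},
}
# partner_a's worst environment (3.5) is below partner_b's worst (8.0),
# so partner_a is the pairing with the most room to improve.
assert select_co_player(table) == "partner_a"
```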

Trained in Overcooked-AI on 6,000 automatically generated layouts; evaluated on 5 graduated-difficulty test layouts.
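The replay distribution in step 3 can be sketched as a blend of two probability vectors, one from score ranks and one from staleness. This is a minimal sketch in the style of PLR-type replay; the `beta` temperature and `rho` mixing coefficient are assumed knobs, not values from the paper.

```python
def replay_distribution(scores, last_sampled, step, beta=1.0, rho=0.5):
    """Blend rank-based score priority with staleness-based freshness.

    scores: coordination score per layout (higher = more learning potential).
    last_sampled: step at which each layout was last replayed.
    rho: fraction of probability mass given to the staleness term.
    """
    n = len(scores)
    # Rank-based priority: rank 1 goes to the highest coordination score.
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    weights = [(1.0 / rank[i]) ** (1.0 / beta) for i in range(n)]
    total = sum(weights)
    p_score = [w / total for w in weights]
    # Staleness: layouts not replayed recently gain probability mass.
    stale = [step - t for t in last_sampled]
    stale_total = sum(stale)
    p_stale = [s / stale_total if stale_total > 0 else 1.0 / n for s in stale]
    # Blend the two distributions into one replay distribution.
    return [(1 - rho) * ps + rho * pt for ps, pt in zip(p_score, p_stale)]

probs = replay_distribution([5.0, 1.0, 3.0], last_sampled=[0, 10, 20], step=30)
```

The blend keeps high-score layouts in rotation while staleness prevents the buffer from fixating on a few layouts.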

Results & Contribution

  • Outperforms MAESTRO, Robust PLR, and Domain Randomization on all 5 evaluation layouts.
  • In a study with 20 real human participants, ACD received the highest collaborativeness and preference ratings.
  • Demonstrates that return-based utility outperforms regret-based utility for cooperative UED.
  • Published in IEEE Access (2025).