Projects
Automatic Curriculum Design for Human-AI Coordination
Motivation
Training AI agents to coordinate with humans in cooperative tasks is difficult because humans behave unpredictably and environments vary. Most zero-shot coordination methods focus on partner diversity but ignore environment diversity. This project asked whether unsupervised environment design (UED), originally developed for competitive games, can be adapted to improve zero-shot human-AI coordination in cooperative settings.
Issues
- Existing multi-agent UED (MAESTRO) uses a regret-based utility designed for competitive zero-sum games, which is ill-suited to cooperative tasks.
- Standard self-play co-player sampling ignores the joint difficulty of environment/co-player pairs.
- Evaluating against real humans (not just proxies) is essential but rarely done.
Method
Automatic Curriculum Design (ACD):
- Return-based utility: replaces regret with cumulative return as the measure of learning potential, which better reflects progress in cooperative settings.
- Prioritized co-player sampling: selects the co-player whose worst-performing environment has the lowest return, jointly optimizing the lower bound over all environment/co-player pairs.
- Replay distribution: blends rank-based coordination score priority with staleness-based freshness.
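The sampling and replay rules above can be sketched in a few lines. This is a minimal illustration, assuming a simple table of per-pair returns; the function names, the `temperature` and `rho` parameters, and the data layout are assumptions for exposition, not details from the paper.

```python
def select_co_player(returns):
    """Prioritized co-player sampling (sketch): pick the co-player whose
    worst-performing environment has the lowest return, i.e. the maximin
    choice over the environment/co-player return table.

    returns: dict mapping co_player -> {env: cumulative return}
    """
    return min(returns, key=lambda cp: min(returns[cp].values()))


def rank_priorities(scores, temperature=1.0):
    """Rank-based priority (sketch): lower coordination score -> higher
    rank -> larger replay weight, normalized to a distribution."""
    order = sorted(scores, key=scores.get)            # worst score first
    ranks = {env: i + 1 for i, env in enumerate(order)}
    weights = {env: (1.0 / ranks[env]) ** (1.0 / temperature)
               for env in scores}
    total = sum(weights.values())
    return {env: w / total for env, w in weights.items()}


def replay_distribution(scores, last_sampled, step, rho=0.5):
    """Blend the rank-based score priority with a staleness term so that
    rarely replayed layouts regain probability mass over time."""
    p_score = rank_priorities(scores)
    staleness = {env: step - last_sampled[env] for env in scores}
    total_stale = sum(staleness.values()) or 1
    p_stale = {env: s / total_stale for env, s in staleness.items()}
    return {env: (1 - rho) * p_score[env] + rho * p_stale[env]
            for env in scores}
```

For example, with two co-players whose worst-case returns are 2.0 and 1.0, `select_co_player` returns the second, since its lower bound is weaker and it offers more learning potential under the return-based utility.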
Trained in Overcooked-AI on 6,000 automatically generated layouts; evaluated on 5 held-out test layouts of graduated difficulty.
Results & Contribution
- Outperforms MAESTRO, Robust PLR, and Domain Randomization on all 5 evaluation layouts.
- Real human study (N=20): highest collaborativeness and preference ratings.
- Demonstrates that a return-based utility outperforms regret-based utility for cooperative UED.
- Published in IEEE Access (2025).