Human-Aligned Procedural Level Generation RL via Text-Level-Sketch Shared Representation
In-Chang Baek*, Seo-Young Lee*, Sung-Hyun Kim, Geumhwan Hwang, Kyung-Joong Kim
Human-aligned AI is a critical component of co-creativity. This paper proposes VIPCGRL (Vision-Instruction PCGRL), a novel deep reinforcement learning framework that incorporates three modalities (text, level, and sketch) to broaden the available control modalities and enhance human-likeness in procedural content generation. A shared embedding space is trained via quadruple contrastive learning across the modalities and across human and AI styles, and the policy is aligned using an auxiliary reward based on embedding similarity. Experimental results show that VIPCGRL outperforms existing baselines in human-likeness and demonstrates zero-shot cross-modal generalization.
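To make the auxiliary-reward idea concrete, the following is a minimal sketch (not the paper's implementation) of a reward-shaping term that scores a generated level by the cosine similarity between its embedding and a target embedding (e.g. from a text instruction or sketch) in the shared space. The function names, the `weight` parameter, and the toy 4-dimensional embeddings are all hypothetical.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def auxiliary_reward(level_emb: np.ndarray, target_emb: np.ndarray,
                     weight: float = 1.0) -> float:
    # Shaping term added to the environment reward: higher when the
    # generated level's embedding is closer to the conditioning
    # (text or sketch) embedding in the shared space.
    return weight * cosine_similarity(level_emb, target_emb)

# Toy example with made-up 4-d embeddings.
level = np.array([0.2, 0.8, 0.1, 0.5])
target = np.array([0.25, 0.75, 0.0, 0.55])
r_aux = auxiliary_reward(level, target)
```

In practice such a term would be combined with the task reward (e.g. `r = r_task + weight * r_aux`), with `weight` tuned so the alignment signal does not overwhelm the generation objective.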