Seo-Young Lee

M.S. AI, GIST

Seoul, Korea

AI Engineer · Researcher

Seo-Young Lee 이서영

I am an AI engineer who works across both research and implementation. During my M.S. at GIST, I researched reinforcement learning and multimodal representation learning, building systems that interact naturally with humans: natural language-guided procedural content generation, human-AI coordination, and 3D pose estimation from tactile signals. I am drawn to problems at the intersection of perception, language, and decision-making, and have applied this interest across diverse domains using PyTorch and JAX.

Reinforcement Learning · Procedural Content Generation · Multimodal Representation · Human-AI Interaction · Human Pose Estimation

Education

2023–2025

M.S. Artificial Intelligence

GIST (Gwangju Institute of Science and Technology)

GPA 4.20 / 4.50 · Advisor: Prof. Kyung-Joong Kim

2017–2023

B.S. Computer Science & Engineering

Dongguk University

GPA 3.96 / 4.50 · Major GPA 4.25 / 4.50

Research Experience

2025–2026

Graduate Research Assistant

Cognition and Intelligence Lab, GIST · Gwangju, Korea

Post-graduation research in game AI and human-aligned agent systems (Sep 2025 – Jan 2026).

2023–2025

Graduate Researcher (M.S.)

Cognition and Intelligence Lab, GIST · Gwangju, Korea

RL-based PCG, multimodal representation learning, human-AI alignment; GIST–MIT tactile sensing collaboration. NRF Research Grant PI.

2020–2023

Undergraduate Research Assistant

Artificial Intelligence Lab, Dongguk University · Seoul, Korea

Grants & Funding

2024–2025

NRF Master's Research Grant · ₩12,000,000

National Research Foundation of Korea · Principal Investigator

Instruction Agent Research based on Natural Language-Reward Function Embedding

Selected Publications

Under Review 2025 · IEEE Transactions on Games (ToG)

Human-Aligned Procedural Level Generation RL via Text-Level-Sketch Shared Representation

In-Chang Baek*, Seo-Young Lee*, Sung-Hyun Kim, Geumhwan Hwang, Kyung-Joong Kim

Human-aligned AI is a critical component of co-creativity. This paper proposes VIPCGRL (Vision-Instruction PCGRL), a novel deep RL framework that incorporates three modalities (text, level, and sketch) to broaden the available control modalities and enhance human-likeness in procedural content generation. A shared embedding space is trained via quadruple contrastive learning across modalities and human-AI styles, and the policy is aligned using an auxiliary reward based on embedding similarity (a minimal sketch of this reward shaping follows this entry). Experimental results show VIPCGRL outperforms existing baselines in human-likeness, in both quantitative metrics and human evaluations, and demonstrates zero-shot cross-modal generalization.

procedural content generation · reinforcement learning · multimodal representation · human-AI alignment · contrastive learning
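
The sketch below illustrates the auxiliary alignment reward mentioned in the abstract above. It is a minimal illustration only, assuming a PyTorch setup in which a level encoder and an instruction encoder already map into the shared embedding space; the function names and the weight ALPHA are hypothetical and not the paper's actual implementation.

# Illustrative only: adding an embedding-similarity alignment term to the
# environment reward so the generated level stays close to the target
# instruction in the shared embedding space. Names and ALPHA are assumptions.
import torch
import torch.nn.functional as F

ALPHA = 0.1  # assumed weight of the auxiliary alignment term

def alignment_reward(level_embedding: torch.Tensor,
                     instruction_embedding: torch.Tensor) -> torch.Tensor:
    # Cosine similarity between the generated level and the target instruction
    # in the shared embedding space, used as an auxiliary reward signal.
    return F.cosine_similarity(level_embedding, instruction_embedding, dim=-1)

def shaped_reward(env_reward: torch.Tensor,
                  level_embedding: torch.Tensor,
                  instruction_embedding: torch.Tensor) -> torch.Tensor:
    # Total reward = task reward + weighted alignment with the instruction.
    return env_reward + ALPHA * alignment_reward(level_embedding, instruction_embedding)
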
Published 2025 · IEEE Conference on Games (CoG 2025)

IPCGRL: Language-Instructed RL for Procedural Level Generation

In-Chang Baek, Sung-Hyun Kim, Seo-Young Lee, Dong-Hyeon Kim, Kyung-Joong Kim

IPCGRL introduces a language-instructed PCGRL framework that uses sentence embeddings to condition a deep RL agent for procedural level generation. The framework fine-tunes task-specific embedding representations to compress game-level conditions expressed in natural language (a minimal conditioning sketch follows this entry). Evaluated on a 2D level generation task, IPCGRL achieves up to a 21.4% improvement in controllability and a 17.2% improvement in generalizability to unseen instructions with varied condition expressions.

procedural content generation · reinforcement learning · natural language processing · instruction following
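
A minimal sketch of the conditioning idea referenced above, assuming a PyTorch policy that simply concatenates level observation features with the instruction's sentence embedding; the class name, layer sizes, and encoder choice are assumptions rather than the paper's actual architecture.

# Illustrative only: conditioning a level-generation policy on a sentence
# embedding of the natural-language instruction. Sizes are assumptions.
import torch
import torch.nn as nn

class InstructionConditionedPolicy(nn.Module):
    def __init__(self, obs_dim: int, instr_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + instr_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor, instr_emb: torch.Tensor) -> torch.Tensor:
        # Concatenate level observation features with the instruction embedding
        # so one policy can follow different natural-language conditions.
        return self.net(torch.cat([obs, instr_emb], dim=-1))
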
Published 2024 · NeurIPS Workshop on Touch Processing: From Data to Knowledge

Smart Insole: Predicting 3D Human Pose from Foot Pressure

Isaac Han, Seoyoung Lee, Sangyeon Park, Ecehan Akan, Yiyue Luo, Kyung-Joong Kim

This study introduces a novel method for 3D human pose estimation using foot pressure data captured by a low-cost, high-resolution smart insole with over 600 pressure sensors per foot. Unlike prior carpet-type sensors, the wireless smart insole enables pose estimation regardless of location. Synchronized tactile and visual data (105,000+ frames, 5 participants, 7 actions) are collected, and a deep neural network predicts 3D human poses using only foot pressure data (a minimal multi-task sketch follows this entry), achieving 7.43 cm average localization error and 96.88% action classification accuracy.

human pose estimation · tactile sensing · foot pressure · wearable computing · deep learning
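
A minimal multi-task sketch of the pressure-to-pose idea above, assuming PyTorch and a two-channel (left/right insole) pressure map as input; the joint count, backbone, and layer sizes are assumptions and do not reflect the published model.

# Illustrative only: map a foot-pressure map to 3D joint coordinates and an
# action label with a small shared backbone. All sizes are assumptions.
import torch
import torch.nn as nn

N_JOINTS, N_ACTIONS = 21, 7  # assumed joint count; 7 actions per the abstract

class PressureToPose(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),   # 2 channels: left/right insole
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.pose_head = nn.Linear(64, N_JOINTS * 3)     # regress x, y, z per joint
        self.action_head = nn.Linear(64, N_ACTIONS)      # classify the action

    def forward(self, pressure: torch.Tensor):
        feats = self.backbone(pressure)
        return self.pose_head(feats).view(-1, N_JOINTS, 3), self.action_head(feats)
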

Selected Projects

Funded

VIPCGRL: Human-Aligned Procedural Level Generation

Co-Principal Investigator · 2023.09 – 2025.07 · Team of 4

Multimodal (text + level + sketch) shared embedding for human-aligned procedural content generation via reinforcement learning.

PCGRL · reinforcement learning · multimodal · contrastive learning
Funded

Smart Insole: Tactile-Based Human Sensing

Research Contributor (2nd Author) · 2023.03 – 2025.07 · Team of 5

3D human pose estimation and action recognition from foot pressure data using a wireless high-resolution smart insole sensor (GIST-MIT collaboration).

pose estimation · tactile sensing · wearable · multi-task learning
Academic

Automatic Curriculum Design for Human-AI Coordination

Research Contributor (3rd Author) · 2023.09 – 2024.08 · Team of 3

Extends multi-agent Unsupervised Environment Design (UED) to zero-shot human-AI coordination using return-based utility and prioritized co-player sampling in Overcooked-AI (a minimal sampling sketch follows this entry).

human-AI coordination · curriculum learning · multi-agent RL · zero-shot generalization
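
A minimal sketch of the prioritized co-player sampling idea referenced above, assuming priorities derived from each partner's recent coordination return; the utility definition, temperature, and function name are hypothetical and may differ from the project's exact formulation.

# Illustrative only: sample training co-players with probability that grows as
# their recent coordination return shrinks (a return-based priority).
import math
import random

def sample_co_player(returns_by_partner: dict, temperature: float = 1.0):
    # Lower recent return -> higher priority weight for that partner.
    ids = list(returns_by_partner)
    weights = [math.exp(-returns_by_partner[i] / temperature) for i in ids]
    total = sum(weights)
    return random.choices(ids, weights=[w / total for w in weights], k=1)[0]

# Example: partner "p2" has the lowest recent return, so it is sampled most often.
recent_returns = {"p0": 8.0, "p1": 5.5, "p2": 1.2}
partner = sample_co_player(recent_returns)
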
Industry

Samsung C-Lab: Art Activity Recognition & Detection

Research Engineer (Undergraduate) · 2022.03 – 2022.12 · Team of 4

Android app for real-time art creation behavior recognition (HAR) and artwork region detection (PFD) in collaboration with Samsung C-Lab.

activity recognition · object detection · mobile AI · android