Smart Insole: Tactile-Based Human Sensing
Motivation
Vision-based human pose estimation is ubiquitous but suffers from occlusion and privacy concerns. Foot pressure data, captured by a wearable insole, offers a privacy-preserving, location-independent alternative. This GIST-MIT collaboration project explored how much information about full-body 3D pose can be recovered from foot contact alone.
Issues
- Prior insole/carpet sensors are large, fixed, and expensive.
- The 600+ sensors per foot generate high-dimensional temporal streams that require efficient architectures.
- Ground truth 3D keypoints require synchronized multi-camera capture.
- Generalizing to unseen participants and actions is challenging with small datasets.
Method
Smart Insole (NeurIPS Workshop 2024): Dual CNN encoders process left- and right-foot pressure frames; a 3D heatmap decoder then predicts 19 body keypoints. A linear probe on the frozen encoders classifies actions with 96.88% accuracy.
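A 3D heatmap decoder typically recovers keypoint coordinates via a differentiable soft-argmax over a volumetric score map. The sketch below (names and shapes are illustrative, not from the paper) shows that decoding step for one keypoint:

```python
import numpy as np

def soft_argmax_3d(heatmap):
    """Decode one keypoint's 3D coordinate from a volumetric heatmap.

    heatmap: (D, H, W) array of unnormalized scores for a single keypoint.
    Returns the expected (z, y, x) position under the softmax distribution,
    which is differentiable (unlike a hard argmax) and so trainable end-to-end.
    """
    flat = heatmap.reshape(-1)
    probs = np.exp(flat - flat.max())      # numerically stable softmax
    probs /= probs.sum()
    probs = probs.reshape(heatmap.shape)
    d, h, w = heatmap.shape
    zs, ys, xs = np.meshgrid(np.arange(d), np.arange(h), np.arange(w), indexing="ij")
    # Expected coordinate = probability-weighted average of voxel indices.
    return np.array([(probs * zs).sum(), (probs * ys).sum(), (probs * xs).sum()])

# A sharply peaked heatmap decodes to roughly its peak voxel.
hm = np.zeros((8, 8, 8))
hm[2, 5, 3] = 10.0
coord = soft_argmax_3d(hm)  # close to (2, 5, 3)
```

The same operation is applied independently to each of the 19 keypoint heatmaps.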
SCOTTI (arXiv 2025) — Extension: A CNN + Transformer model jointly optimizes three tasks: pose estimation, action classification, and action progress prediction (the first such task in tactile sensing). Multi-task learning improves all three tasks simultaneously over single-task baselines.
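A common way to train such a model is a weighted sum of the per-task losses. A minimal sketch, assuming Euclidean pose loss, cross-entropy for action, and MSE for a scalar progress value in [0, 1] (the function name and weights are illustrative, not the paper's):

```python
import numpy as np

def multitask_loss(pose_pred, pose_gt, action_logits, action_gt, prog_pred, prog_gt,
                   w_pose=1.0, w_action=0.1, w_prog=0.1):
    """Weighted sum of three task losses; weights balance the tasks' scales."""
    # Pose: mean Euclidean distance over keypoints (in the data's units).
    pose_loss = np.linalg.norm(pose_pred - pose_gt, axis=-1).mean()
    # Action: cross-entropy of the ground-truth class under softmax logits.
    logits = action_logits - action_logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    action_loss = -log_probs[action_gt]
    # Progress: squared error on a scalar in [0, 1] marking how far along the action is.
    prog_loss = (prog_pred - prog_gt) ** 2
    return w_pose * pose_loss + w_action * action_loss + w_prog * prog_loss
```

With shared CNN + Transformer features feeding all three heads, gradients from each task regularize the others, which is one plausible mechanism behind the reported multi-task gains.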
Results & Contribution
- 7.43 cm average pose estimation error on 7 actions (NeurIPS 2024).
- 96.88% action classification accuracy as a linear probe.
- SCOTTI: first action progress prediction from tactile data; 96.63 mm MPJPE and 90.06% action classification accuracy.
- Dataset: 15 participants, 200,000+ synchronized frames, 8 actions (to be open-sourced).
- Hardware: ~$50 wireless insole with 500+ sensors per foot.
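The pose errors above are reported as MPJPE (Mean Per-Joint Position Error), the standard 3D pose metric: average Euclidean distance between predicted and ground-truth joints. A minimal sketch:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: Euclidean distance between predicted and
    ground-truth joints, averaged over all joints and frames. The result is in
    the same units as the inputs (e.g. mm).

    pred, gt: (frames, joints, 3) arrays of 3D keypoints.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Example: every joint offset by 10 mm along one axis -> MPJPE = 10 mm.
gt = np.zeros((4, 19, 3))
pred = gt.copy()
pred[..., 0] += 10.0
```

Note that 96.63 mm MPJPE corresponds to roughly the 7.43 cm-scale errors of the earlier model, i.e. both systems localize joints to within about a hand's width from foot pressure alone.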