LipLearner: Customizable Silent Speech Interactions on Mobile Devices
Zixiong Su, Shitao Fang, and Jun Rekimoto
In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23) 🏆 Best Paper Award
Silent speech interfaces are a promising technology that enables private communication in natural language. However, previous approaches only support a small and inflexible vocabulary, which leads to limited expressiveness. We leverage contrastive learning to learn efficient lipreading representations, enabling few-shot command customization with minimal user effort. Our model exhibits high robustness to different lighting, posture, and gesture conditions on an in-the-wild dataset. For 25-command classification, an F1-score of 0.8947 is achievable using only one shot per command, and performance can be further boosted by adaptively learning from more data. This generalizability allowed us to develop a mobile silent speech interface empowered with on-device fine-tuning and visual keyword spotting. A user study demonstrated that with LipLearner, users could define their own commands with high reliability, guaranteed by an online incremental learning scheme. Subjective feedback indicated that our system provides essential functionalities for customizable silent speech interactions with high usability and learnability.
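The abstract describes few-shot command customization on top of contrastive lipreading representations with online incremental learning. As a rough illustration only (the paper's actual implementation is not reproduced here), the sketch below shows one common way such few-shot enrollment and classification could work: a nearest-prototype classifier over L2-normalized encoder embeddings, compared by cosine similarity. All names, the 128-dimensional embedding size, and the use of random stand-in vectors are assumptions for illustration.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize an embedding along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

class FewShotCommandClassifier:
    """Hypothetical nearest-prototype classifier over lipreading encoder embeddings."""

    def __init__(self):
        # User-defined command label -> list of enrolled (embedded) shots.
        self._shots: dict[str, list[np.ndarray]] = {}

    def enroll(self, label: str, embedding: np.ndarray) -> None:
        """Add one embedded utterance for a command; repeated calls refine it incrementally."""
        self._shots.setdefault(label, []).append(l2_normalize(embedding))

    def predict(self, query: np.ndarray) -> str:
        """Return the enrolled command whose mean prototype is most similar to the query."""
        q = l2_normalize(query)
        best_label, best_sim = None, -np.inf
        for label, shots in self._shots.items():
            prototype = l2_normalize(np.mean(shots, axis=0))
            sim = float(q @ prototype)  # cosine similarity (both vectors are unit length)
            if sim > best_sim:
                best_label, best_sim = label, sim
        return best_label

# Usage with random stand-in vectors; a real system would obtain embeddings
# from the pretrained contrastive lipreading encoder.
rng = np.random.default_rng(0)
clf = FewShotCommandClassifier()
clf.enroll("take a photo", rng.normal(size=128))   # one-shot enrollment
clf.enroll("play music", rng.normal(size=128))
clf.enroll("play music", rng.normal(size=128))     # additional shots refine the prototype
print(clf.predict(rng.normal(size=128)))
```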
Existing research confirms that the appearance and affordances of avatars can influence users' perception, attitudes, and behavior. However, such studies focus on the perceptual and behavioral changes of the user who directly controls the given avatar. As a result, the social and societal implications of avatar design are still underexplored. We argue that the emerging platform of social VR would enable further exploration of avatars' effects in the context of interaction with other users, potentially opening up a new research horizon. In this paper, we describe our research direction and discuss potential research questions.