LipLearner: Customizing Silent Speech Commands from Voice Input Using One-Shot Lipreading
Zixiong Su, Shitao Fang, and Jun Rekimoto
In The Adjunct Publication of the 35th Annual ACM Symposium on User Interface Software and Technology 2022 (UIST’22)
We present LipLearner, a lipreading-based silent speech interface that enables in-situ command customization on mobile devices. By leveraging contrastive learning to learn efficient representations from existing datasets, it performs instant fine-tuning for unseen users and words with one-shot learning. To further minimize the labor of command registration, we incorporate speech recognition so that new commands are learned automatically from voice input. Conventional lipreading systems offer only a limited set of pre-defined commands because of the time cost and user burden of data collection; in contrast, our technique provides expressive silent speech interaction with minimal data requirements. We conducted a pilot experiment to investigate the real-time performance of LipLearner, and the results demonstrate that high average accuracy is achievable with only one training sample per command.
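To make the one-shot customization idea above concrete, here is a minimal sketch of how a contrastive-pretrained encoder could support single-sample command registration and similarity-based recognition. This is not LipLearner's actual implementation: the encoder is a placeholder random projection, and all names (encode, OneShotCommandRecognizer, the 512-dimensional input) are hypothetical, introduced only for illustration.

```python
# Minimal sketch (not LipLearner's actual code): one-shot command registration
# and recognition by cosine similarity in an embedding space.
# `encode` stands in for a contrastive-pretrained lip encoder; here it is a
# fixed random projection purely so the example runs end to end.
import numpy as np

rng = np.random.default_rng(0)
PROJECTION = rng.standard_normal((512, 128))  # placeholder for trained encoder weights

def encode(lip_features: np.ndarray) -> np.ndarray:
    """Map a lip-movement feature vector to a unit-norm embedding."""
    z = lip_features @ PROJECTION
    return z / np.linalg.norm(z)

class OneShotCommandRecognizer:
    """Stores one embedding per command; classifies by nearest cosine similarity."""
    def __init__(self):
        self.prototypes: dict[str, np.ndarray] = {}

    def register(self, label: str, lip_features: np.ndarray) -> None:
        # One-shot registration: a single sample defines the command prototype.
        self.prototypes[label] = encode(lip_features)

    def recognize(self, lip_features: np.ndarray) -> tuple[str, float]:
        query = encode(lip_features)
        scores = {label: float(query @ proto) for label, proto in self.prototypes.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]

# Usage: the label could come from a speech recognizer transcribing the voiced
# command, so registration requires no extra labeling effort from the user.
recognizer = OneShotCommandRecognizer()
recognizer.register("play music", rng.standard_normal(512))
recognizer.register("turn off lights", rng.standard_normal(512))
print(recognizer.recognize(rng.standard_normal(512)))
```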
Existing research confirms that the appearance and affordances of avatars can influence users' perceptions, attitudes, and behavior. However, such studies focus on perceptual and behavioral changes in the user who directly controls the avatar. As a result, the social and societal implications of avatar design remain underexplored. We argue that emerging social VR platforms would enable further exploration of avatars' effects in the context of interaction with other users, potentially opening up a new research horizon. In this paper, we describe our research direction and discuss potential research questions.