Proactive Conversational Agents with Inner Thoughts
Xingyu Bruce Liu, Shitao Fang, Weiyan Shi, Chien-Sheng Wu, Takeo Igarashi, and Xiang ’Anthony’ Chen
In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25)
One of the long-standing aspirations in conversational AI is to enable agents to autonomously take initiative in conversations, i.e., to be proactive. This is especially challenging in multi-party conversations. Prior NLP research has focused mainly on predicting the next speaker from context such as the preceding conversation. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that, just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation and seeks the right moment to contribute. Drawing on a formative study with 24 participants and insights from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thoughts running in parallel to the overt communication process, enabling it to engage proactively by modeling its intrinsic motivation to express these thoughts. We instantiated this framework in two real-time systems: an AI playground web app and a chatbot. In a technical evaluation and user studies with human participants, our framework significantly outperformed existing baselines on aspects such as anthropomorphism, coherence, intelligence, and turn-taking appropriateness.
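The abstract describes the framework only at a high level. As a rough illustration of the core idea (covert thoughts scored by an intrinsic motivation to speak, voiced only at an appropriate moment), the following minimal Python sketch uses hypothetical names such as `InnerThoughtsAgent`, `Thought`, and `motivation_threshold` that are not taken from the paper, and stubs out the language-model calls with a random score; it is not the authors' implementation.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Thought:
    """A covert thought the agent forms while listening."""
    text: str
    motivation: float  # intrinsic urge to voice this thought (0..1)

@dataclass
class InnerThoughtsAgent:
    """Sketch: keep a parallel stream of covert thoughts and speak
    only when the motivation to express one is high enough."""
    motivation_threshold: float = 0.7
    thoughts: list[Thought] = field(default_factory=list)

    def observe(self, utterance: str) -> None:
        # Form a new covert thought from the latest utterance.
        # A real system would query an LLM; here it is stubbed out.
        self.thoughts.append(
            Thought(text=f"Reflection on: {utterance!r}",
                    motivation=random.random())
        )

    def maybe_speak(self) -> str | None:
        # At a potential turn-taking point, voice the most motivating
        # thought only if it clears the threshold; otherwise stay silent.
        if not self.thoughts:
            return None
        best = max(self.thoughts, key=lambda t: t.motivation)
        if best.motivation >= self.motivation_threshold:
            self.thoughts.remove(best)
            return best.text
        return None

if __name__ == "__main__":
    agent = InnerThoughtsAgent()
    for line in ["Alice: I think we should meet Friday.",
                 "Bob: Friday works, but mornings are tough."]:
        agent.observe(line)
        reply = agent.maybe_speak()
        if reply:
            print("Agent:", reply)
```

The key design point the sketch tries to convey is that proactivity is driven by the agent's own thought stream rather than by turn-taking prediction alone: silence is the default outcome whenever no covert thought is sufficiently motivating.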
Existing research confirms that the appearance and affordances of avatars can influence users’ perception, attitudes, and behavior. However, such studies focus on perceptual and behavioral changes in the user who directly controls the given avatar. As a result, the social and societal implications of avatar design remain underexplored. We argue that social VR, as an emerging platform, enables further exploration of avatars’ effects in the context of interaction with other users, potentially opening up a new research horizon. In this paper, we describe our research direction and discuss potential research questions.