Published · HHAI 2024 · Human-AI Interaction · Augmented Reality

Keep Gesturing

A HoloLens 2 game for pragmatic, gesture-only communication with an LLM-controlled avatar.

János Adrián Gulyás · Miklós Máté Badó · Kristian Fenech · András Lőrincz

Department of Artificial Intelligence, Eötvös Loránd University, Budapest

TL;DR. Keep Gesturing is an augmented reality communication game where a human player and a GPT-4-controlled avatar solve object-configuration puzzles without speech or text. The human can see and manipulate the AR objects; the avatar knows the target configuration. To succeed, they must develop an efficient gesture language through repeated interaction, feedback, and adaptation.

01 Problem

Most LLM collaboration systems assume language as the interaction channel. That is too narrow. Real human communication is multimodal, contextual, and often non-verbal. In noisy, time-critical, or accessibility-constrained environments, speech and text may be slow, unavailable, or simply the wrong interface.

The question behind Keep Gesturing is direct: can a large language model participate in a shared, task-grounded gesture language without being given a fixed symbolic vocabulary in advance?

02 Game Setup

The game runs in augmented reality on HoloLens 2. A room contains interactive puzzle elements — objects whose properties can be changed, such as number, color, orientation, and size. Each level has a target configuration, but information is split asymmetrically between the two players.

Human player

  • Can see the current AR configuration.
  • Can interact with and edit puzzle elements.
  • Does not know the target configuration.

LLM avatar

  • Knows the target configuration.
  • Cannot directly manipulate the room.
  • Responds through animated hand gestures.

Neither party may communicate verbally. The only channel is gesture. This forces the pair to compress object attributes and corrective instructions into short, reusable, context-dependent motions.
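The information split can be sketched as a small data structure. This is a minimal illustration, not the game's actual code: the class names, field names, and the single-object level are assumptions made for the example, and only the four attributes (number, color, orientation, size) come from the description above.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class Configuration:
    """One puzzle element's editable attributes (field names are hypothetical)."""
    count: int
    color: str
    orientation_deg: int
    size: str


@dataclass
class Level:
    current: Configuration  # what the room looks like right now
    target: Configuration   # what the room should look like

    def human_view(self) -> Dict[str, Configuration]:
        # The human sees and edits the current state, never the target.
        return {"current": self.current}

    def avatar_view(self) -> Dict[str, Configuration]:
        # The avatar knows the target, but not the current state.
        return {"target": self.target}

    def mismatches(self) -> List[str]:
        # Used by the game itself (not by either player) to check progress.
        return [
            f.name
            for f in fields(Configuration)
            if getattr(self.current, f.name) != getattr(self.target, f.name)
        ]


level = Level(
    current=Configuration(count=2, color="red", orientation_deg=0, size="small"),
    target=Configuration(count=3, color="red", orientation_deg=90, size="small"),
)
print(level.mismatches())
```

Because neither view contains the other player's half, the only way to close the listed mismatches is to exchange information through gestures.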

  • Observe: The human sees the current puzzle state in AR.
  • Gesture: The human encodes the observed state using hand motions.
  • Interpret: GPT-4 receives the encoded motion sequence and infers intent.
  • Reply: The avatar animates a gesture response guiding the correction.
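The four steps form a loop that repeats until the level is solved. The toy sketch below makes that loop runnable under strong simplifying assumptions: the puzzle state is a single object count, `avatar_reply` is a hand-written stand-in for GPT-4, and the gesture meanings (open fingers for a count, wrist motion for a correction) are fixed in advance. In the real game no such convention is given; it has to emerge between the players. All function and key names are hypothetical.

```python
def human_encode(count: int) -> dict:
    # Observe + Gesture: the human shows the current count as open fingers.
    return {"open_fingers": count}


def avatar_reply(gesture: dict, target: int) -> dict:
    # Interpret + Reply: stand-in for the GPT-4 avatar, which knows the target.
    seen = gesture["open_fingers"]
    if seen < target:
        return {"wrist_motion": "up"}
    if seen > target:
        return {"wrist_motion": "down"}
    return {"wrist_motion": "still"}


def human_apply(count: int, reply: dict) -> int:
    # The human edits the puzzle element according to the avatar's gesture.
    return count + {"up": 1, "down": -1, "still": 0}[reply["wrist_motion"]]


def play_level(start: int, target: int, max_turns: int = 10) -> int:
    """Return how many corrections were needed to reach the target."""
    count = start
    for turn in range(1, max_turns + 1):
        reply = avatar_reply(human_encode(count), target)
        if reply["wrist_motion"] == "still":
            return turn - 1
        count = human_apply(count, reply)
    raise RuntimeError("level not solved within max_turns")


print(play_level(start=1, target=3))
```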

03 Gesture Interface

A raw hand trajectory is too continuous and too noisy to expose directly to an LLM. Keep Gesturing therefore discretizes each interaction into a small number of animation keyframes. Each keyframe stores compact hand state: wrist position, wrist rotation, hand activity, and finger openness values.

The representation is deliberately lossy. Instead of sending every finger joint coordinate, the system compresses each finger into an openness value. This makes the gesture format easier for GPT-4 to parse and generate, while still preserving enough structure for meaningful communication.
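One plausible way to collapse a finger's joint coordinates into a single openness value is to compare the straight-line distance from finger base to fingertip with the summed bone lengths. This is an assumption for illustration only; the source does not state which formula the system uses. The `Keyframe` field names and the Euler-angle rotation are likewise hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def finger_openness(joints: Sequence[Point]) -> float:
    """Return 1.0 for a straight finger and lower values as it curls.

    `joints` runs from the finger base to the fingertip. The measure is the
    straight-line base-to-tip distance divided by the summed bone lengths.
    """
    bone_length = sum(math.dist(a, b) for a, b in zip(joints, joints[1:]))
    if bone_length == 0.0:
        return 0.0  # degenerate input: fewer than two distinct joints
    return math.dist(joints[0], joints[-1]) / bone_length


@dataclass
class Keyframe:
    wrist_position: Point
    wrist_rotation: Point   # assumed: Euler angles in degrees
    active: bool            # whether the hand takes part in the gesture
    openness: List[float]   # one value per finger, thumb to pinky


straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
curled = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
frame = Keyframe(
    wrist_position=(0.0, 1.2, 0.4),
    wrist_rotation=(0.0, 90.0, 0.0),
    active=True,
    openness=[finger_openness(straight)] * 4 + [finger_openness(curled)],
)
print(frame.openness)
```

A keyframe built this way holds five openness numbers in place of the full set of joint coordinates, which is the kind of compression the paragraph above describes.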

Each gesture was capped at three keyframes. The cap was not only a technical constraint but also a game-design choice. If gestures can grow arbitrarily long, players can over-describe. If gestures must stay short, both human and model are pressured to invent concise, reusable signals.
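The source does not say how the three keyframes are chosen. As a sketch of one simple option, the function below reduces a longer recording to evenly spaced frames that always include the first and the last; `select_keyframes` and its parameters are hypothetical names.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_keyframes(frames: Sequence[T], max_keyframes: int = 3) -> List[T]:
    """Keep at most `max_keyframes` evenly spaced frames from a recording."""
    if max_keyframes < 1:
        raise ValueError("max_keyframes must be at least 1")
    if len(frames) <= max_keyframes:
        return list(frames)
    if max_keyframes == 1:
        return [frames[0]]
    # Spread the picks across the recording, endpoints included.
    step = (len(frames) - 1) / (max_keyframes - 1)
    return [frames[round(i * step)] for i in range(max_keyframes)]


recording = list(range(11))  # stand-in for 11 captured hand poses
print(select_keyframes(recording))
```

Keeping the endpoints preserves the start and end poses of a motion, which often carry most of a short gesture's meaning; other selection rules would fit the same three-keyframe budget.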

04 System

The implementation combines HoloLens 2 hand tracking, Unity/MRTK, asynchronous GPT-4 API calls, and an animated AR avatar. The avatar is not a decorative character; it is the model’s embodied communication channel.

Core system components.
Component | Role
HoloLens 2 | Captures hand gestures and anchors interactive puzzle elements in the physical room.
Unity + MRTK | Builds the mixed-reality environment, hand interaction, spatial UI, and object-editing panels.
Gesture JSON | Encodes keyframes into a compact format that GPT-4 can interpret and produce.
GPT-4 avatar | Interprets the player’s gesture sequence and responds with a generated gesture animation.
Feedback loop | Scores attempts and allows both the player and the model to refine the emerging language.
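The Gesture JSON component can be illustrated with a small encode-and-validate pair. The schema here (a `keyframes` list whose entries carry an `openness` object keyed by finger name) is an assumption for the example, not the project's actual format. The point of the sketch is that a model-generated reply should be checked before it drives the avatar animation: the keyframe cap is enforced and openness values are clamped to the range 0 to 1.

```python
import json
from typing import Dict, List

MAX_KEYFRAMES = 3
FINGERS = ("thumb", "index", "middle", "ring", "pinky")  # hypothetical keys


def encode_gesture(keyframes: List[Dict]) -> str:
    # Compact separators keep the text sent to the model short.
    return json.dumps({"keyframes": keyframes}, separators=(",", ":"))


def parse_reply(raw: str) -> List[Dict]:
    """Validate a model-produced gesture before animating it."""
    data = json.loads(raw)
    if not isinstance(data, dict) or "keyframes" not in data:
        raise ValueError("reply has no 'keyframes' list")
    frames = data["keyframes"]
    if not 1 <= len(frames) <= MAX_KEYFRAMES:
        raise ValueError("reply must contain 1 to 3 keyframes")
    for frame in frames:
        openness = frame["openness"]
        if set(openness) != set(FINGERS):
            raise ValueError("each keyframe needs one openness value per finger")
        # Clamp out-of-range values instead of rejecting the whole gesture.
        frame["openness"] = {
            f: min(1.0, max(0.0, float(openness[f]))) for f in FINGERS
        }
    return frames


pointing = {"openness": {"thumb": 0.0, "index": 1.4, "middle": 0.0,
                         "ring": 0.0, "pinky": 0.0}}
reply = parse_reply(encode_gesture([pointing]))
print(reply[0]["openness"]["index"])
```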

05 Prototype Evolution

The final game came after several failed or partial prototypes. The important design lesson was that pragmatic communication requires both enough structure for the model to reason over and enough freedom for meanings to adapt through context.

Iterations leading to the final Keep Gesturing design.
Iteration | Problem | Design consequence
Twister game | Rule-based gesture dictionaries were too rigid and costly to expand. | Move away from fixed one-gesture / one-word mappings.
Balls and baskets | Communication was too one-sided: the human mostly interpreted the model. | Give both parties private information they must exchange.
Rule-based KeepGesturing | Randomly generated rules required validation and increased complexity. | Constrain puzzles enough to keep levels solvable and interpretable.
Task-based KeepGesturing | Multiple simultaneous tasks became too hard for both humans and GPT-4. | Use one task per level in the final prototype.

06 Demo

The demo shows the AR puzzle environment, gesture capture, and avatar-mediated response loop.

HoloLens 2 gameplay demo for Keep Gesturing

07 What We Observed

The TDK (student research conference) experiments were small but informative: two participants completed structured tutorial and gameplay sessions while HoloLens video, external video, gameplay metrics, and ChatGPT interactions were logged.

Participant 1

  • Developed a clearer and more modular gesture language.
  • Adapted signals to new object concepts and properties.
  • Successfully completed all levels.

Participant 2

  • Had more difficulty maintaining a stable language.
  • Encountered larger hurdles in later levels.
  • Showed why modularity and clarity matter for gesture pragmatics.

The main takeaway is not that the system already solves general non-verbal AI interaction. It does not. The stronger claim is narrower and more interesting: LLMs can participate in the creation and reuse of a personalized gesture protocol when the task gives them enough grounding, feedback, and pressure for efficiency.

08 Future Directions

Experimental scale

  • Run larger studies across more participants.
  • Study how personality and strategy affect emergent gesture languages.
  • Compare GPT-4 with other LLMs under model-specific prompts.

Game mechanics

  • Reverse roles so the LLM can manipulate the environment.
  • Hide object-editing mechanics and let players discover them through interaction.
  • Integrate NeRF-based avatars or real-world object reconstructions.

09 Cite

@incollection{gulyas2024keep,
  title={Keep Gesturing: A Game for Pragmatic Communication},
  author={Guly{\'a}s, J{\'a}nos Adri{\'a}n and Bad{\'o}, Mikl{\'o}s M{\'a}t{\'e} and Fenech, Kristian and L{\H{o}}rincz, Andr{\'a}s},
  booktitle={HHAI 2024: Hybrid Human AI Systems for the Social Good},
  pages={463--465},
  year={2024},
  publisher={IOS Press}
}