Case Study

Calling

A user says "call Stephanie," but there are two Stephanies across WhatsApp, Messenger, and the native phone app. This is the problem I solved across 5 devices over 3 years.

Owned the Calling domain across 5 Meta devices for 3 years, leading the transition from a rules-based system to a model-based architecture that reduced task failures by 55% and became the foundational interaction pattern for Calling, Messaging, and Sharing across the wearable ecosystem.

I was brought in to own this domain at the moment it needed to transition from rules-based to model-based, a shift that required someone who could bridge conversation design and machine learning.

Drove alignment

Aligned Calling, Messaging, and Sharing owners around model-based dialog as the shared interaction pattern.

Mentored designers

Onboarded and trained every new designer on model-based dialog, having built the system from scratch.

Scaled the team

Mentored junior designers during team expansion on Portal, growing the team's domain expertise.

Collaboration

Worked closely with research, engineering, annotators and PMs. Authored annotation guidelines for model training and evaluation.

Localization

As a native Spanish speaker, I designed all calling interactions in both English and Spanish. A key launch was WhatsApp calling in Spanish on Portal, critical during the pandemic when 46% of U.S. Latino adults used WhatsApp (double the general population) to stay connected with family.

Design Decisions

Principles

Speed

Prioritize quick and efficient communication by minimizing the time and steps required to initiate, answer and end calls.

Seamlessness

Strive for seamless integration across smart glasses and smartphones, creating a unified experience.

Integration

Connect through our family of apps as calling providers while preserving a cohesive experience on smart glasses.

Confidence

Design with elements that instill confidence in our users across both UI and voice, fostering trust in the calling experience.

Intuitiveness

Ensure a clear and concise interface that allows users to easily navigate and initiate calls without confusion.

Key trade-offs

Speed vs. Accuracy

Choosing between fast, implicit resolution and explicit user confirmation. Faster flows reduce friction, but explicit confirmation reduces errors in high-stakes actions.

Simplicity vs. Density

Designing for an ultra-constrained heads-up display required prioritizing simplicity over detailed information to limit cognitive load.

Voice vs. Multimodal

Determining when voice should lead vs. when to offload to visuals or gestures. Voice is natural for intent, but inefficient for selection or repeated actions.
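The speed vs. accuracy trade-off above can be sketched as a simple confidence gate. This is a minimal illustration, not the production logic: the `Resolution` type, the `next_step` function, and the 0.85 threshold are all hypothetical, and the confidence score is assumed to come from the dialog model.

```python
from dataclasses import dataclass

# Hypothetical threshold; the case study does not state real values.
CONFIRM_THRESHOLD = 0.85


@dataclass
class Resolution:
    contact: str
    provider: str
    confidence: float  # 0.0 to 1.0, assumed to come from the dialog model


def next_step(resolution: Resolution, threshold: float = CONFIRM_THRESHOLD) -> str:
    """Return 'place_call' for the fast implicit path, else 'confirm'."""
    if resolution.confidence >= threshold:
        return "place_call"  # implicit resolution: fewer turns, less friction
    return "confirm"         # explicit confirmation: slower, fewer wrong calls
```

Raising the threshold pushes more requests through explicit confirmation, trading speed for fewer misdialed calls; lowering it does the opposite.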

System Architecture

Model-Based Dialog for Calling

Users don't follow scripts. They interrupt, change their mind, use nicknames, say "call my mom" instead of a contact name. Rules-based systems break on all of this. Model-based dialog replaces rigid decision trees with a system that interprets intent, resolves ambiguity, and recovers from misunderstandings.

Rules-based

40% failure rate

"Hey Meta, call Stephanie"

Exact match lookup

Found 2 contacts named "Stephanie"

No exact match

"Sorry, I can't help with that."

Call not placed

Model-based

80%+ success rate

"Hey Meta, call Stephanie"

Interpret signals

2 contacts found. Analyzing conversation context...

"On WhatsApp, call Stephanie De Luna, right?"

"Yes"

Calling Stephanie De Luna

Call placed successfully
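The contrast between the two flows above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the shipped system: the real model interprets conversational signals, whereas this sketch stands in a single made-up signal (`recent_calls`) for ranking. The `Contact` type, both function names, and the second contact ("Stephanie Park") are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    provider: str           # e.g. "WhatsApp", "Messenger"
    recent_calls: int = 0   # illustrative context signal, not from the source


def rules_based_lookup(query: str, contacts: list[Contact]) -> str:
    """Exact-match lookup: anything other than one exact hit is a dead end."""
    hits = [c for c in contacts if c.name.lower() == query.lower()]
    if len(hits) == 1:
        return f"Calling {hits[0].name}"
    return "Sorry, I can't help with that."


def model_based_lookup(query: str, contacts: list[Contact]) -> str:
    """Rank partial matches by a context signal and confirm the top one."""
    candidates = [c for c in contacts if query.lower() in c.name.lower()]
    if not candidates:
        # Recover with a follow-up question instead of ending the interaction.
        return f"I couldn't find {query}. Who would you like to call?"
    best = max(candidates, key=lambda c: c.recent_calls)
    if len(candidates) == 1:
        return f"Calling {best.name}"
    return f"On {best.provider}, call {best.name}, right?"


# Hypothetical sample data for illustration only.
contacts = [
    Contact("Stephanie De Luna", "WhatsApp", recent_calls=12),
    Contact("Stephanie Park", "Messenger", recent_calls=1),
]
```

With this sample data, `rules_based_lookup("Stephanie", contacts)` dead-ends with the apology, while `model_based_lookup("Stephanie", contacts)` asks for confirmation of the most likely contact and provider.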

What I did

Redesigned core patterns

Replaced deterministic flows with a model that interprets conversational signals and adapts to how people actually speak.

Reframed failure as recovery

Instead of ending interactions on errors, designed context-aware strategies that recover and keep the conversation going.

Built the ML framework

Established the end-to-end pipeline for annotation, training, and evaluation of the dialog model.

Scaled across the ecosystem

What began as an experiment on Portal became the shared pattern for Calling, Messaging, and Sharing across all devices.

Calling Across Devices

Each device pushed the interaction model further: from a screen with voice, to voice with no screen, to voice with a heads-up display.

Meta Portal

Challenge

Portal was the first device where we applied model-based dialog to calling, and we were still figuring out what worked. The device became a hit during the pandemic as people relied on it for family connection when in-person visits weren't possible. A key challenge was multi-generational users: senior users were power users who relied entirely on voice, since traditional tech interfaces weren't intuitive for them.

User Landscape

10x

Portal sales increase from mid-March 2020, leading to stock shortages.

Source

1 in 3

Seniors were using video chat weekly by 2020. 70% had used it at least once during the pandemic.

Source: AARP

"Regular video interactions can reduce depression symptoms in older adults by up to 50%." UCHealth

Approach

Designed voice + visual calling flows using model-based dialog, focused on making interactions simple enough for non-tech-savvy users while keeping the visual UI clear and accessible on the large display.

Portal

Video

User (voice)

"Hey Portal, call Stephanie De Luna"

System

Finding contacts on screen...

Portal display

Stephanie (WhatsApp) · Stephanie (Messenger)

Assistant (TTS)

"Which one?"

User (voice)

"WhatsApp"

State

Calling Stephanie De Luna on WhatsApp

Legend: Voice · TTS · Display · Gesture · State

Portal was a home device, which meant users were often not standing next to it. They could be cooking, walking around the house, or in another room. The voice assistant had to be fast and accurate enough to place a call or guide users through calling features from across the room, without needing to touch the screen.

Key Feature

In-Call Capture

Let users capture photos during calls with a hand wave or by asking the assistant. The voice experience was critical here because it happened mid-call. The interaction had to capture the moment without interrupting the conversation. Users could review photos later on the phone, turning video calls into something more than just communication.

Ray-Ban Meta

Challenge

No UI at all. Everything was voice-first. Calling carries a unique urgency: users don't want multi-turn conversations with the assistant. It had to be fast and accurate because there's no visual confirmation. Multiple providers (WhatsApp, Messenger, native phone, Instagram Direct) made accuracy critical. You don't want to accidentally call a Facebook connection from 10 years ago.

Approach

Redefined user flows as sample dialogues that served as both design artifacts and training signals. The model attempts recovery for up to 2 turns before gracefully ending, prioritizing disambiguation accuracy across all providers.

Exploration: mapping disambiguation patterns and scaling displayless vs. display behavior

Result: sample dialogue shipped as design spec and model training data
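The two-turn recovery limit described in the approach can be sketched as a bounded loop. This is a minimal sketch under stated assumptions: `run_call_dialog` and the `resolve` callback are hypothetical names, and the real system's prompts and resolution logic are not shown in this case study.

```python
from typing import Callable, Iterable, Optional

MAX_RECOVERY_TURNS = 2  # from the case study: recover for up to 2 turns


def run_call_dialog(
    utterances: Iterable[str],
    resolve: Callable[[str], Optional[str]],
) -> str:
    """Try to resolve a callee; re-prompt up to MAX_RECOVERY_TURNS times."""
    turns = iter(utterances)
    # One initial attempt plus the allowed recovery turns.
    for _ in range(1 + MAX_RECOVERY_TURNS):
        utterance = next(turns, None)
        if utterance is None:
            break  # the user stopped responding
        callee = resolve(utterance)
        if callee is not None:
            return f"Calling {callee}"
        # Otherwise a recovery prompt would be spoken before the next turn.
    return "Ending without placing a call"


def toy_resolver(utterance: str) -> Optional[str]:
    """Hypothetical resolver: succeeds only once the provider is named."""
    if "whatsapp" in utterance.lower():
        return "Stephanie De Luna on WhatsApp"
    return None
```

For example, `run_call_dialog(["call Stephanie", "WhatsApp"], toy_resolver)` places the call on the first recovery turn, while three unresolved utterances in a row end the dialog without calling anyone.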

Ray-Ban Meta Display

Challenge

An entirely new form factor with no existing prototypes. A very small screen directly in the user's line of sight meant choosing wisely when to use visuals vs. voice. We also introduced earcons as audio signals and had to manage cognitive load carefully. People already know calling from their phone, so the challenge was making it feel familiar while bringing a wow factor.

Approach

Relied heavily on prototyping to feel out interactions before the hardware existed. Carefully selected when to use the display vs. voice vs. earcons. Designed for three input methods (voice, gestures, display) working together without overwhelming the user.

Prototyping with AI

AI has become a core part of my design process. I use Claude to rapidly prototype interactions and test ideas before investing time in high-fidelity mockups. This lets me explore multiple directions in hours instead of days.

The prototype below was built entirely with Claude. I described the calling flow, the disambiguation logic, and the visual language I wanted, and iterated on it through conversation. This approach lets me validate interaction patterns, test edge cases, and communicate design intent to engineers and PMs with a working artifact rather than a static spec.

Describe the interaction

Start by describing the flow, edge cases, and modality decisions in natural language.

Iterate through conversation

Refine the prototype by asking for changes, testing different states, and adding detail incrementally.

Ship a working artifact

The result is interactive, not a static mockup. Stakeholders can experience the flow, not just read about it.

This prototype was built with Claude and is for exploration purposes only. It does not use the official design system.

Cross-Domain Influence

TTS shortening initiative

Initiated a cross-domain effort to reduce TTS length across wearables. Collaborated with Messaging and Sharing designers to shorten prompts across all three domains, resulting in a 40% reduction in TTS errors.

Earcons for calling states

Collaborated with the sound design team to create earcons for each calling state and provider. Designed the assistant listening state earcon, a cross-functional effort that became consistent across all voice interactions.

Outcome

Over 3 years, I shaped calling from a rules-based system into a model-based architecture that became the foundational interaction pattern across Meta's wearable ecosystem.

Reduction in task failures: 55%

Devices shipped: 5

WhatsApp success rate

Reduction in TTS errors: 40%

The model-based dialog system I designed for Calling was adopted by Messaging and Sharing, becoming the shared interaction pattern across the Comms domain. The cross-domain TTS shortening initiative and earcon design work further standardized voice interactions across the wearable ecosystem.

Reflection

Annotation guidelines as a design surface

Writing annotation guidelines for model training taught me that the boundary between design and ML is itself a design surface. The guidelines I authored directly shaped model behavior, which means conversation design at this level is as much about training data as it is about user flows.

Design for the hardest user first

Designing for seniors on Portal gave me a perspective I carry into every project: if the interaction works for someone who has never used a smartphone, it works for everyone. That constraint made the voice experience stronger for all users, not just the target audience.