the lab

Open notebook

Running experiments. Recording findings. Working things out.

The best interface never asks.

currently thinking about

→ What does an interface look like when the model already knows the answer before you ask?

→ How do you design a feedback loop that makes the model better without making the user feel like a trainer?

→ At what point does a design system become self-modifying, where usage patterns inform what components should exist?

→ If voice is the most natural input, why does every AI product still default to text?

→ What does the next evolution of the AI interface look like?

→ Can the Knicks really win the championship this year? Knicks in 5.

things I hold true

AI should make you more capable, not more dependent.

The measure of a good AI product is whether the user gets better at something through using it. If they can only do the thing with the AI present, the product failed.

The best prompt is the one you never have to write.

Every prompt is evidence of an interface that did not do enough inference. The goal is to eliminate the prompt, not to perfect it.

Design systems are infrastructure, not decoration.

A design system is the foundation that lets a team move fast without making things worse. When it is treated as a style guide, it becomes a constraint. When it is treated as infrastructure, it becomes leverage.

Context is more valuable than capability.

A less capable model with full context will outperform a more capable model without it. Most AI products are solving the wrong problem.

Voice is the most natural interface we keep ignoring.

We speak before we type. We speak faster than we type. We speak in contexts where typing is impossible. The industry has treated voice as a novelty. It is actually the default.

The aha moment is a design problem, not a model problem.

Most users never experience what a model is actually capable of because the interface does not get them there. The model is ready. The design is not.

The interface is the model's first impression.

How a model is presented shapes how people understand what it can do. A bad interface makes a good model look bad. A great interface makes the capability legible.

experiments

waypoint-sync-mcp.exe · in progress

Waypoint-sync via Figma MCP

The question I started with: what if the design file was not the source of a handoff, but the source of truth the codebase reads from directly?

The Figma MCP server makes this possible in a way that was not practical before. Instead of exporting tokens, converting them, committing them, and reviewing drift, the pipeline becomes: a designer changes a color token in Figma, the MCP reads it, waypoint-sync maps it to the correct CSS custom property and Tailwind config key, and the code updates. No PR for a color change. No "did you sync the tokens?" in code review.
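The mapping step in that pipeline can be sketched roughly as follows. This is a minimal illustration, not waypoint-sync's actual code: it assumes Figma variables arrive as slash-separated paths like `color/brand/primary`, and the `FigmaToken` shape, the `TAILWIND_SECTION` table, and the helper names are all hypothetical.

```typescript
// Hypothetical token shape as read from Figma over MCP; the real
// waypoint-sync payload is not documented here.
interface FigmaToken {
  name: string; // slash-separated variable path, e.g. "color/brand/primary"
  value: string; // resolved value, e.g. "#3b5bdb"
}

interface TokenMapping {
  cssVar: string; // CSS custom property name
  cssValue: string; // value written into :root
  tailwindKey: string; // dotted path inside the Tailwind config
  tailwindValue: string; // Tailwind points at the custom property, not the raw value
}

// Assumed mapping from the first path segment to a Tailwind theme section.
const TAILWIND_SECTION: Record<string, string> = {
  color: "colors",
  spacing: "spacing",
  radius: "borderRadius",
};

// Lowercase a path segment and turn spaces/underscores into hyphens.
function slug(segment: string): string {
  return segment.trim().toLowerCase().replace(/[\s_]+/g, "-").replace(/[^a-z0-9-]/g, "");
}

function mapToken(token: FigmaToken): TokenMapping {
  const segments = token.name.split("/").map(slug).filter((s) => s.length > 0);
  if (segments.length < 2) throw new Error(`Token "${token.name}" needs a group and a name`);
  const [group, ...rest] = segments;
  const section = TAILWIND_SECTION[group] ?? group;
  const cssVar = `--${segments.join("-")}`;
  return {
    cssVar,
    cssValue: token.value,
    tailwindKey: `theme.extend.${section}.${rest.join(".")}`,
    tailwindValue: `var(${cssVar})`,
  };
}

// Render a batch of tokens as a :root block of custom properties.
function toCssBlock(tokens: FigmaToken[]): string {
  const lines = tokens.map((t) => {
    const m = mapToken(t);
    return `  ${m.cssVar}: ${m.cssValue};`;
  });
  return `:root {\n${lines.join("\n")}\n}`;
}
```

Pointing the Tailwind key at `var(--color-brand-primary)` rather than the raw hex keeps one source of truth: a token change only has to rewrite the `:root` block.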

The deeper finding is about what a design file actually is. For 15 years the design file has been a spec: a picture of what the code should look like. The MCP turns it into an API. The design file is now a live data source the codebase subscribes to. That changes what a design system fundamentally is.

✓ Pull: Figma variables to CSS custom properties

✓ Push: CSS changes back to Figma via MCP

✓ Parity scoring per component (0-100)

○ Change report generation and channel logging

○ Automated drift detection on PR

3 / 5 steps complete
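The note does not say how the 0-100 parity score is computed. One plausible reading, sketched here purely as an assumption: flatten a component's design-side and code-side styles into property/value maps, then score the share of properties that agree on both sides. The `StyleProps` shape and the scoring rule are hypothetical.

```typescript
// A component's resolved style properties, keyed by property name.
// Both sides are assumed to be flattened to plain strings beforehand.
type StyleProps = Record<string, string>;

interface ParityResult {
  score: number; // 0-100
  mismatched: string[]; // on both sides, different values
  missingInCode: string[]; // defined in design only
  missingInDesign: string[]; // defined in code only
}

function parityScore(design: StyleProps, code: StyleProps): ParityResult {
  const keys = Array.from(new Set([...Object.keys(design), ...Object.keys(code)]));
  const mismatched: string[] = [];
  const missingInCode: string[] = [];
  const missingInDesign: string[] = [];
  let matches = 0;
  for (const key of keys) {
    const inDesign = key in design;
    const inCode = key in code;
    if (inDesign && !inCode) missingInCode.push(key);
    else if (!inDesign && inCode) missingInDesign.push(key);
    else if (design[key] === code[key]) matches++;
    else mismatched.push(key);
  }
  // An empty comparison counts as full parity rather than dividing by zero.
  const score = keys.length === 0 ? 100 : Math.round((100 * matches) / keys.length);
  return { score, mismatched, missingInCode, missingInDesign };
}
```

Returning the mismatch lists alongside the number matters more than the number itself: a score says something drifted, the lists say what.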

The design file stops being a spec and becomes an API. That changes what a design system fundamentally is.

design systemsmcpfigmaautomation

swirl-prompt-engineering.exe · ongoing

Protecting generative UI from model drift

The Swirl animation on this site went through 50+ Cursor sessions across several months. The challenge was not building it. It was keeping it intact.

The finding: LLMs cannot hold visual intent across long contexts. You can describe the animation in as much detail as you want, document every constant, write rules that say "do not modify this file." The model will still drift. It optimizes for what looks like correct code rather than what produces the correct visual output.

The solution was a canonical reference artifact. Not rules about the code, but an immutable file that represents the intended output. The model cannot modify it. Every session it gets the file and the instruction: this is what it should look like. Make changes elsewhere that stay loyal to this.
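One way to enforce "the model cannot modify it" mechanically, rather than only by instruction, is to fingerprint the reference artifact when it is approved and fail any session that changes it. A minimal sketch, assuming the artifact is a text file whose contents are passed in as a string; the file name and the `lockReference`/`verifyReference` helpers are hypothetical. It uses 32-bit FNV-1a, which catches accidental edits but is not cryptographic; a SHA-256 digest or read-only file permissions would be stronger.

```typescript
// 32-bit FNV-1a over UTF-16 code units (not bytes). Fast and dependency-free,
// good for spotting accidental edits; not a security control.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash >>> 0;
}

// Fingerprint recorded once, when the reference artifact is approved.
interface ReferenceLock {
  path: string;
  fingerprint: number;
}

function lockReference(path: string, contents: string): ReferenceLock {
  return { path, fingerprint: fnv1a32(contents) };
}

// Run after each AI-assisted session: the reference must be unchanged.
// Edits are supposed to land elsewhere and stay loyal to this file.
function verifyReference(lock: ReferenceLock, currentContents: string): void {
  if (fnv1a32(currentContents) !== lock.fingerprint) {
    throw new Error(`${lock.path} changed; the canonical reference is read-only`);
  }
}
```

A check like this protects the artifact. It does not judge whether the other changes are loyal to it; that comparison is still the model's job.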

This is a general finding about working with LLMs on design artifacts: the model needs something to be loyal to, not just rules to follow. Rules describe constraints. The artifact describes intent. Intent survives context windows in a way that rules do not. I am currently in the documentation phase, mapping the broader implications for how teams should structure AI-assisted design work.

✓ Initial finding documented (canonical reference artifact)

✓ Pattern validated across 50+ sessions

✓ Applied to full portfolio site build

○ Documentation and generalization in progress

○ Publish findings as a lab post

3 / 5 steps complete

Rules describe constraints. Artifacts describe intent. Intent survives context windows better than rules do.

prompt engineering · generative ui · cursor · llm

beyond-chat.exe · in progress

Designing beyond the chat interface

Every AI product defaults to a text box. Type a question, get an answer. This is the command-line era of AI interfaces: functional, powerful, and requiring the user to do all the work of formulating intent into language.

Bret Victor argued in Magic Ink (2006) that interactivity is a last resort. The best interface infers what you need from context and shows it without being asked. Chat is the opposite of that: it maximizes interactivity, requiring explicit input for every output.

The work I am doing with Seudo is the most direct exploration of this. Voice removes the typing friction, but it is still interaction. The more interesting problem is inference: what does the interface know before you speak? The Swirl and orbital card system on this homepage is a small version: as you chat, relevant cards come forward without you clicking on them. You did not search for them. The system inferred relevance from the conversation.
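The orbital cards' actual inference method is not described here. As an illustrative stand-in, keyword overlap between each card's tags and the recent conversation is enough to show the shape of the idea: score every card against what was said, and let the ones above a threshold come forward with no click involved. `Card`, the single-word keyword tags, and the 0.5 threshold are assumptions; a production system would more likely use embeddings.

```typescript
interface Card {
  id: string;
  keywords: string[]; // hypothetical single-word tags per card
}

// Lowercase word tokens from free text.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Score = fraction of a card's keywords mentioned in the recent conversation.
function relevance(card: Card, recentTurns: string[]): number {
  if (card.keywords.length === 0) return 0;
  const words = new Set(tokenize(recentTurns.join(" ")));
  const hits = card.keywords.filter((k) => words.has(k.toLowerCase())).length;
  return hits / card.keywords.length;
}

// Cards that should drift forward, most relevant first. No click, no search:
// the only input is what the user has already said.
function cardsToSurface(cards: Card[], recentTurns: string[], threshold = 0.5, max = 3): Card[] {
  return cards
    .map((card) => ({ card, score: relevance(card, recentTurns) }))
    .filter((s) => s.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, max)
    .map((s) => s.card);
}
```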

Most of the work right now is mapping what the future of interfaces looks like when they are not tied to a chat box. The patterns are starting to emerge.

✓ Chat interface limitations documented

✓ Voice-first patterns explored via Seudo

✓ Inference-driven UI prototyped (orbital card system)

○ Mapping post-chat interaction paradigms

○ Pattern library for ambient AI interfaces

3 / 5 steps complete

The best interface never asks. The second best interface asks once and remembers.

ai ux · interaction design · voice · ambient · inference · seudo

interrupt-model.exe · in progress

The interrupt model: when should AI surface?

Working on Wafer has sharpened a question I keep returning to: when should an intelligent system interrupt you?

Too early is noise. Too late is useless. The threshold between the two is the entire design problem. An OS-level context layer that knows your calendar, your current task, your recent conversations, and your location has extraordinary capability. But capability is not the hard part. The hard part is knowing when to use it.

Victor framed this as the context inference problem in Magic Ink. The ideal interface already knows. But "already knows" only matters if the system also knows when to speak. A model that always answers is just another notification stream. A model that answers at exactly the right moment feels like it understands you.

I am working on a framework for thinking about interrupt timing: the conditions under which a proactive surface adds value versus friction, how to design for the threshold rather than the capability, and what signals indicate that a user is in a receptive state versus a focused one.
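Since the framework is still in progress, this is only a toy model of the threshold idea: estimate how receptive the user is from a few context signals, multiply by how relevant the suggestion is, and surface only when that clears a bar that drops for urgent items. Every signal, weight, and threshold below is a hypothetical placeholder, not a finding.

```typescript
// Hypothetical signals. The note names calendar, current task, recent
// conversations, and location as context, but defines no signal set.
interface UserSignals {
  secondsSinceLastInput: number; // time since last keystroke or click
  inMeeting: boolean; // from the calendar
  taskSwitchedRecently: boolean; // e.g. app or document changed in the last minute
}

interface Suggestion {
  relevance: number; // 0..1, how useful the surface would be right now
  urgency: number; // 0..1, how quickly its value decays if held back
}

// Rough receptivity in 0..1: focused states pull it down,
// natural breaks (idle gaps, task switches) push it up.
function receptivity(s: UserSignals): number {
  if (s.inMeeting) return 0.1;
  let r = 0.3;
  if (s.secondsSinceLastInput > 10) r += 0.3;
  if (s.taskSwitchedRecently) r += 0.3;
  return Math.min(r, 1);
}

// Design for the threshold, not the capability: the same suggestion
// surfaces during a break and stays quiet during focus.
function shouldInterrupt(sugg: Suggestion, signals: UserSignals, threshold = 0.35): boolean {
  const value = sugg.relevance * receptivity(signals);
  const bar = threshold * (1 - 0.5 * sugg.urgency); // urgent items get a lower bar
  return value >= bar;
}
```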

✓ Problem defined and framed via Wafer work

✓ Component patterns for proactive surfaces designed

○ Interrupt timing framework in progress

○ User signal taxonomy (receptive vs focused states)

○ Apply framework back to Wafer component system

2 / 5 steps complete

The interrupt model is the entire product. Get it wrong and the system feels intrusive. Get it right and it feels like it understands you.

ai ux · wafer · ambient · proactive ui · inference

model-in-the-loop-design.exe · in progress

The model as a design system collaborator

Most design systems treat AI as a consumer: the system exists, the AI queries it. Sherpa does this. You ask, it answers. That is useful but it is still the old paradigm.

The more interesting direction is treating the model as a collaborator in the design system itself. waypoint-sync points at this: the design-map.json schema is written in a form the model can read and reason about. Claude Code and Cursor can invoke sync operations through natural language. The model is not just querying the system; it is operating it.

The question I am exploring: what does a design system look like when it is built to be model-readable from the start? Not just documented for humans, but structured so a model can reason about changes, catch drift, propose components based on usage patterns, and surface inconsistencies before they become bugs. The design system stops being a static spec and becomes a living inference engine.
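The design-map.json schema itself is not shown here, so the following is a hypothetical shape illustrating what "model-readable" could mean: every token and component carries a plain-language description of intent alongside its values, and the structure supports checks a model can run and then explain. Field names and example values are assumptions.

```typescript
// Hypothetical shape for a model-readable design map; not the actual
// design-map.json schema used by waypoint-sync.
interface DesignMap {
  version: number;
  tokens: Record<string, { value: string; description: string }>;
  components: Record<
    string,
    {
      description: string; // plain-language intent a model can reason about
      figmaNodeId?: string;
      tokens: string[]; // token names this component is allowed to use
    }
  >;
}

// One thing structure buys: a mechanical check a model can run and explain.
// Returns "Component -> token" pairs that reference undefined tokens.
function findUndefinedTokenRefs(map: DesignMap): string[] {
  const problems: string[] = [];
  for (const [name, component] of Object.entries(map.components)) {
    for (const token of component.tokens) {
      if (!(token in map.tokens)) problems.push(`${name} -> ${token}`);
    }
  }
  return problems;
}

// Illustrative document; "radius.md" is deliberately left undefined.
const example: DesignMap = {
  version: 1,
  tokens: {
    "color.brand.primary": { value: "#3b5bdb", description: "Primary actions only" },
  },
  components: {
    Button: {
      description: "Primary call to action; one per view",
      tokens: ["color.brand.primary", "radius.md"],
    },
  },
};
```

The `description` fields are the point: they give a model intent to check changes against, not just values to copy.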

✓ design-map.json schema designed for model readability

✓ Natural language sync invocation via Claude Code and Cursor

✓ Parity checking via model-evaluated component specs

○ Usage pattern analysis for component proposals

○ Self-modifying documentation from model observations

3 / 5 steps complete

The design system stops being a static spec and becomes a living inference engine.

design systems · llm · waypoint · inference · ai

voice-cognitive-rhythm.exe · ongoing

Designing for cognitive rhythm in voice interfaces

At Seudo and Mushroom I kept running into the same problem: timing is the entire design problem in voice interfaces. Not what you say. When you say it.

The latency threshold for voice is brutally specific. A 200ms response feels intelligent. A two-second response feels broken, regardless of output quality. This is not about performance optimization. It is about how humans assign intelligence to systems. Speed is a proxy for understanding. A slow response signals that the system did not know the answer immediately, which signals that it does not really understand you.

Beyond response latency, there is a deeper timing problem: cognitive rhythm. People think in waves, not streams. There are moments of articulation and moments of processing. A voice interface that interrupts a processing moment feels intrusive. One that waits for the next articulation moment feels collaborative.

I am mapping what cognitive rhythm looks like in practice and what interface patterns emerge from designing around it rather than against it.

✓ Latency threshold documented from Mushroom work

✓ Cognitive rhythm hypothesis formed via Seudo

✓ Pause-gated clustering as cognitive rhythm design

○ Pattern documentation in progress

○ Apply to next voice interface project

3 / 5 steps complete

200ms feels intelligent. Two seconds feels broken. The gap between them is the entire design problem.

voice · seudo · mushroom · cognitive design · timing

feed

note · 2025-04

Building before there was a playbook

At Mushroom in 2022 we were building LLM-powered voice interfaces before ChatGPT existed as a public product. There were no established patterns. No prior art for how users form intent with voice, how a model should respond to build trust, how the interface should handle the gap between what someone said and what the model understood.

The design challenge was not just building something. It was figuring out what to build at all. Every decision was first principles. The standard UX toolkit did not apply. We built our own evaluation methods, our own trust patterns, our own vocabulary for what good even meant.

I keep returning to that experience when I see teams moving slowly on AI products because they are waiting for best practices to emerge. The best practices come from building. The playbook does not exist until someone writes it.

voice · mushroom · ai ux · first principles

note · 2025-03

The abstraction problem in open-source AI

Working on Channel, the hardest design problem was not capability. The models were extraordinary. The problem was abstraction. How do you make the vast, fragmented landscape of open-source AI feel coherent to someone who has never heard of Stable Diffusion or Mistral?

Abstraction that erases capability is just dumbing it down. Abstraction that makes capability legible is product design. The difference is whether the user feels like the system is hiding things from them or translating things for them. One produces distrust. The other produces confidence.

This is the same problem at every layer of AI product design. The model can be extraordinary and still fail the user if the interface does not create the conditions for understanding.

channel · ai ux · abstraction · open source

note · 2025-03

The tyranny of the text box

Every AI product I open defaults to a text input. The cursor blinks. The placeholder says "Ask me anything." The assumption embedded in that UI is that the user knows what to ask, how to ask it, and when. That is a lot of work to put on the user before they have even experienced what the product can do.

The chat interface is not wrong. It is just a beginning that the industry has mistaken for a destination. We learned to walk and stopped there. The next interface is the one that watches you work and surfaces what you need before you know you need it. We have the models for it. We are still building the interfaces.

ai ux · chat · interaction design

eval · 2025-03

The prompt is evidence of interface failure

I have been running a personal eval on every AI tool I use: how much work does the user have to do before the model is useful?

The finding is consistent. The tools that feel magical are the ones that infer context from what you are doing and act without being asked. The tools that feel like work are the ones that require a perfectly crafted prompt before they produce anything useful.

The prompt is not a feature. It is a failure mode. Every character the user has to type is an interaction the interface failed to make unnecessary. The goal of prompt engineering is to eventually engineer prompts out of existence.

prompt engineering · eval · ai ux

note · 2025-02

The threshold problem

The hardest design problem at Wafer is not capability. It is timing. An OS-level context layer that knows your calendar, your current task, your open files, and your recent conversations has more than enough signal to be useful. The question is when to use it.

Surface too early: noise. Surface too late: missed the moment. The threshold between the two is not a setting you configure. It is a model of the user the system has to build and maintain. That is a fundamentally different design problem than building a feature.

Victor described this as context sensitivity in Magic Ink. The ideal interface infers from environment, history, and behavior. What he could not have known in 2006 is that we would eventually have models capable of exactly that inference. The infrastructure caught up. The design patterns have not.

wafer · ambient · proactive ui · magic ink

experiment · 2025-02

Parallel state as a UX problem

Building Nexus surfaced a design problem I had not seen framed clearly before: parallel state divergence. When you run the same prompt across four AI models simultaneously, each model is in a different state at any given moment. One is still thinking. One has finished streaming. One returned an error. One gave a two-sentence answer twenty seconds ago.

The user needs to read all four states at a glance. Traditional chat UI is designed for sequential states: one thing happens, then the next. Parallel state requires a completely different visual language. State needs to be legible in parallel, not just sequentially.
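One way to make parallel state explicit, sketched with hypothetical names since Nexus's real state model is not described: give each model its own lane with a discriminated-union state, then derive both a per-lane label and an aggregate count so the whole run reads at a glance instead of as a sequence.

```typescript
// One lane per model in a fan-out run. Names are illustrative.
type LaneState =
  | { kind: "thinking"; startedAt: number }
  | { kind: "streaming"; startedAt: number; chars: number }
  | { kind: "done"; finishedAt: number; chars: number }
  | { kind: "error"; message: string };

interface Lane {
  model: string;
  state: LaneState;
}

// Compact label per lane, so all lanes can sit side by side.
function laneLabel(lane: Lane, now: number): string {
  const s = lane.state;
  switch (s.kind) {
    case "thinking":
      return `${lane.model}: thinking ${Math.round((now - s.startedAt) / 1000)}s`;
    case "streaming":
      return `${lane.model}: streaming (${s.chars} chars)`;
    case "done":
      return `${lane.model}: done ${Math.round((now - s.finishedAt) / 1000)}s ago`;
    case "error":
      return `${lane.model}: error (${s.message})`;
  }
}

// Aggregate view: how many lanes are in each state right now.
function summarize(lanes: Lane[]): Record<LaneState["kind"], number> {
  const counts = { thinking: 0, streaming: 0, done: 0, error: 0 };
  for (const lane of lanes) counts[lane.state.kind]++;
  return counts;
}
```

Making the state a closed union means the compiler flags any view that forgets a state, which is exactly the failure mode sequential chat UI never had to handle.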

This turns out to be a general problem for any AI product that runs multiple operations simultaneously. The design patterns for sequential UX do not transfer.

channel · nexus · ai ux · parallel state

experiment · 2025-02

When should a design system propose its own components?

A design system is a spec for how things should be built. But a spec is static. It describes what exists, not what should exist.

The experiment: can Waypoint observe how teams are building and propose components before they are requested? If three teams independently build a similar pattern, the system should surface that as a candidate component before a fourth team builds it again.

This requires instrumenting the system: tracking which tokens are used together, which patterns appear in multiple components, where designers are going off-system and why. It is closer to a linter than a library. The design system becomes something that learns.
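A minimal sketch of the "three teams" rule, assuming off-system components can be observed along with the tokens they use. Grouping by an exact token-set signature is a deliberate simplification (real pattern similarity would need fuzzier matching), and the `Observation` shape and helper names are hypothetical.

```typescript
// One locally built, off-system component observed in some team's code.
interface Observation {
  team: string;
  component: string;
  tokens: string[]; // design tokens the component uses
}

interface Candidate {
  signature: string;
  teams: string[];
  examples: string[];
}

// Signature = sorted, de-duplicated token set.
function signatureOf(tokens: string[]): string {
  return Array.from(new Set(tokens)).sort().join("|");
}

// Propose a component once enough distinct teams have built the same pattern.
// One team building it twice is reuse, not independent convergence.
function proposeComponents(observations: Observation[], minTeams = 3): Candidate[] {
  const groups = new Map<string, { teams: Set<string>; examples: string[] }>();
  for (const obs of observations) {
    const sig = signatureOf(obs.tokens);
    const group = groups.get(sig) ?? { teams: new Set<string>(), examples: [] };
    group.teams.add(obs.team);
    group.examples.push(`${obs.team}/${obs.component}`);
    groups.set(sig, group);
  }
  const candidates: Candidate[] = [];
  groups.forEach((group, signature) => {
    if (group.teams.size >= minTeams) {
      candidates.push({ signature, teams: Array.from(group.teams).sort(), examples: group.examples });
    }
  });
  return candidates;
}
```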

design systems · inference · waypoint · ai

note · 2025-02

Capability through use

The products I find most meaningful are the ones that make you better at something. Not just more efficient. Actually better. Where using the tool repeatedly results in improved judgment, faster pattern recognition, deeper understanding.

Aim Lab was this at its best. The AI did not just tell you where you were weak, it built a training program that made you less weak. The coaching was embedded in the product mechanics, not in a report you read afterward.

Most AI products do the opposite. They make you dependent. You outsource the thinking to the model and get worse at the underlying skill. The design question that keeps me up: how do you build AI that makes you more capable in its absence, not just more productive in its presence?

ai · capability · design philosophy · statespace

note · 2025-01

Cognitive rhythm and voice

People think in waves, not streams. There are moments of articulation and moments of processing. A voice interface that interrupts a processing moment feels intrusive. One that waits for the next articulation moment feels collaborative.

At Seudo, the decision to gate clustering updates to natural pauses came directly from this observation. The clustering ran continuously in the background. The UI updates waited. Users experienced the intelligence without feeling interrupted by the process.
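The gate can be sketched as a small buffer between the background process and the UI: results may be published at any time, but only the latest one is rendered, and only once the speaker has been silent for a minimum gap. The `PauseGate` class and the 700 ms gap used in the example are assumptions; Seudo's actual threshold is not given here.

```typescript
// Hypothetical gate between a continuously running background process
// (e.g. clustering) and the UI that displays its results.
class PauseGate<T> {
  private pending: { value: T } | undefined = undefined;
  private lastSpeechAt = -Infinity;
  private readonly pauseMs: number;
  private readonly render: (value: T) => void;

  constructor(pauseMs: number, render: (value: T) => void) {
    this.pauseMs = pauseMs;
    this.render = render;
  }

  // Voice pipeline calls this whenever speech activity is detected.
  onSpeech(now: number): void {
    this.lastSpeechAt = now;
  }

  // Background process calls this freely; only the newest result is kept.
  publish(value: T): void {
    this.pending = { value };
  }

  // Timer tick: render the pending result only during a long enough pause.
  tick(now: number): void {
    if (this.pending !== undefined && now - this.lastSpeechAt >= this.pauseMs) {
      this.render(this.pending.value);
      this.pending = undefined;
    }
  }
}

// Example: clustering publishes mid-sentence; the UI updates only in the pause.
const rendered: string[] = [];
const gate = new PauseGate<string>(700, (v) => rendered.push(v));
```

Keeping only the newest result means the user never sees a burst of stale intermediate states when the pause finally arrives.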

This is a general principle for any ambient interface: the interface should be aware of the user's cognitive state and time its surfaces accordingly. Not just what to show. When.

voice · seudo · cognitive design · ambient

read · 2025-01

Dynamicland by Bret Victor →

If Magic Ink is Victor's argument against interaction-heavy information software, Dynamicland is the physical manifestation of what comes next. A communal computer where programs live on paper, tables, and walls. Where the interface is the room.

It sounds like a research project. It is actually a critique of every assumption we have built into computing since 1984. The window, the mouse, the screen. All of these are choices, not inevitabilities. Dynamicland asks: what if computation was ambient? What if the interface was not a device but an environment?

I am not building rooms. But the question it asks, whether the interface could be everywhere and nowhere, is the same question behind every ambient AI product I find compelling.

read · bret victor · ambient · interface

read · 2024-10

Magic Ink, Bret Victor (2006) →

Required reading for anyone building AI interfaces. Victor argued in 2006 that most software is information software, that interactivity is a failure mode rather than a feature, and that the ideal interface infers context and shows what is relevant without being asked. We now have the AI infrastructure to actually build what he described. The irony is that the chat-first AI paradigm is exactly the interaction-heavy anti-pattern he was arguing against.

read · design theory · bret victor