ux beyond prompting [wip]
just dove into this whole ai agent space (Google's A2A following MCP) and i'm honestly mind-blown at what's possible with such simple code. wanted to capture my thoughts while they're fresh...
i define an agent as:
agent = AI + tools + autonomy to reach goals & decide when to stop
the coolest part is watching it solve problems on its own:
USER: "can you check if anyone's in the bedroom?"
SMART HOME AGENT: *thinking*
1. Need to see bedroom
2. It's dark in there
3. Should turn on lights first
↓
SMART HOME AGENT: *turns on bedroom lights*
*activates camera*
"The bedroom is empty."
what blows my mind is how SIMPLE the code flow is to get this kind of behavior. like, figuring out it needed to turn the lights on before checking the camera. that's the emergence i'm obsessed with.
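to make "simple code flow" concrete, here's roughly the shape of it - a bare agent loop sketched in python. this isn't from the A2A repo or any real framework; call_model is a canned stand-in so the example runs end to end, and the two smart-home tools are made up:

# a minimal sketch of an agent loop - not A2A or MCP code, just the shape.
# call_model is a canned stand-in so the example runs; in reality it would
# be an LLM picking the next step from the goal + history.

def turn_on_lights(room):
    return f"lights on in {room}"                   # pretend device call

def check_camera(room):
    return f"{room} camera: no person detected"     # pretend device call

TOOLS = {"turn_on_lights": turn_on_lights, "check_camera": check_camera}

def call_model(goal, history):
    # canned "reasoning": lights first (it's dark), then camera, then answer
    script = [
        {"type": "tool", "tool": "turn_on_lights", "args": {"room": "bedroom"}},
        {"type": "tool", "tool": "check_camera", "args": {"room": "bedroom"}},
        {"type": "final", "text": "The bedroom is empty."},
    ]
    return script[len(history)]

def run_agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):                       # autonomy, but bounded
        step = call_model(goal, history)             # model picks the next action...
        if step["type"] == "final":                  # ...including when to stop
            return step["text"]
        result = TOOLS[step["tool"]](**step["args"])
        history.append({"tool": step["tool"], "result": result})
    return "gave up after too many steps"

print(run_agent("can you check if anyone's in the bedroom?"))

the emergence lives entirely in what call_model decides to do with the tools it's given - the loop itself stays dumb.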
& the ux questions this raises:
- how do users discover what the agent is capable of?
- how do they build a mental model of its capabilities over time?
- what happens when it fails? how do users debug it?
- where's the line between explicit commands and letting it figure things out?
the A2A GitHub repo shows how little code it takes to create this emergence, but that simplicity creates a whole new set of problems:
SIMPLE CODE ──> COMPLEX BEHAVIOR
     │                 │
     │                 ▼
     │           ┌──────────┐
     │           │UNEXPECTED│
     │           │SOLUTIONS │
     │           └──────────┘
     │                 │
     ▼                 ▼
┌──────────┐     ┌──────────┐
│  HOW TO  │     │  HOW TO  │
│UNDERSTAND│     │ CONTROL? │
└──────────┘     └──────────┘
now, say an ai agent is embedded in a stove... that raises HUGE ux questions:
USER INTENT
     │
     ▼
┌─────────┐
│ CONFIRM │◀───┐
│ REPAIR  │    │
└────┬────┘    │
     │         │
     ▼         │
┌─────────┐    │
│ VISIBLE │    │
│ ACTIONS │────┘
└────┬────┘
     │
     ▼
┌─────────┐
│  STOVE  │
│  AGENT  │
└─────────┘
like....
- how does the user actually SEE what the agent is planning? [a folded 'Thinking...' ui isn't helping]
- what's the right level of transparency?
- how do you let users veto or modify the agent's plans? [like google's deep research asking for confirmation before it starts - rough sketch after this list]
- what's the interaction model for "wait, don't do that"? [we need more than just an interrupted state after pressing the ⏹️ button]
- how do you balance autonomy with safety?
- what happens when the agent makes a bad decision with a physical device?
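one pattern i keep sketching for the physical-device case: make the plan a first-class thing the user sees and approves before anything touches hardware. a rough sketch in python - the stove tools, plan format, and approve() prompt are all invented for illustration, not any real api:

# rough sketch of a confirm-before-act loop for a physical-device agent.
# the tools and plan format are invented; the point is the plan is visible
# and the user can veto each step before it touches the device.

def preheat_oven(temp_c):
    print(f"(pretend) oven preheating to {temp_c}°C")

def set_timer(minutes):
    print(f"(pretend) timer set for {minutes} minutes")

TOOLS = {"preheat_oven": preheat_oven, "set_timer": set_timer}

def propose_plan(goal):
    # stand-in for the model turning a goal into concrete steps
    return [
        {"tool": "preheat_oven", "args": {"temp_c": 180}},
        {"tool": "set_timer", "args": {"minutes": 25}},
    ]

def approve(step):
    # the user sees the concrete action before it runs and can say no
    answer = input(f"agent wants to run {step['tool']}({step['args']}) - ok? [y/n] ")
    return answer.strip().lower().startswith("y")

def run_with_confirmation(goal):
    plan = propose_plan(goal)
    print("proposed plan:", [s["tool"] for s in plan])   # plan visible up front
    for step in plan:
        if not approve(step):                            # per-step veto = a cheap repair hook
            print(f"skipped {step['tool']}")
            continue
        TOOLS[step["tool"]](**step["args"])              # only now touch the device

run_with_confirmation("warm up the oven for the lasagna")

the interesting design space is inside approve(): per-step veto is the bluntest version, but the same hook could support editing args, reordering steps, or "always allow this".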
research directions i'm obsessed with right now...
i spent way too much time on this ascii diagram, but finally got something that feels right for my research areas. sometimes i need to just see things visually to make them click in my head...
                  +-------------------+
                  | my research zones |
                  +---------+---------+
                            |
          +-----------------+----------------+
          |                 |                |
+---------v--------+ +------v------+ +------v-----------+
| human-ai teams   | |simple agents| |   text beyond    |
+---------+--------+ +------+------+ |    prompting     |
          |                 |        +------+-----------+
+---------v--------+ +------v------+         |
|  multiplayer     | |ai that acts |         |
|  experiences     | |on the world | +------v-----------+
+------------------+ +-------------+ |   ubiquitous     |
                                     |   intelligence   |
                                     +------+-----------+
                                            |
                                     +------v-----------+
                                     |   embodiment &   |
                                     |   physical ai    |
                                     +------------------+
there we go. i keep staring at this, trying to see the connections between all these pieces. they're so obviously related in my head, but putting it down on paper (or in the ascii maker in obsidian) makes me realize how much i still need to articulate, and articulate well.
and here are the runaway thoughts i tried to capture in my notes...
- human-ai collaboration in multiplayer spaces
- like those on-canvas Notion AI editors
- humans + ai together, not just 1:1
┌────────┐     ┌────────┐
│ HUMAN  │◀───▶│ HUMAN  │
└───┬────┘     └────┬───┘
    │               │
    ▼               ▼
┌────────┐     ┌────────┐
│   AI   │◀───▶│   AI   │
└────────┘     └────────┘
the interaction design challenges:
- how do humans and ais communicate intent to each other?
- how do multiple ais coordinate without overwhelming humans? (rough sketch after this list)
- what's the right level of ai autonomy in a group setting?
- how do you design for the handoffs between human-human, human-ai, and ai-ai?
- what visual language works for showing ai thought processes? (think grok's teleprompter style)
- how do you handle conflicts between multiple agent goals?
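here's the sketch of that "without overwhelming humans" question: what if agents never edit the shared space directly, and instead file suggestions that humans pull in small batches? everything here (Suggestion, Canvas) is invented for illustration:

# sketch of one answer to "coordinate without overwhelming humans": agents
# never edit the shared canvas directly, they file suggestions, and humans
# pull them in small batches. Suggestion/Canvas are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class Suggestion:
    agent: str
    text: str

@dataclass
class Canvas:
    content: list = field(default_factory=list)
    inbox: list = field(default_factory=list)         # agent suggestions queue here

    def suggest(self, agent, text):
        self.inbox.append(Suggestion(agent, text))     # ai proposes, never commits

    def review(self, batch_size=3):
        batch, self.inbox = self.inbox[:batch_size], self.inbox[batch_size:]
        for s in batch:                                # humans pull a digest at their own pace
            if input(f"{s.agent} suggests: {s.text!r} - apply? [y/n] ").lower().startswith("y"):
                self.content.append(s.text)

canvas = Canvas()
canvas.suggest("outline-agent", "add a section on failure modes")
canvas.suggest("style-agent", "shorten the intro to two sentences")
canvas.review()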
- simple agents that act on the world
- not just knowledge assistants
- real impact on physical/digital environments through tools
the search problem nobody's talking about...
┌───────────┐ ┌───────────┐ ┌───────────┐
│  AGENT 1  │ │  AGENT 2  │ │  AGENT 3  │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │
      ▼             ▼             ▼
┌───────────────────────────────────────┐
│        MCP/A2A TOOL DISCOVERY         │
└───────────────────────────────────────┘
if agents become ubiquitous, we're going to need entirely new ways for them to discover MCP tools and capabilities. we might need search engines specifically designed for agents, not humans.
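even a toy version of an "agent search engine" makes the shape of the problem concrete: a registry of capability descriptions that agents query, with some ranking on top. this is invented for illustration - not how MCP tool listing or A2A agent cards actually work:

# toy "search engine for agents": a registry of capability descriptions that
# an agent can query. invented for illustration - not how MCP tool listing
# or A2A agent cards actually work.

TOOL_REGISTRY = [
    {"name": "turn_on_lights", "description": "switch smart lights on in a named room"},
    {"name": "check_camera", "description": "capture an image from a room camera"},
    {"name": "send_email", "description": "send an email to a contact"},
]

def discover(query, registry=TOOL_REGISTRY):
    # naive keyword overlap; a real version would need embeddings, trust,
    # permissions, freshness... all the things search engines do for humans
    words = set(query.lower().split())
    scored = [(len(words & set(t["description"].lower().split())), t) for t in registry]
    return [t["name"] for score, t in sorted(scored, key=lambda x: -x[0]) if score > 0]

print(discover("capture an image of the bedroom"))   # → ['check_camera', 'send_email']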
- text beyond prompting
- visualizing text in new ways
- automating hermeneutics (a lot of text interpretation now happens through RAG)
- text editors with photoshop-like tools?
      TEXT
        │
    ┌───┴───┐
    │       │
    ▼       ▼
 ┌─────┐ ┌─────┐
 │PARSE│ │DRAW │
 └──┬──┘ └──┬──┘
    │       │
    ▼       ▼
 ┌─────────────┐
 │ INTERACTIVE │
 │ ENVIRONMENT │
 └─────────────┘
i'm thinking about:
- how would a ui work when text becomes manipulable like objects? (rough sketch after this list)
- what's the right visual metaphor for "text as computation"?
- how do you design for the transition between reading and manipulating?
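the roughest possible sketch of the "text as objects" idea: spans of a document become things you can select and transform, a bit like layers in photoshop. the Span/Doc classes are invented, not any real editor api:

# toy sketch of "text as objects you can grab": spans of a document become
# selectable things with operations on them, a bit like layers in photoshop.
# Span/Doc are invented for illustration, not any real editor api.

from dataclasses import dataclass

@dataclass
class Span:
    start: int
    end: int
    tag: str                     # e.g. "todo", "claim", "quote" - the PARSE step

class Doc:
    def __init__(self, text):
        self.text = text
        self.spans = []

    def select(self, tag):
        # selection by meaning, not by dragging a cursor
        return [s for s in self.spans if s.tag == tag]

    def apply(self, span, transform):
        # the DRAW step: rewrite just that span, like running a filter on a layer
        self.text = (self.text[:span.start]
                     + transform(self.text[span.start:span.end])
                     + self.text[span.end:])

doc = Doc("todo: tighten this intro. the main claim is that agents need better ux.")
doc.spans.append(Span(0, 26, "todo"))
doc.apply(doc.select("todo")[0], str.upper)   # grab the todo span and transform it
print(doc.text)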
- tiny embedded intelligence everywhere
- intelligence too cheap to meter
- what happens when AI is just... ambient?
- what's the interaction model for "please stop helping me"?
there's something really human-scale about all these interests... they're simple but lead to emergent complexity, and they all connect to how we actually live and work.
when things break down... ux for negative ai experiences
USER                          AGENT
 │                              │
 │           REQUEST            │
 │─────────────────────────────▶│
 │                              │
 │                              │
 │     ┌────────────────┐      │
 │◀────┤sorry, i can't  │      │
 │     │do that because │      │
 │     │[generic reason]│      │
 │     └────────────────┘      │
 │                              │
 │          FRUSTRATION         │
 │─────────────────────────────▶│
 │                              │
i keep thinking about how terrible we are at handling the negative spaces in ai interfaces. like, we've all seen those "i'm sorry, i can't do that" messages that explain nothing and solve nothing.
the deeper ux questions nobody's solving:
- how do we show users what happened when the context limit is exceeded? it's such an abstract concept, but it breaks their experience completely.
- what's the right visual metaphor for "i understood what you asked but i'm not allowed to do it"? right now it's this weird deflection that makes users feel gaslit.
- how do we design graceful degradation for ai systems? they don't degrade gradually... they just hit walls and stop.
┌──────────────────────────┐
│      NEGATIVE SPACE      │
│                          │
│  ┌─────┐     ┌────────┐  │
│  │LIMIT│────>│BOUNDARY│  │
│  └─────┘     └────┬───┘  │
│                   │      │
│                   ▼      │
│              ┌─────────┐ │
│              │USER     │ │
│              │RECOVERY │ │
│              └─────────┘ │
└──────────────────────────┘
i had this experience yesterday where ChatGPT just kept saying "i can't help with that" and i had no idea why or how to rephrase. it's like trying to find your way out of a dark room with no feedback.
maybe the hardest ux problem... when an ai system says "context exceeded" but the user doesn't understand why or how to fix it.
like you're in the middle of something important and suddenly:
"i'm sorry, but your context length exceeds my capacity" it's such a technical, cold way of saying "i can't remember what we were talking about anymore."
that negative space - the gap between what users want and what systems can provide... that's a design challenge nobody's exploring enough. i'm starting to think we need a whole grammar of "graceful degradation" for ai interfaces. i've been sketching (ascii-ing) out some ideas for better approaches:
┌─────────────────────────────────────┐
│        NEGATIVE EXPERIENCES         │
│                                     │
│  ┌─────────┐     ┌──────────────┐   │
│  │CONTEXT  │     │ALTERNATIVE   │   │
│  │EXCEEDED │────▶│PATHS OFFERED │   │
│  └─────────┘     └──────────────┘   │
│                                     │
│  ┌─────────┐     ┌──────────────┐   │
│  │NOT      │     │PARTIAL       │   │
│  │ALLOWED  │────▶│FULFILLMENT   │   │
│  └─────────┘     └──────────────┘   │
│                                     │
│  ┌─────────┐     ┌──────────────┐   │
│  │ERROR    │     │TRANSPARENT   │   │
│  │STATE    │────▶│EXPLANATION   │   │
│  └─────────┘     └──────────────┘   │
└─────────────────────────────────────┘
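the interface is the hard part, but the logic behind that mapping is easy to sketch: classify the failure, then offer recovery moves the user can actually act on instead of a dead-end apology. the failure types and recovery text here are invented placeholders:

# sketch of the failure → recovery mapping from the diagram above. the
# failure types and recovery options are invented placeholders; the point is
# that every dead end maps to something the user can actually do next.

RECOVERIES = {
    "context_exceeded": [
        "summarize the conversation so far and continue from the summary",
        "start a fresh thread with the last message carried over",
    ],
    "not_allowed": [
        "say which part of the request hit a policy and offer the allowed part",
    ],
    "error_state": [
        "show which step failed and let the user retry just that step",
    ],
}

def degrade_gracefully(failure):
    options = RECOVERIES.get(failure, ["apologize"])   # the dead-end default we have today
    lines = [f"i hit a limit: {failure.replace('_', ' ')}. here's what we can do:"]
    lines += [f"  {i + 1}. {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)

print(degrade_gracefully("context_exceeded"))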
what's next for me?
i'm looking for meaty projects, freelance or a role - something that could justify pulling together a small team for 3+ months, ideally with public outcomes.
my experience shows that focused prototypes with tangible outputs lead to:
PROTOTYPE ──→ NEW PRODUCT IDEAS
    │
    ├───→ INTERESTING UX CHALLENGES
    │
    └───→ DRIVING TECHNICAL RESEARCH
maybe it's not a typical client project though? open to other approaches or even starting something completely new.
honestly just putting this out there to see what resonates. this feels like such a fertile space right now and i'm itching to build something that matters, specifically on one of these ux problems:
- agent transparency and control
- multi-agent coordination
- physical ai interfaces
- text as interactive medium
note to self: should reach out to folks working in 'ai provisional service' and 'multiplayer ai chat' spaces - seems like the most natural way in.