Real-time Voice AI + Video Bug Reports

Talk to your AI. It actually does stuff.

Voice assistant with ~300ms latency. Record your screen to report bugs or request features. AI sees, hears, and fixes.

~300ms latency
Speech-to-speech, end-to-end
2 brains: Voice + Slack

Voice chat: 287ms latency, speech-to-speech
Screen recording (REC 0:42): "This drawer slides up too fast. Make it 300ms." 10 frames, transcribed, fix ready
Record 🔴, then Whisper 🎤, then GPT-4o 👁️

🎬 Watch the demo — voice chat + screen recording in action

Not just a voice bot.
A voice-first operating system.

Every feature is designed for speed, privacy, and actually getting things done.

~300ms Latency

Native speech-to-speech via OpenAI's Realtime API. No Whisper → GPT → TTS pipeline. Just instant, natural conversation.

🧠

Two-Brain Architecture

Voice AI is the fast front desk. Slack AI is the back office. Tasks get handed off seamlessly — you keep talking while work happens.

📱

PWA + Cross-Device

Install on any device. Conversations are stored in SQLite on the server, so you can start on your phone and continue on your laptop. Works offline.

🎨

3 Beautiful Themes

Neon cyberpunk, clean light, and dark mode. Full visual customization because your AI should look as good as it sounds.

📰

Article Reader

Paste a URL and Clawd reads it aloud with natural voice. Perfect for articles, docs, and content consumption on the go.

📷

Image Upload

Send images directly in voice chat. Clawd sees and describes them, answers questions about what's in the image — all by voice.

🎬

Video Bug Reports

Record your screen, narrate the bug, stop — AI extracts frames, transcribes your voice, and analyzes the issue with GPT-4o Vision. Fix suggestions in seconds.
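If you want to check the ~300ms figure on your own setup, here is a minimal sketch of how speech-to-speech latency could be measured. It assumes latency means the gap between the end of your speech and the first audio chunk of the reply. The class and method names are hypothetical and are not taken from the project.

```javascript
// Sketch only: LatencyTracker and its methods are illustrative names.
class LatencyTracker {
  constructor(now = () => Date.now()) {
    this.now = now;            // injectable clock, handy for tests
    this.speechEndedAt = null; // timestamp of the last "user stopped talking" event
    this.samples = [];         // one latency value (ms) per completed turn
  }

  // Call when voice activity detection reports that the user stopped talking.
  markSpeechEnd() {
    this.speechEndedAt = this.now();
  }

  // Call when an audio chunk of the reply arrives.
  // Returns the latency in ms for the first chunk of a turn, otherwise null.
  markFirstAudio() {
    if (this.speechEndedAt === null) return null;
    const latencyMs = this.now() - this.speechEndedAt;
    this.speechEndedAt = null; // later chunks of the same turn are ignored
    this.samples.push(latencyMs);
    return latencyMs;
  }

  // Median over all recorded turns, or null when there are none.
  median() {
    if (this.samples.length === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  }
}
```

The median is used here because a single slow turn would skew an average.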

The Two-Brain System

Voice AI handles real-time conversation. Slack AI handles real work. They talk to each other so you don't have to wait.

🎙️
The Front Desk

Voice Brain

Powered by OpenAI Realtime API. Handles conversation, answers questions instantly, and knows when to hand off tasks that need deeper work.

tasks → ← results
🏢
The Back Office

Slack Brain

A full AI team in Slack. Runs code, searches the web, manages files, sends emails — real work that happens in the background while you keep talking.

┌──────────────────────────┐       ┌──────────────────────────┐
│    📱 Your Device        │       │    💬 Slack Workspace    │
│                          │       │                          │
│  PWA / Browser (any)     │       │  AI Agents (Clawd & co)  │
│  WebRTC Audio Stream     │       │  File ops, web, code     │
│  Push-to-talk / VAD      │       │  Email, calendar, etc.   │
└────────────┬─────────────┘       └────────────┬─────────────┘
             │ wss://                           │ Slack API
             ▼                                  ▼
┌─────────────────────────────────────────────────────────────┐
│                   🖥️  Express Server                       │
│                                                             │
│   WebSocket ↔ OpenAI Realtime API    Slack Bot Integration  │
│   Session Manager                    Task Queue & Results   │
│   SQLite (conversations)             Cloudflare Tunnel      │
└─────────────────────────────────────────────────────────────┘
Legend: Client (PWA), Slack AI Team, Node.js Server
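The handoff between the two brains can be pictured as a small task queue: the voice side enqueues a task and returns right away, and the slow work settles in the background. This is a rough sketch under that assumption, not the project's actual implementation. TaskQueue, handOff, status, and waitFor are hypothetical names, and the worker stands in for whatever the Slack side does.

```javascript
// Sketch only: a hypothetical in-memory version of "Task Queue & Results".
class TaskQueue {
  constructor(worker) {
    this.worker = worker;   // async function that performs the slow work
    this.nextId = 1;
    this.tasks = new Map(); // id -> { description, status, result, done }
  }

  // Hand off a task. Returns an id immediately; the work runs in the background.
  handOff(description) {
    const id = this.nextId++;
    const task = { description, status: 'pending', result: null, done: null };
    this.tasks.set(id, task);
    task.done = Promise.resolve()
      .then(() => this.worker(description))
      .then(
        (result) => { task.status = 'done'; task.result = result; },
        (err) => { task.status = 'failed'; task.result = String(err.message || err); }
      );
    return id;
  }

  // Cheap status check the voice side can use while the conversation continues.
  status(id) {
    const task = this.tasks.get(id);
    return task ? task.status : 'unknown';
  }

  // Wait for a task to settle and return its final status and result.
  async waitFor(id) {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`No task with id ${id}`);
    await task.done;
    return { status: task.status, result: task.result };
  }
}
```

Because handOff never awaits the worker, the caller is free to keep handling voice turns. A failed task is recorded instead of thrown, so one bad job does not interrupt the conversation.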

Up and running in 5 steps

From zero to voice-chatting with your AI in under 5 minutes.

1

Clone the Repo

Grab the source code from GitHub.

git clone https://github.com/clawd21/clawd-voice-chat.git && cd clawd-voice-chat
2

Install Dependencies

Just plain Node.js — no build tools, no bundlers.

npm install
3

Configure Environment

Add your OpenAI API key and Slack bot token. Copy the example and fill in your keys.

cp .env.example .env && nano .env
4

Start the Server

Fire up Express and the WebSocket server. That's it.

node server.js
5

Start Talking

Open your browser, allow mic access, and say hello. Clawd is listening. 🎙️

open https://localhost:3000
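A missing key in step 3 is the most likely reason step 4 fails, so it can help to validate the environment before the server starts. The sketch below assumes the variables are called OPENAI_API_KEY, SLACK_BOT_TOKEN, and PORT. Those names are guesses, so check .env.example for the real ones. The 3000 default matches the URL in step 5.

```javascript
// Sketch only: variable names are assumptions, see .env.example for the real ones.
const REQUIRED_VARS = ['OPENAI_API_KEY', 'SLACK_BOT_TOKEN'];

// Reads settings from `env` (process.env by default) and fails fast on bad input.
function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === '');
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }

  const port = Number(env.PORT ?? '3000');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  return {
    openaiApiKey: env.OPENAI_API_KEY,
    slackBotToken: env.SLACK_BOT_TOKEN,
    port,
  };
}
```

Calling loadConfig() once at startup turns a confusing runtime failure into a clear message that names the missing variable.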

Record. Narrate. Auto-fix.

Show the bug instead of describing it. Record your screen, talk through the issue, and let GPT-4o Vision figure out what's wrong.

1

Hit Record

Tap the 🔴 button next to the attachment icon. Your browser will ask to share your screen. Mic audio is captured simultaneously.

2

Narrate the Bug

Move your mouse, click through the UI, and talk. "See this button? When I click it, the drawer slides up too fast. I want it slower."

3

Stop & Auto-Upload

Tap ⏹️ to stop. The video uploads automatically, with no extra steps. Recordings are capped at 60 seconds and 50 MB.

4

AI Analyzes & Speaks

The server uses ffmpeg to extract ~10 key frames and Whisper to transcribe your narration. GPT-4o Vision reads the frames together with the transcript, and the fix is spoken back to you.

ffmpeg → key frames → Whisper transcription → GPT-4o Vision → voice response + Slack post
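The first stage of that pipeline, picking ~10 key frames, could look something like this. It is a sketch that assumes frames are sampled at evenly spaced timestamps, which may not be how the project selects them. Both function names are hypothetical. The ffmpeg flags used (-y, -ss, -i, -frames:v, -q:v) are standard options.

```javascript
// Sketch only: evenly spaced sampling is an assumption about "key frames".

// Returns `count` timestamps in seconds, one at the midpoint of each equal
// slice of the recording, so the very first and last frames are skipped.
function keyFrameTimestamps(durationSec, count = 10) {
  if (!(durationSec > 0) || !Number.isInteger(count) || count < 1) {
    throw new RangeError('durationSec must be > 0 and count a positive integer');
  }
  const slice = durationSec / count;
  return Array.from({ length: count }, (_, i) => Number(((i + 0.5) * slice).toFixed(3)));
}

// Builds the argument list for one ffmpeg call that writes a single JPEG
// frame taken at `timestamp` seconds into the input video.
function ffmpegFrameArgs(inputPath, timestamp, outputPath) {
  return [
    '-y',                    // overwrite the output file if it exists
    '-ss', String(timestamp), // seek to the timestamp before decoding
    '-i', inputPath,
    '-frames:v', '1',        // emit exactly one video frame
    '-q:v', '2',             // high JPEG quality
    outputPath,
  ];
}
```

For a 42-second recording this yields timestamps starting at 2.1 and ending at 39.9. Each argument list can be passed to Node's child_process.execFile('ffmpeg', args), which avoids shell quoting problems with file names.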
🖥️
Screen Capture API
getDisplayMedia + MediaRecorder
👁️
GPT-4o Vision
Sees your screen, reads your UI
🎤
Whisper Transcription
Understands your narration
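The 60-second and 50 MB limits from step 3 are easy to check on the client before anything is uploaded. The sketch below is one way to do it, and it is not the project's code. The function name is hypothetical, and reading "50 MB" as 50 × 1024 × 1024 bytes is an assumption.

```javascript
// Sketch only: limits come from the text above, the byte interpretation is assumed.
const MAX_DURATION_MS = 60 * 1000;       // max 60 seconds
const MAX_SIZE_BYTES = 50 * 1024 * 1024; // 50 MB, read here as 50 MiB

// Checks a finished recording against both limits.
// Returns { ok, problems } so the UI can show every issue at once.
function checkRecording({ durationMs, sizeBytes }) {
  const problems = [];
  if (durationMs > MAX_DURATION_MS) {
    problems.push(`Recording is ${(durationMs / 1000).toFixed(1)}s, limit is 60s`);
  }
  if (sizeBytes > MAX_SIZE_BYTES) {
    problems.push(`Recording is ${(sizeBytes / (1024 * 1024)).toFixed(1)} MB, limit is 50 MB`);
  }
  return { ok: problems.length === 0, problems };
}
```

In a browser, sizeBytes would typically be the size of the Blob assembled from the MediaRecorder's dataavailable chunks, and a timer could stop the recorder once the duration limit is reached.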

Connect via Tailscale

Tailscale creates a private network between your devices. No port forwarding, no public exposure — just install, join, and you're in.

1

Install Tailscale

Download Tailscale for your device. Available on macOS, Windows, Linux, iOS, and Android.

2

Sign Up & Log In

Create a free Tailscale account (Google, Microsoft, or GitHub sign-in). Then log in on the app you just installed.

tailscale up   # CLI — or just click "Log in" in the app
3

Accept the Invite

I'll send you a Tailscale invite link. Click it to join the shared network (tailnet). Your device gets a private IP like 100.x.y.z.

💌 Ask the host for an invite link
4

Access Voice Chat

Once connected to the tailnet, open the private URL in your browser. That's it — you're in. Works from any device on the network.

open http://clawd:8470   # or the Tailscale IP
🔒
End-to-End Encrypted
WireGuard® under the hood
🌍
Works Anywhere
NAT traversal, no port forwarding
Zero Config
Install → login → done
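Addresses like 100.x.y.z come from the 100.64.0.0/10 range that Tailscale uses for tailnet devices. If you want the server to double-check that a request really arrived over the tailnet, a small helper like the one below could do it. This is an optional sketch with a hypothetical function name, and nothing in the setup above requires it.

```javascript
// Sketch only: returns true when `ip` is an IPv4 address inside 100.64.0.0/10,
// which covers 100.64.0.0 through 100.127.255.255.
function isTailscaleIPv4(ip) {
  const parts = String(ip).split('.');
  if (parts.length !== 4) return false;

  // Each part must be 1 to 3 digits and no larger than 255.
  const octets = parts.map((part) => (/^\d{1,3}$/.test(part) ? Number(part) : NaN));
  if (octets.some((octet) => Number.isNaN(octet) || octet > 255)) return false;

  // A /10 prefix fixes the first octet and the top two bits of the second.
  return octets[0] === 100 && octets[1] >= 64 && octets[1] <= 127;
}
```

Hostnames such as clawd are not IP addresses, so the helper returns false for them. Compare against the connection's remote address, not the URL that was typed.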

Built with proven tools

No frontend frameworks, no build step, no magic. Just solid, well-understood technology.

🤖
OpenAI Realtime API
Speech-to-speech engine
🟢
Node.js
Server runtime
🚂
Express
HTTP + WebSocket server
🗄️
SQLite
Local conversation store
☁️
Cloudflare Tunnel
Secure public access
💬
Slack SDK
Task handoff & AI team
🌐
Vanilla JS
Zero-dependency frontend
📦
PWA
Installable, offline-capable

Ready to talk to your AI?

Open source. Self-hosted. Your voice, your data, your rules.

⭐ Star on GitHub