Real-time Voice AI + Video Bug Reports

Talk to your AI. It actually does stuff.

Voice assistant with ~300ms latency. Record your screen to report bugs or request features. AI sees, hears, and fixes.


~300ms latency, speech-to-speech end to end · 2 brains: Voice + Slack

Screen recording: 🔴 Record → 🎤 Whisper → 👁️ GPT-4o

"This drawer slides up too fast. Make it 300ms." → 10 frames extracted, narration transcribed, fix ready.


🎬 Watch the demo — voice chat + screen recording in action

Features

Not just a voice bot.
A voice-first operating system.

Every feature is designed for speed, privacy, and actually getting things done.

⚡

~300ms Latency

Native speech-to-speech via OpenAI's Realtime API: one model hears your voice and speaks its answer, with no Whisper → GPT → TTS pipeline adding a hop at every stage. The result is fast, natural conversation.

🧠

Two-Brain Architecture

Voice AI is the fast front desk. Slack AI is the back office. Tasks get handed off seamlessly — you keep talking while work happens.

📱

PWA + Cross-Device

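Install on any device. Conversations are stored in SQLite on the server, so you can start on your phone and continue on your laptop. The installed app is offline-capable, though talking to the AI still needs a connection.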

🎨

3 Beautiful Themes

Neon cyberpunk, clean light, and dark mode. Full visual customization because your AI should look as good as it sounds.

📰

Article Reader

Paste a URL and Clawd reads it aloud in a natural voice. Perfect for articles, docs, and anything else you'd rather listen to on the go.
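If the reader uses a text-to-speech endpoint, the input length is capped. For example, OpenAI's /v1/audio/speech accepts up to 4096 characters per request. A long article therefore has to be cleaned and split before it is read aloud.

Below is a rough sketch of that step. It assumes the page HTML has already been fetched. The tag stripping is deliberately naive (a real reader would use a proper HTML/readability parser), and `htmlToText` and `chunkForSpeech` are hypothetical names.

```javascript
// Naive HTML -> text: drop script/style blocks, strip tags, decode a few entities, collapse whitespace.
function htmlToText(html) {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&") // decoded last so "&amp;lt;" becomes "&lt;", not "<"
    .replace(/\s+/g, " ")
    .trim();
}

// Split text into chunks of at most maxChars, breaking at sentence ends where possible.
function chunkForSpeech(text, maxChars = 4096) {
  const sentences = text.match(/[^.!?]+[.!?]+["')\]]*\s*|[^.!?]+$/g) || [];
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    if ((current + sentence).length <= maxChars) {
      current += sentence;
      continue;
    }
    if (current) chunks.push(current.trim());
    // A single sentence longer than the limit is hard-split at maxChars.
    let rest = sentence;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars).trim());
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Splitting on sentence boundaries keeps each spoken chunk natural-sounding. The hard split only kicks in for pathological text with no punctuation.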

📷

Image Upload

Send images directly in voice chat. Clawd sees and describes them, answers questions about what's in the image — all by voice.
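Vision-capable OpenAI models accept images as base64 `data:` URLs, so an uploaded file usually has to be identified and encoded first. Here is a small sketch, assuming the upload arrives as a Node `Buffer`. It sniffs the format from the file's leading bytes instead of trusting the client-supplied file name. `imageToDataUrl` is a hypothetical helper, and its size limit is an assumed value.

```javascript
// Identify common image formats by their leading "magic" bytes.
function sniffImageMime(buf) {
  const PNG_SIG = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (buf.length >= 8 && buf.subarray(0, 8).equals(PNG_SIG)) return "image/png";
  if (buf.length >= 3 && buf[0] === 0xff && buf[1] === 0xd8 && buf[2] === 0xff) return "image/jpeg";
  if (buf.length >= 6 && ["GIF87a", "GIF89a"].includes(buf.toString("latin1", 0, 6))) return "image/gif";
  if (buf.length >= 12 && buf.toString("latin1", 0, 4) === "RIFF" && buf.toString("latin1", 8, 12) === "WEBP") {
    return "image/webp";
  }
  return null;
}

// Encode an uploaded image as a data URL, rejecting unknown formats and oversized files.
// The 20 MiB default is an assumption, not a limit taken from the project.
function imageToDataUrl(buf, maxBytes = 20 * 1024 * 1024) {
  if (buf.length > maxBytes) throw new Error(`image too large: ${buf.length} bytes`);
  const mime = sniffImageMime(buf);
  if (!mime) throw new Error("unsupported image format");
  return `data:${mime};base64,${buf.toString("base64")}`;
}
```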

🎬

Video Bug Reports

Record your screen, narrate the bug, stop — AI extracts frames, transcribes your voice, and analyzes the issue with GPT-4o Vision. Fix suggestions in seconds.

Architecture

The Two-Brain System

Voice AI handles real-time conversation. Slack AI handles real work. They talk to each other so you don't have to wait.

🎙️

The Front Desk

Voice Brain

Powered by OpenAI Realtime API. Handles conversation, answers questions instantly, and knows when to hand off tasks that need deeper work.
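One plausible way for the Voice Brain to "know when to hand off" is function calling. The Realtime API lets a session declare tools, and the model emits a function call when a request needs deeper work.

Below is a minimal sketch of that idea. It assumes a hypothetical tool named `hand_off_to_slack` with a single `task` argument. The tool name, schema, and helpers are illustrative, not the project's actual code.

```javascript
// Hypothetical tool the Voice Brain could expose so the model can delegate work.
// Realtime API session tools use a flat { type, name, description, parameters } shape.
const handOffTool = {
  type: "function",
  name: "hand_off_to_slack", // hypothetical name
  description:
    "Delegate a task that needs real work (code, web search, files, email) to the Slack AI team.",
  parameters: {
    type: "object",
    properties: {
      task: { type: "string", description: "Plain-language description of the task" },
    },
    required: ["task"],
  },
};

// session.update client event registering the tools. Required session fields have
// changed between Realtime API versions (the GA interface expects session.type),
// so compare against the current API reference before use.
function buildSessionUpdate(tools) {
  return { type: "session.update", session: { tools, tool_choice: "auto" } };
}

// The model sends function-call arguments as a JSON string; validate before handing off.
function parseHandOffArgs(argumentsJson) {
  const args = JSON.parse(argumentsJson);
  if (typeof args.task !== "string" || args.task.trim() === "") {
    throw new Error("hand_off_to_slack requires a non-empty 'task' string");
  }
  return args.task.trim();
}
```

After the task is queued, the server would send a `conversation.item.create` event carrying a `function_call_output` item for the same `call_id`, followed by `response.create`. That lets the voice model acknowledge the hand-off and keep talking.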

Tasks flow → to the Slack Brain; results flow ← back to the Voice Brain.

🏢

The Back Office

Slack Brain

A full AI team in Slack. Runs code, searches the web, manages files, sends emails — real work that happens in the background while you keep talking.
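The hand-off loop described here (tasks go to Slack, results come back while you keep talking) can be pictured as a small in-memory task board. This is a hedged sketch: `TaskBoard`, its methods, and the status names are hypothetical. A real server would post tasks to Slack and persist state rather than keep it in memory.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical board tracking tasks handed from the Voice Brain to the Slack Brain.
class TaskBoard {
  constructor() {
    this.tasks = new Map(); // id -> { id, description, status, result }
  }

  // Voice Brain: record a task and return its id at once, so the conversation never blocks.
  handOff(description) {
    const id = randomUUID();
    this.tasks.set(id, { id, description, status: "pending", result: null });
    return id;
  }

  // Slack Brain: mark a task done and attach its result.
  complete(id, result) {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`unknown task ${id}`);
    task.status = "done";
    task.result = result;
  }

  // Voice Brain: collect finished results that haven't been announced yet, then forget them.
  drainResults() {
    const done = [...this.tasks.values()].filter((t) => t.status === "done");
    for (const t of done) this.tasks.delete(t.id);
    return done.map(({ id, description, result }) => ({ id, description, result }));
  }

  pendingCount() {
    let n = 0;
    for (const t of this.tasks.values()) if (t.status === "pending") n++;
    return n;
  }
}
```

A typical flow works in three steps:

1. The voice side calls `board.handOff("Summarize today's emails")`.
2. The Slack side later calls `board.complete(id, summary)`.
3. The voice side speaks whatever `drainResults()` returns at the next natural pause.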

┌──────────────────────────┐       ┌──────────────────────────┐
│    📱 Your Device        │       │    💬 Slack Workspace    │
│                          │       │                          │
│  PWA / Browser (any)     │       │  AI Agents (Clawd & co)  │
│  WebRTC Audio Stream     │       │  File ops, web, code     │
│  Push-to-talk / VAD      │       │  Email, calendar, etc.   │
└────────────┬─────────────┘       └────────────┬─────────────┘
             │ wss://                           │ Slack API
             ▼                                  ▼
┌─────────────────────────────────────────────────────────────┐
│                   🖥️  Express Server                       │
│                                                             │
│   WebSocket ↔ OpenAI Realtime API    Slack Bot Integration  │
│   Session Manager                    Task Queue & Results   │
│   SQLite (conversations)             Cloudflare Tunnel      │
└─────────────────────────────────────────────────────────────┘
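The diagram's "Push-to-talk / VAD" box maps onto a simple event sequence when talking to the Realtime API over a WebSocket. Microphone audio is streamed as base64-encoded 16-bit PCM in `input_audio_buffer.append` events. When the button is released, the client commits the buffer and requests a response. With VAD, the server detects the end of speech instead.

Below is a minimal Node sketch of the push-to-talk sequence (it uses `Buffer`). It assumes the samples are Float32 values in [-1, 1], already at the rate the session expects (24 kHz mono PCM16 is the Realtime API default). The function names are hypothetical.

```javascript
// Convert Float32 samples in [-1, 1] to little-endian 16-bit PCM, base64-encoded.
function floatToPcm16Base64(samples) {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid Int16 overflow
    buf.writeInt16LE(s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff), i * 2);
  }
  return buf.toString("base64");
}

// Client events for one push-to-talk turn: stream chunks, then commit and ask for a reply.
function pushToTalkTurn(chunks) {
  const events = chunks.map((chunk) => ({
    type: "input_audio_buffer.append",
    audio: floatToPcm16Base64(chunk),
  }));
  events.push({ type: "input_audio_buffer.commit" }); // end of the user's turn
  events.push({ type: "response.create" }); // ask the model to answer
  return events;
}
```

Each event would be sent as `JSON.stringify(event)` over the socket. In VAD mode, only the `append` events are needed.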

Get Started

Up and running in 5 steps

From zero to voice-chatting with your AI in under 5 minutes.

Clone the Repo

Grab the source code from GitHub.

git clone https://github.com/clawd21/clawd-voice-chat.git && cd clawd-voice-chat

Install Dependencies

Just plain Node.js — no build tools, no bundlers.

npm install

Configure Environment

Copy the example file, then fill in your OpenAI API key and Slack bot token.

cp .env.example .env && nano .env
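Before starting the server, it helps to fail fast if a key is missing from `.env`. Below is a minimal sketch, assuming the variables are called `OPENAI_API_KEY`, `SLACK_BOT_TOKEN`, and `PORT`; check `.env.example` for the names the project actually uses. `loadConfig` is hypothetical and takes the environment as a parameter, so it can be tested without touching `process.env`.

```javascript
// Hypothetical variable names; match them to the project's .env.example.
const REQUIRED_VARS = ["OPENAI_API_KEY", "SLACK_BOT_TOKEN"];

// Validate settings up front so a missing key fails at startup, not mid-conversation.
function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing required settings in .env: ${missing.join(", ")}`);
  }
  // Assumed default of 3000, matching the localhost:3000 URL in the Start Talking step.
  const port = env.PORT ? Number(env.PORT) : 3000;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer from 1 to 65535, got "${env.PORT}"`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY.trim(),
    slackBotToken: env.SLACK_BOT_TOKEN.trim(),
    port,
  };
}
```

To use it, load `.env` first, then call `loadConfig()` at the top of `server.js`. Either the dotenv package or Node's `--env-file=.env` flag (Node 20.6 and later) will load the file.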

Start the Server

Fire up Express and the WebSocket server. That's it.

node server.js

Start Talking

Open your browser, allow mic access, and say hello. Clawd is listening. 🎙️

open https://localhost:3000

Video Bug Reports

Record. Narrate. Auto-fix.

Show the bug instead of describing it. Record your screen, talk through the issue, and let GPT-4o Vision figure out what's wrong.

Hit Record

Tap the 🔴 button next to the attachment icon. Your browser will ask to share your screen. Mic audio is captured simultaneously.
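Browsers differ in which formats `MediaRecorder` can produce, so recording code usually probes with `MediaRecorder.isTypeSupported()` before starting. Below is a small sketch of that probe. The candidate list and its order are assumptions. The probe function is passed in so the selection logic also runs outside a browser. The browser-only wiring that combines the screen and microphone tracks is shown as comments.

```javascript
// Preferred container/codec combinations, best first (an assumed ordering).
const CANDIDATE_TYPES = [
  "video/webm;codecs=vp9,opus",
  "video/webm;codecs=vp8,opus",
  "video/webm",
  "video/mp4",
];

// Return the first supported MIME type, or "" to let the browser pick its default.
function pickRecorderMimeType(isTypeSupported, candidates = CANDIDATE_TYPES) {
  for (const type of candidates) {
    if (isTypeSupported(type)) return type;
  }
  return "";
}

// In the browser (not runnable in Node):
// const screen = await navigator.mediaDevices.getDisplayMedia({ video: true });
// const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
// const stream = new MediaStream([...screen.getVideoTracks(), ...mic.getAudioTracks()]);
// const mimeType = pickRecorderMimeType((t) => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, { mimeType });
```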

Narrate the Bug

Move your mouse, click through the UI, and talk. "See this button? When I click it, the drawer slides up too fast. I want it slower."

Stop & Auto-Upload

Tap ⏹️ to stop. The video uploads automatically, with no extra steps. Recordings can be up to 60 seconds long and 50 MB in size.
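The 60-second and 50 MB limits can be enforced on both ends: the recorder stops itself at the time limit, and the server rejects oversized uploads. Below is a sketch of the server-side check. It assumes "50 MB" means 50 × 1024 × 1024 bytes and that the client reports the duration. A stricter server would measure the duration itself, for example with ffprobe. `validateRecording` is a hypothetical name.

```javascript
const MAX_SECONDS = 60;
const MAX_BYTES = 50 * 1024 * 1024; // assumes "50 MB" means 50 MiB

// Validate an uploaded recording; returns a list of problems (empty means OK).
function validateRecording({ sizeBytes, durationSeconds, mimeType }) {
  const problems = [];
  if (!Number.isFinite(sizeBytes) || sizeBytes <= 0) {
    problems.push("empty upload");
  } else if (sizeBytes > MAX_BYTES) {
    problems.push(`file is ${(sizeBytes / 1048576).toFixed(1)} MB, limit is 50 MB`);
  }
  if (!Number.isFinite(durationSeconds) || durationSeconds > MAX_SECONDS) {
    problems.push(`recording must be at most ${MAX_SECONDS} seconds`);
  }
  if (!/^video\/(webm|mp4)\b/.test(mimeType || "")) {
    problems.push(`unsupported type: ${mimeType}`);
  }
  return problems;
}

// Client side (browser only): stop automatically at the limit.
// setTimeout(() => { if (recorder.state === "recording") recorder.stop(); }, MAX_SECONDS * 1000);
```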

AI Analyzes & Speaks

The server extracts ~10 key frames and transcribes your narration with Whisper. GPT-4o Vision sees what you see, hears what you said, and speaks the fix back to you.
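The extraction step could be driven by two `ffmpeg` invocations:

- one that samples about ten evenly spaced frames, and
- one that pulls out the narration audio for Whisper.

Below is a sketch that only builds the argument arrays; you would pass them to `execFile("ffmpeg", args)` from `node:child_process`. Sampling at a fixed rate of `count / duration` frames per second is an assumption, standing in for true key-frame detection. The output paths are hypothetical.

```javascript
// Build ffmpeg arguments that sample about `count` evenly spaced frames as JPEGs.
// Even sampling is an assumption standing in for true key-frame selection.
function frameExtractionArgs(inputPath, outDir, durationSeconds, count = 10) {
  if (!(durationSeconds > 0)) throw new Error("durationSeconds must be positive");
  const fps = (count / durationSeconds).toFixed(4); // e.g. 10 frames over 40 s -> "0.2500"
  return [
    "-hide_banner",
    "-i", inputPath,
    "-vf", `fps=${fps}`, // emit frames at a fixed low rate across the clip
    "-frames:v", String(count), // never write more than `count` images
    "-q:v", "3", // JPEG quality (2-31, lower is better)
    `${outDir}/frame_%02d.jpg`,
  ];
}

// Extract mono 16 kHz audio for transcription (the Whisper API also accepts webm directly).
function audioExtractionArgs(inputPath, outPath) {
  return ["-hide_banner", "-i", inputPath, "-vn", "-ac", "1", "-ar", "16000", outPath];
}
```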

ffmpeg → key frames → Whisper transcription → GPT-4o Vision → voice response + Slack post
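The GPT-4o Vision stage of this pipeline can be a single Chat Completions request that pairs the transcript with the extracted frames. Below is a sketch of the request body, assuming the frames are JPEG buffers. It uses the documented `image_url` content parts with base64 data URLs. The prompt wording and `buildBugReportRequest` are hypothetical. `detail: "low"` is a cost-saving choice for ten frames, not a requirement.

```javascript
// Build a Chat Completions request body combining the narration transcript with video frames.
function buildBugReportRequest(transcript, frames, model = "gpt-4o") {
  const imageParts = frames.map((jpeg) => ({
    type: "image_url",
    image_url: { url: `data:image/jpeg;base64,${jpeg.toString("base64")}`, detail: "low" },
  }));
  return {
    model,
    messages: [
      {
        role: "system",
        content:
          "You review screen recordings of a web app. Identify the bug or feature request and propose a concrete fix.",
      },
      {
        role: "user",
        content: [
          { type: "text", text: `Narration transcript:\n${transcript.trim()}\n\nFrames in chronological order:` },
          ...imageParts,
        ],
      },
    ],
  };
}
// POST this as JSON to https://api.openai.com/v1/chat/completions with an
// "Authorization: Bearer <OPENAI_API_KEY>" header.
```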

🖥️

Screen Capture API

getDisplayMedia + MediaRecorder

👁️

GPT-4o Vision

Sees your screen, reads your UI

🎤

Whisper Transcription

Understands your narration

Private Access

Connect via Tailscale

Tailscale creates a private network between your devices. No port forwarding, no public exposure — just install, join, and you're in.

Install Tailscale

Download Tailscale for your device. Available on macOS, Windows, Linux, iOS, and Android.

https://tailscale.com/download

Sign Up & Log In

Create a free Tailscale account (Google, Microsoft, or GitHub sign-in). Then log in on the app you just installed.

tailscale up # CLI — or just click "Log in" in the app

Accept the Invite

I'll send you a Tailscale invite link. Click it to join the shared network (tailnet). Your device gets a private IP like 100.x.y.z.

💌 Ask the host for an invite link
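The `100.x.y.z` addresses come from 100.64.0.0/10, the carrier-grade NAT block that Tailscale assigns IPv4 addresses from. If the server should answer only tailnet clients, one hedged approach is to check each remote address against that range. Below is a sketch; `isTailnetIPv4` is a hypothetical helper, and IPv6 tailnet addresses are deliberately not handled.

```javascript
const net = require("node:net");

// True if `address` is an IPv4 address inside 100.64.0.0/10.
function isTailnetIPv4(address) {
  const ip = address.startsWith("::ffff:") ? address.slice(7) : address; // IPv4-mapped IPv6
  if (!net.isIPv4(ip)) return false;
  const [a, b] = ip.split(".").map(Number);
  // /10: first octet is 100 and the next two bits are 01, i.e. second octet 64..127.
  return a === 100 && b >= 64 && b <= 127;
}

// Possible Express middleware (Express not needed to run this file):
// app.use((req, res, next) =>
//   isTailnetIPv4(req.socket.remoteAddress ?? "") ? next() : res.status(403).end());
```

A filter like this would also reject localhost and Cloudflare Tunnel traffic, so it only fits a tailnet-only deployment.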

Access Voice Chat

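Once connected to the tailnet, open the private URL in your browser. That's it: you're in, from any device on the network. Browsers only allow microphone and screen capture on secure origins (HTTPS or localhost). If the mic prompt never appears over plain http, ask the host to serve the app over HTTPS, for example with tailscale serve.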

open http://clawd:8470 # or the Tailscale IP
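Which URLs can record depends on the origin. Browsers expose `getUserMedia` and `getDisplayMedia` only in secure contexts: HTTPS pages, or loopback hosts such as `localhost`.

The sketch below approximates that classification for the URLs on this page. It simplifies the W3C Secure Contexts rules: `file:` URLs and browser flags are ignored. `isLikelySecureOrigin` is a hypothetical name. Inside the browser, `window.isSecureContext` gives the authoritative answer.

```javascript
// Approximate the browser's "potentially trustworthy origin" rule for http(s)/ws(s) URLs.
function isLikelySecureOrigin(urlString) {
  const url = new URL(urlString);
  if (url.protocol === "https:" || url.protocol === "wss:") return true;
  if (url.protocol !== "http:" && url.protocol !== "ws:") return false;
  const host = url.hostname; // IPv6 hosts keep their brackets here, e.g. "[::1]"
  return (
    host === "localhost" ||
    host.endsWith(".localhost") ||
    /^127(\.\d{1,3}){3}$/.test(host) ||
    host === "[::1]"
  );
}
```

By this rule, `https://localhost:3000` and `http://localhost:3000` both qualify. `http://clawd:8470` and a bare `http://100.x.y.z` address do not.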

🔒

End-to-End Encrypted

WireGuard® under the hood

🌍

Works Anywhere

NAT traversal, no port forwarding

⚡

Zero Config

Install → login → done

Tech Stack

Built with proven tools

No frontend frameworks, no magic. Just solid, well-understood technology.

🤖

OpenAI Realtime API

Speech-to-speech engine

🟢

Node.js

Server runtime

🚂

Express

HTTP + WebSocket server

🗄️

SQLite

Local conversation store

☁️

Cloudflare Tunnel

Secure public access

💬

Slack SDK

Task handoff & AI team

🌐

Vanilla JS

Zero-dependency frontend

📦

PWA

Installable, offline-capable

Ready to talk to your AI?
