Changelog


All notable changes to CallProxy will be documented in this file.
> **Bubble quality tracking** → see [docs/BUBBLE-HISTORY.md](docs/BUBBLE-HISTORY.md)

[0.9.0] - 2026-03-22

**Milestone: Unified Stable Voice Engines** | All three engines (Gemini+Twilio, Retell, Vapi) produce correct transcripts

Fixed — Vapi Engine

  • Eliminated duplicate transcript bubbles — `conversation-update` is now the single source of truth; individual `transcript` webhook events are skipped when conversation tracking is active
  • Strengthened dedup logic — `_emitTranscript()` now checks ALL previously sent entries (not just the last one) to prevent duplicates from overlapping event paths
  • Final transcript guard — `_processFinalTranscript()` correctly skips when entries were already sent during the call
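
The dedup behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual `_emitTranscript()` code: the `TranscriptEntry` shape, the `TranscriptDeduper` class, and the exact-match comparison on role and trimmed text are all assumptions.

```typescript
// Hypothetical entry shape; the changelog does not specify the real one.
interface TranscriptEntry {
  role: "agent" | "user";
  text: string;
}

class TranscriptDeduper {
  // Every entry already pushed to the UI during this call.
  private sent: TranscriptEntry[] = [];

  // Returns true if the entry was new and has been recorded as sent.
  emit(entry: TranscriptEntry): boolean {
    const text = entry.text.trim();
    if (text.length === 0) {
      return false;
    }
    // Compare against ALL sent entries, not only the most recent one.
    const seen = this.sent.some(
      (e) => e.role === entry.role && e.text === text
    );
    if (seen) {
      return false;
    }
    this.sent.push({ role: entry.role, text: text });
    return true;
  }

  // A final-transcript pass can be skipped if anything was sent live.
  hasSentEntries(): boolean {
    return this.sent.length > 0;
  }
}
```

One trade-off of matching on text alone: a speaker who legitimately repeats the same sentence would have the second occurrence dropped, so a real implementation may also need a turn index or timestamp in the comparison.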

Fixed — Retell Engine

  • Prevented webhook/WS transcript collision — `transcript_updated` webhook now only pushes entries when Custom LLM WebSocket hasn't delivered any (WS is the primary source since v0.7.10)
  • `call_ended` webhook guard — the existing `lastTranscriptLength === 0` check was reviewed and confirmed correct, so no change was needed

Verified — Gemini+Twilio Engine

  • Turn-based transcript (delta → turn_complete → final bubble) — no changes needed, already stable
  • Briefing data (collectedData, conversationStrategy, researchContext, objective) passes through correctly

Architecture

  • Transcript source hierarchy per engine:
      • Gemini+Twilio: turn-based events (no dedup needed)
      • Retell: Custom LLM WebSocket → webhook fallback → API polling fallback
      • Vapi: conversation-update → transcript event fallback → final transcript fallback
  • Each fallback only activates when the primary source hasn't delivered data
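
The fallback rule can be illustrated with a small gate that tracks which sources have delivered entries. This is a sketch under assumptions: the `SourceGate` class and the source names `ws`, `webhook`, and `polling` are hypothetical labels modelled on the Retell chain above, not identifiers from the codebase.

```typescript
// Hypothetical source names modelled on the Retell hierarchy listed above.
type Source = "ws" | "webhook" | "polling";

// Highest-priority source first.
const PRIORITY: Source[] = ["ws", "webhook", "polling"];

class SourceGate {
  // Number of deliveries accepted from each source so far.
  private delivered: Record<Source, number> = { ws: 0, webhook: 0, polling: 0 };

  // Decide whether entries from `source` should be pushed to the UI.
  accept(source: Source): boolean {
    const rank = PRIORITY.indexOf(source);
    // Reject if any higher-priority source has already delivered data.
    for (let i = 0; i < rank; i++) {
      if (this.delivered[PRIORITY[i]] > 0) {
        return false;
      }
    }
    this.delivered[source] += 1;
    return true;
  }
}
```

With this gate a webhook delivery is accepted only while the WebSocket has been silent. Once the WebSocket delivers anything, later webhook and polling deliveries are ignored for the rest of the call.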

[0.7.14] - 2026-03-20 22:15

**Bubbles: ✅ WORKING** | Source: Custom LLM WS

Fixed

  • Agent bubbles appear only after Retell confirms speech (no phantom bubbles)
  • Track `lastTranscriptShown` to prevent duplicates during call
  • All transcript entries (agent + user) are now taken from the `response_required` event

Known Issue

  • After call ends (receptionist hangs up), bubbles re-arrive and mix with existing ones
  • Root cause: likely final transcript fetch duplicates already-shown entries

Test

  • Transcript `0d4551d9` — clean 12-entry dialog, no dupes, no garbage

[0.7.13] - 2026-03-20 22:00

**Bubbles: ❌ fragments** | Source: Custom LLM WS

Changed

  • Show only finalized bubbles (no partials); regression: agent speech was truncated to single-word fragments

[0.7.12] - 2026-03-20 21:53

Fixed

  • Single transcript source: Custom LLM WS only (removed polling/webhook for transcript)

[0.7.10] - 2026-03-20 21:45

Added

  • Retell Custom LLM WebSocket integration — real-time transcript via WS instead of polling

[0.7.05] - 2026-03-20 20:07

Fixed

  • Always fetch final transcript on endCall
  • Dynamic version display everywhere

[0.6.0] - 2026-03-20

Added

  • Retell AI Voice Engine — full integration with Retell AI as alternative voice stack
      • Dynamic LLM + Agent creation per call
      • Phone calls via Retell API with SIP trunking through Twilio
      • Webhook handling for call_started, call_ended, call_analyzed events
      • Polling for real-time call status updates
      • Final transcript retrieval after call ends
  • SIP Trunking — Twilio Elastic SIP Trunk configured for Retell AI
      • Retell IP whitelist (18.98.16.120/30)
      • Origination URI for inbound (sip:sip.retellai.com)
      • Phone number +14587777791 imported to Retell via SIP trunking
      • Enables calling international numbers (not just US)
  • Claude 4.6 Sonnet as default LLM for Retell voice engine
  • Live transcript in UI — call transcripts pushed to browser during Retell calls
      • WS stays alive during call phase (call_started event + reconnect)
      • Normalized transcript format (role/text ↔ speaker/original)
      • Final transcript fetch with retry after call ends
  • Number pronunciation — prompt instructs AI to dictate numbers in pairs (48, 34, 11) instead of large numbers
  • Changelog — version link in UI opens changelog
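
The "Normalized transcript format (role/text ↔ speaker/original)" item amounts to a two-way field mapping. The sketch below takes only the four field names from the entry above. The interface names, the helper names, and the assumption that values pass through unchanged are illustrative, and the changelog does not say which engine uses which shape.

```typescript
// Field names come from the changelog entry; everything else is assumed.
interface RoleTextEntry {
  role: string;
  text: string;
}

interface SpeakerOriginalEntry {
  speaker: string;
  original: string;
}

function toSpeakerOriginal(e: RoleTextEntry): SpeakerOriginalEntry {
  return { speaker: e.role, original: e.text };
}

function toRoleText(e: SpeakerOriginalEntry): RoleTextEntry {
  return { role: e.speaker, text: e.original };
}

// Accept either shape and return the role/text form.
function normalize(e: RoleTextEntry | SpeakerOriginalEntry): RoleTextEntry {
  return "role" in e ? e : toRoleText(e);
}
```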

Fixed

  • Retell API endpoint paths (mixed v2/non-v2 across endpoints)
  • WS disconnect during Retell calls (missing call_started handler in UI)
  • Transcript format mismatch between Retell and Gemini+Twilio engines
  • Session cleanup race condition (polling vs webhook vs endCall)
  • SIP trunk auth error (removed credential list requirement, IP ACL only)

Changed

  • Retell LLM model updated from gpt-4o to claude-4.6-sonnet
  • RETELL_FROM_NUMBER switched from Retell-purchased to Twilio SIP trunk number

[0.5.0] - 2026-03-20

Added

  • VoiceEngine abstraction — pluggable voice engine architecture
      • Base class with startCall/endCall/sendUserAudio/injectText interface
      • Factory pattern for engine selection (gemini-twilio, retell, vapi)
      • Per-call engine creation with voice stack selector
  • Setup panel with 5 configuration toggles
      • Languages (I speak / Call in)
      • Briefing mode (Pre-filled / Empty)
      • Call Mode (AI Test / I Test / Real Call)
      • Call Channel (Browser / Test Phone / Real Phone)
      • Voice Stack (Gemini+Twilio / Retell AI / Vapi)
      • localStorage persistence for all settings
  • Call recording — CallRecorder for real calls (inbound/outbound/mixed WAV)
  • Transcript saving to data/transcripts/{sessionId}.json
  • Retell AI account registered (nina@bookly.net), phone number purchased
  • Vapi AI account registered (nina@bookly.net)
  • Receiver.html full refactor — 3 screens (waiting/incoming/active), dark theme
  • 4-Agent browser mode — Proxy + Oxy + Receptionist + User in browser
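
The VoiceEngine abstraction listed above (a base class plus a factory keyed by engine id) could look roughly like this. The four method names and the three engine ids come from the changelog. The method signatures, the `StubEngine` class, the `log` field, and `createVoiceEngine` are assumptions made for illustration, and the real methods are probably asynchronous.

```typescript
type EngineId = "gemini-twilio" | "retell" | "vapi";

const ENGINE_IDS: EngineId[] = ["gemini-twilio", "retell", "vapi"];

// Base class: concrete engines override the four interface methods.
class VoiceEngine {
  readonly id: EngineId;
  // Hypothetical call log, used here only to make the sketch observable.
  log: string[] = [];

  constructor(id: EngineId) {
    this.id = id;
  }

  startCall(_phoneNumber: string): void {
    throw new Error("startCall not implemented");
  }
  endCall(): void {
    throw new Error("endCall not implemented");
  }
  sendUserAudio(_chunk: Uint8Array): void {
    throw new Error("sendUserAudio not implemented");
  }
  injectText(_text: string): void {
    throw new Error("injectText not implemented");
  }
}

// Stand-in for the real engines; it only records what was called.
class StubEngine extends VoiceEngine {
  startCall(phoneNumber: string): void {
    this.log.push("start:" + phoneNumber);
  }
  endCall(): void {
    this.log.push("end");
  }
  sendUserAudio(chunk: Uint8Array): void {
    this.log.push("audio:" + chunk.length);
  }
  injectText(text: string): void {
    this.log.push("text:" + text);
  }
}

// Factory: one fresh engine per call, selected by the voice stack id.
function createVoiceEngine(id: string): VoiceEngine {
  for (let i = 0; i < ENGINE_IDS.length; i++) {
    if (ENGINE_IDS[i] === id) {
      // A real factory would return a dedicated subclass per engine.
      return new StubEngine(ENGINE_IDS[i]);
    }
  }
  throw new Error("Unknown voice engine: " + id);
}
```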

Fixed

  • Phone number cleaning (strips text like "(Sanitas)" from numbers)
  • Auto-sync between Call Mode and Call Channel
  • Real Call mode forces Real Phone channel
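
A possible shape for the phone number cleaning fix is shown below. It is only a sketch: the `cleanPhoneNumber` name and the rule (drop parenthesised labels that contain anything other than digits, spaces, or hyphens, then keep a leading `+` and digits) are assumptions, and the numbers in the usage lines are made up.

```typescript
function cleanPhoneNumber(raw: string): string {
  // Remove parenthesised labels such as "(Sanitas)" but keep groups like "(030)".
  const withoutLabels = raw.replace(/\([^)]*[^\d\s\-)][^)]*\)/g, "");
  const trimmed = withoutLabels.trim();
  // Preserve a leading "+" for international format.
  const hasPlus = trimmed.charAt(0) === "+";
  // Strip every remaining non-digit character.
  const digits = trimmed.replace(/\D/g, "");
  return (hasPlus ? "+" : "") + digits;
}

const cleanedIntl = cleanPhoneNumber("+49 30 1234567 (Sanitas)");
const cleanedLocal = cleanPhoneNumber("(030) 123-4567");
```

Here `cleanedIntl` is `"+49301234567"` and `cleanedLocal` is `"0301234567"`.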

[0.4.0] - 2026-03-19

Added

  • Initial voice briefing with Gemini Live
  • Case context system with pre-loaded facts
  • Gemini+Twilio voice engine for real phone calls
  • Browser-based call mode with WebRTC

[0.6.1] - 2026-03-20

Added

  • Real-time transcript via `transcript_updated` webhook — bubbles appear during the call
  • Ask User detection — `[ВОПРОС]` pattern in agent speech triggers ask_user UI prompt
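
The Ask User detection can be pictured as a simple marker check on each agent utterance. The `[ВОПРОС]` marker (Russian for "QUESTION") comes from the entry above. The `detectAskUser` helper, its return shape, and the assumption that the question text follows the marker are hypothetical.

```typescript
// Marker taken from the changelog entry; it is matched literally.
const ASK_USER_MARKER = "[ВОПРОС]";

interface AskUserResult {
  askUser: boolean;
  question: string;
}

function detectAskUser(agentText: string): AskUserResult {
  const idx = agentText.indexOf(ASK_USER_MARKER);
  if (idx === -1) {
    return { askUser: false, question: "" };
  }
  // Assumption: everything after the marker is the question for the user.
  const question = agentText.slice(idx + ASK_USER_MARKER.length).trim();
  return { askUser: true, question: question };
}
```

A caller would show the ask_user prompt only when `askUser` is true and pass `question` to the UI.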

Fixed

  • Event listeners are kept alive until `endCall()` completes (previously the transcript was lost)
  • Server crash from escaped backticks in changelog route
  • Debug logging when WS is dead during transcript push