voiceChain

voiceChain is a high-performance voice AI framework built from the ground up to run entirely offline on Apple Silicon. It orchestrates STT, LLM, and TTS models into a seamless, parallelized stream, enabling natural, human-like conversations that you can interrupt mid-sentence.

Built With

  • MLX
  • llama.cpp
  • asyncio

Technical Breakdown

The core innovation is the asynchronous, multi-service architecture built with Python's asyncio. It attacks the primary latency problem of voice pipelines by overlapping computation across stages instead of running them sequentially.

  • STT (Whisper), LLM (llama.cpp), and TTS (Kokoro) run in parallel, coordinated by a central orchestrator.
  • The LLM streams tokens as they are generated, which are immediately buffered into sentences.
  • The TTS engine synthesizes the first sentence while the LLM is still generating the rest of the response, drastically reducing perceived latency.
  • The entire flow is managed by non-blocking asyncio.Queues, decoupling each stage of the pipeline.
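The token-to-sentence buffering described above can be sketched as a small asyncio coroutine. This is an illustrative sketch, not the project's actual code: the queue names (`token_queue`, `sentence_queue`), the `None` end-of-stream sentinel, and the regex sentence boundary are all assumptions.

```python
import asyncio
import re

# Ends-with-punctuation heuristic for sentence boundaries (assumed, simplistic).
SENTENCE_END = re.compile(r"[.!?]\s*$")

async def buffer_sentences(token_queue: asyncio.Queue, sentence_queue: asyncio.Queue):
    """Accumulate streamed LLM tokens; forward each complete sentence to TTS."""
    buf = ""
    while True:
        token = await token_queue.get()
        if token is None:  # end-of-response sentinel
            if buf.strip():
                await sentence_queue.put(buf.strip())
            await sentence_queue.put(None)
            return
        buf += token
        if SENTENCE_END.search(buf):
            await sentence_queue.put(buf.strip())  # TTS can start here
            buf = ""

async def demo():
    tq, sq = asyncio.Queue(), asyncio.Queue()
    for t in ["Hello", " there", ".", " How", " are", " you", "?", None]:
        tq.put_nowait(t)
    await buffer_sentences(tq, sq)
    out = []
    while not sq.empty():
        s = sq.get_nowait()
        if s is not None:
            out.append(s)
    return out

# asyncio.run(demo()) → ["Hello there.", "How are you?"]
```

Because the first sentence is enqueued as soon as its terminating punctuation arrives, a TTS consumer on `sentence_queue` can begin synthesis while later tokens are still streaming in.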
```python
# In ConversationManager: the main event loop listens for a VAD utterance.
async def run(self):
    logger.info("Conversation Manager started. Listening for speech...")
    while True:
        user_audio_data = await self.services.user_utterance_queue.get()

        if self.state == AgentState.RESPONDING:
            is_barge_in = await self.check_for_barge_in(user_audio_data)
            if is_barge_in:
                await self.handle_barge_in(user_audio_data)

        elif self.state == AgentState.IDLE:
            # ... start new turn ...
```
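The barge-in path above hinges on cancelling the in-flight response and returning the agent to idle. A minimal, self-contained sketch of that cancellation pattern follows; the `ConversationSketch` class, its method bodies, and the event log are illustrative assumptions, not the project's actual `ConversationManager`.

```python
import asyncio
from enum import Enum, auto

class AgentState(Enum):
    IDLE = auto()
    RESPONDING = auto()

class ConversationSketch:
    def __init__(self):
        self.state = AgentState.IDLE
        self.response_task = None
        self.events = []  # records what happened, for illustration

    async def speak_response(self):
        try:
            await asyncio.sleep(10)  # stands in for TTS playback
        except asyncio.CancelledError:
            self.events.append("playback cancelled")
            raise

    async def handle_barge_in(self):
        # Cancel the in-flight response task, wait for it to unwind,
        # then fall back to IDLE so the new utterance starts a fresh turn.
        if self.response_task is not None:
            self.response_task.cancel()
            try:
                await self.response_task
            except asyncio.CancelledError:
                pass
        self.state = AgentState.IDLE
        self.events.append("state -> IDLE")

async def demo():
    agent = ConversationSketch()
    agent.state = AgentState.RESPONDING
    agent.response_task = asyncio.create_task(agent.speak_response())
    await asyncio.sleep(0)         # let playback start
    await agent.handle_barge_in()  # user interrupts mid-sentence
    return agent.events, agent.state
```

Driving cancellation through `Task.cancel()` lets the playback coroutine clean up at its own await point, which is why the interruption feels instant without leaving the pipeline in a half-finished state.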