An AI phone translator that translates calls, messages, and more in real time
Client: Confidential
Duration: 9 months
Industry: AI
Year: 2024
Technologies: AWS, Twilio, WebRTC, Whisper ASR, Google Translate API, ElevenLabs TTS, NodeJS, WebSockets, Docker
Translingo Call Translator is an AI-powered communication tool that removes language barriers by translating phone calls, messages, and more in real time. Using machine learning for speech recognition, translation, and speech synthesis, it gives speakers of different languages a fast, reliable way to communicate while preserving context and nuance.
Translingo Call Translator must deliver natural, real-time conversation while chaining together several services over real-time communication protocols, each of which adds latency. Several challenges stood in the way of this goal:
– Cumulative latency across the multi-step pipeline: Real-time translation requires a sequential chain of steps: speech → text → translation → speech. Each step adds its own processing time, and together with network latency the steps produce a significant cumulative delay.
– Maintaining context and turn-taking in real time: The system must accurately determine when one speaker has finished and the other has started (Voice Activity Detection, or VAD) to segment audio correctly for translation. Misidentified speech boundaries (caused by accents, background noise, or short pauses) lead the system to translate incomplete thoughts or interrupt the current speaker, which produces mistranslations and badly disrupts the flow of conversation.
– High operational cost and resource management: The core process relies on computationally intensive, per-use external services (for speech recognition, translation, and text-to-speech) and highly concurrent infrastructure. Running these demanding services constantly for real-time translation leads to unpredictably high external service costs, significant internal resource consumption, and challenges in efficiently scaling the stateless processing backend.
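To make the cumulative-latency challenge concrete, the sketch below adds up per-stage processing times and network hops for a strictly sequential pipeline. Every figure is an illustrative assumption rather than a measurement from the project, and `Stage` and `end_to_end_delay_ms` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    """One step of the speech -> text -> translation -> speech chain."""
    name: str
    latency_ms: int  # assumed processing time, not a measured value


def end_to_end_delay_ms(stages: Sequence[Stage], network_hops: int,
                        hop_latency_ms: int) -> int:
    """Delay of a strictly sequential pipeline: every stage waits for the
    previous one, so processing times and network hops simply add up."""
    processing = sum(stage.latency_ms for stage in stages)
    network = network_hops * hop_latency_ms
    return processing + network


# Hypothetical figures chosen only to illustrate the arithmetic.
PIPELINE = [
    Stage("speech-to-text", 400),
    Stage("translation", 150),
    Stage("text-to-speech", 300),
]

if __name__ == "__main__":
    # 850 ms of processing + 4 hops * 50 ms = 1050 ms
    print(end_to_end_delay_ms(PIPELINE, network_hops=4, hop_latency_ms=50))
```

Under these assumed numbers, the listener hears the translation more than a second after the speaker stops, even though no single stage looks slow on its own.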
To overcome these challenges, Hola Tech combined streaming processing, dedicated voice activity detection, and cost-optimized self-hosted inference. Key components of the solution included:
– Implement streaming and chunk-based translation: Instead of waiting for a full sentence or speaker turn, segment the audio stream (WebRTC/Twilio) into chunks of 500 ms to 1 second. Use a streaming ASR solution (or an optimized, low-latency deployment of Whisper), perform translation on these partial chunks, and use streaming TTS (ElevenLabs) to begin playing the translated audio immediately. Because the processing steps overlap instead of running strictly one after another, perceived latency drops sharply.
– Integrate advanced Voice Activity Detection (VAD): Supplement the ASR module with a dedicated, highly tuned VAD component at the NodeJS/WebSocket layer. This VAD microservice analyzes the incoming audio stream (WebRTC) to reliably detect where each speaker starts and stops. As a result, transcription (Whisper) is initiated only while speech is active, and the translation step (Google Translate API) receives complete, contextual speaker segments, which prevents interruptions and improves translation accuracy.
– Leverage AWS Spot Instances and container orchestration for inference: Package self-hosted ASR and TTS models (or their optimized open-source variants) in Docker containers. Deploy these containers on AWS ECS/EKS using Spot Instances (for significant cost reduction) and scale the workload dynamically based on real-time call volume. This replaces constant, expensive API calls with internal compute resources that scale with demand.
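The speech-boundary detection described above can be illustrated with a minimal energy-based sketch. It is not the project's VAD, which sits at the NodeJS/WebSocket layer and is described as highly tuned. It only shows the general idea of marking a frame as speech when its energy crosses a threshold and tolerating short pauses through a "hangover" window. The function names, the threshold, and the hangover length are all assumptions, and frames are assumed to hold samples normalized to the range -1.0 to 1.0.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def segment_speech(frames: Sequence[Sequence[float]],
                   threshold: float = 0.1,
                   hangover_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech segments, end exclusive.

    A segment stays open across pauses of up to `hangover_frames` silent
    frames, so a brief hesitation does not split one thought in two.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # index of the first frame of the open segment
    silence_run = 0   # consecutive silent frames inside the open segment
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run > hangover_frames:
                # Close the segment just after its last active frame.
                segments.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # The stream ended while a segment was still open.
        segments.append((start, len(frames) - silence_run))
    return segments


if __name__ == "__main__":
    loud, quiet = [0.5] * 4, [0.0] * 4
    stream = [quiet, loud, loud, quiet, loud,
              quiet, quiet, quiet, quiet, loud]
    print(segment_speech(stream, threshold=0.1, hangover_frames=2))
    # [(1, 5), (9, 10)]
```

In the example, the single quiet frame at index 3 is bridged by the hangover window, while the longer pause after index 4 closes the first segment. A production VAD would typically use a trained model or spectral features rather than raw energy, since energy alone is easily fooled by background noise.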
– Supported simultaneous real-time translation for over 25 languages and major dialects
– Processed over 1.5 million minutes of translated audio since the platform’s launch
The implemented system met Translingo Call Translator’s requirements for real-time performance, conversational flow, and language coverage. By lowering the friction of international communication, the platform has attracted over 50 major enterprise clients in customer support and cross-border sales. Businesses and individuals now use it for multilingual calls and messaging, with over 1.5 million minutes of audio translated since launch.
Cenbot is a blockchain technology solution that aims to provide native Telegram bot functionalities on the SUI blockchain.
Wingswap is a decentralized finance (DeFi) platform that aims to provide automated, blazing-fast, and all-round solutions for the industry, enabling users to engage in yield farming, execute swaps and limit orders with low fees, and explore advanced NFT functionalities on a user-centric platform.
Kangamoon is an engaging two-player fighting game designed to offer an immersive combat experience. It allows players to customize their fighters by purchasing in-game items to enhance power and provides integrated deposit and withdrawal functionalities.
Sui YOLO Fun is positioned as Sui’s premier rug-resistant fun token launchpad, aiming to redefine community-driven engagement within the Sui blockchain ecosystem. It provides a secure and transparent platform for the launch of meme coins, embodying a bold yet responsible approach to digital finance where participants can rally around assets like “$ARTHUR” with a focus on mitigating common risks associated with speculative ventures.