Translingo Call Translator

An AI phone translator that translates calls, messages, and more in real time

Client: Confidential
Duration: 9 months
Category: AI
Year: 2024
Technology: AWS, Twilio, WebRTC, Whisper ASR, Google Translate API, ElevenLabs TTS, Node.js, WebSockets, Docker

Introduction

Translingo Call Translator is an AI-powered communication tool that removes language barriers by translating phone calls, messages, and more in real time. Built on machine learning models for speech recognition, translation, and speech synthesis, it gives people who speak different languages a fast, reliable way to communicate while preserving context and nuance.


Challenges

To deliver seamless real-time communication, Translingo Call Translator must chain several high-latency services together over real-time communication protocols. Several challenges stand in the way:

– Cumulative latency across the multi-step pipeline: Real-time translation requires a sequential chain of steps (speech → text → translated text → speech). Each step adds its own processing time, and together with network latency these add up to a significant cumulative delay.

– Maintaining context and turn-taking in real time: The system must reliably detect when one speaker has finished and the other has begun (voice activity detection, or VAD) in order to segment audio correctly for translation. When speech boundaries are misidentified (because of accents, background noise, or short pauses), the system translates incomplete thoughts or interrupts the current speaker, producing mistranslations and badly disrupting the conversation.

– High operational cost and resource management: The core pipeline relies on computationally intensive, pay-per-use external services for speech recognition, translation, and text-to-speech, as well as on highly concurrent infrastructure. Running these services continuously for real-time translation leads to unpredictable external service costs, heavy internal resource consumption, and difficulty scaling the stateless processing backend efficiently.
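The cumulative delay described in the first challenge can be made concrete with a small back-of-the-envelope model. This is a minimal Python sketch, not the project's code: the stage names, the per-stage timings, and the assumption of exactly one network hop per stage are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Stage:
    name: str
    latency_ms: float  # hypothetical processing time for this stage


def sequential_delay_ms(utterance_ms: float, stages: Sequence[Stage],
                        network_ms_per_hop: float) -> float:
    """Delay from the start of an utterance until translated audio begins playing,
    assuming every stage waits for the previous one to finish the whole utterance."""
    # Nothing can be processed until the full utterance has been captured.
    total = utterance_ms
    for stage in stages:
        # Each stage contributes its own processing time plus one network hop.
        total += stage.latency_ms + network_ms_per_hop
    return total


# Illustrative numbers only; they are not measurements from this project.
pipeline = [Stage("asr", 400), Stage("translate", 150), Stage("tts", 300)]
delay = sequential_delay_ms(3000, pipeline, network_ms_per_hop=50)
# 3000 + (400 + 150 + 300) + 3 * 50 = 4000
```

In this model every term is added in series, so shaving time off a single stage helps only linearly; the larger lever is to stop waiting for the full utterance before processing begins.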


Solutions

To address these challenges, Hola Tech built the solution around three key components:

– Implement streaming, chunk-based translation: Instead of waiting for a full sentence or speaker turn, segment the audio stream (WebRTC/Twilio) into chunks of 500 ms to 1 s. Use a streaming ASR solution (or an optimized, low-latency Whisper deployment), translate these partial chunks, and use streaming TTS (ElevenLabs) to start playing translated audio as soon as it is available. Because the processing steps overlap, perceived latency drops sharply.

– Integrate advanced voice activity detection (VAD): Supplement the ASR module with a dedicated, carefully tuned VAD component at the Node.js/WebSocket layer. This VAD microservice analyzes the incoming WebRTC audio stream to detect where each speaker starts and stops. As a result, transcription (Whisper) runs only while speech is active, and the translation step (Google Translate API) receives complete speaker segments with their context, which prevents interruptions and improves translation accuracy.

– Leverage AWS Spot Instances and container orchestration for inference: Package self-hosted ASR and TTS models (or optimized open-source variants) in Docker containers, deploy them on Amazon ECS/EKS using Spot Instances to cut compute costs substantially, and scale the load dynamically with real-time call volume. This replaces constant, expensive per-use API calls with internal compute resources that scale with demand.
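The chunking step in the first component can be sketched as re-slicing an incoming packet stream into fixed-duration pieces that are handed to ASR as soon as each one fills. This is a hypothetical Python sketch: the function name is invented, and it assumes mono linear PCM with illustrative defaults of 16 kHz and 16-bit samples; the actual transport format from WebRTC or Twilio may differ and would need decoding first.

```python
from typing import Iterable, Iterator


def chunk_pcm(stream: Iterable[bytes], chunk_ms: int = 500,
              sample_rate: int = 16_000, bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Re-slice arbitrarily sized PCM packets into fixed-duration chunks.

    Assumes mono linear PCM; sample_rate and bytes_per_sample are
    illustrative defaults, not values taken from the project.
    """
    chunk_bytes = sample_rate * bytes_per_sample * chunk_ms // 1000
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms too small for this audio format")
    buffer = bytearray()
    for packet in stream:
        buffer.extend(packet)
        # Emit every complete chunk immediately, so downstream ASR can start
        # before the speaker has finished the sentence.
        while len(buffer) >= chunk_bytes:
            yield bytes(buffer[:chunk_bytes])
            del buffer[:chunk_bytes]
    # Flush the trailing partial chunk when the stream ends.
    if buffer:
        yield bytes(buffer)


# 101 packets of 20 ms audio (640 bytes each at 16 kHz, 16-bit mono).
chunks = list(chunk_pcm([b"\x00" * 640] * 101))
```

At these settings a 500 ms chunk is 16,000 bytes, so the example yields four full chunks plus a 640-byte remainder. Under this scheme the first translated audio can begin roughly one chunk duration plus one round of per-chunk processing after speech starts, rather than after the whole utterance.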
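The VAD component's job of finding speaker start and stop points can be illustrated with a deliberately simple energy-based detector. This is a swapped-in technique for illustration only: the source does not say which VAD algorithm the project used, and production systems usually rely on more robust detectors (such as the WebRTC VAD or a trained model). The function names, frame size, RMS threshold, and hangover length below are assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of one frame of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_segments(frames: Sequence[Sequence[int]], threshold: float = 500.0,
                    hangover_frames: int = 15) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) speech segments.

    A segment opens at the first frame whose RMS exceeds `threshold` and closes
    only after `hangover_frames` consecutive quiet frames, so short pauses
    inside a sentence do not split it. Both values are illustrative.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first voiced frame of the open segment
    last_voiced = -1             # most recent voiced frame
    for i, frame in enumerate(frames):
        if frame_rms(frame) > threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= hangover_frames:
            # Silence lasted long enough: close the segment at the last voiced frame.
            segments.append((start, last_voiced + 1))
            start = None
    if start is not None:
        # Stream ended mid-speech: close the open segment.
        segments.append((start, last_voiced + 1))
    return segments


speech, silence = [1000] * 160, [0] * 160  # 20 ms frames at 8 kHz
frames = [speech] * 10 + [silence] * 5 + [speech] * 10 + [silence] * 20 + [speech] * 5
segments = detect_segments(frames)  # the 5-frame pause does not split the first turn
```

With 20 ms frames, a hangover of 15 frames keeps pauses shorter than about 300 ms inside one segment. Raising it reduces premature cuts mid-sentence at the cost of slower turn detection, which is the trade-off the turn-taking challenge describes.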
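The rule of scaling inference containers with real-time call volume can be sketched as a sizing function whose output would be passed to the orchestrator (for example, as the desired task count of an ECS service or the replica count of an EKS deployment). This is a hypothetical sketch: the function name, the per-container call capacity, the headroom percentage, and the bounds are all assumptions, and a real deployment would more likely express the same idea through the platform's own autoscaling policies.

```python
def desired_task_count(active_calls: int, calls_per_task: int,
                       headroom_pct: int = 20, min_tasks: int = 1,
                       max_tasks: int = 50) -> int:
    """Number of inference containers to run for the current call volume.

    All parameters are hypothetical tuning values, not figures from this project:
    calls_per_task is how many concurrent calls one ASR/TTS container can serve,
    and headroom_pct is spare capacity for new calls or reclaimed Spot capacity.
    """
    if calls_per_task <= 0:
        raise ValueError("calls_per_task must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    demand = active_calls * (100 + headroom_pct)
    capacity = 100 * calls_per_task
    needed = (demand + capacity - 1) // capacity
    # Never drop below a warm minimum or exceed the budgeted maximum.
    return max(min_tasks, min(max_tasks, needed))


tasks = desired_task_count(active_calls=101, calls_per_task=10)  # ceil(121.2 / 10) = 13
```

Headroom matters more on Spot capacity, because AWS can reclaim a Spot Instance after a two-minute interruption notice, and the remaining containers must absorb its calls.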


Featured numbers

– Supported simultaneous real-time translation for over 25 languages and major dialects
– Processed over 1.5 million minutes of translated audio since the platform’s launch

Results

The system met Translingo Call Translator’s requirements for real-time performance, conversational flow, and language coverage. By lowering the friction of international communication, the platform has attracted over 50 major enterprise clients in customer support and cross-border sales. Businesses and individuals use it to hold multilingual calls and message exchanges, with over 1.5 million minutes of audio translated since launch. These results show that Translingo delivers fast, reliable, high-quality translation that helps break down language barriers.
