Tyrant is a Telegram mini app that aims to deliver an engaging combat and tap-to-earn experience, fostering community interaction through social missions within the Telegram ecosystem.
An AI phone translator that translates calls, messages, and more in real time
Confidential
9 months
AI
2024
AWS, Twilio, WebRTC, Whisper ASR, Google Translate API, ElevenLabs TTS, NodeJS, WebSockets, Docker
Translingo Call Translator is an AI-powered communication tool that removes language barriers by translating phone calls, messages, and more in real time. Built on machine-learning speech recognition, translation, and speech synthesis, it provides a seamless, reliable bridge for global communication while preserving context and nuance.
Translingo Call Translator must deliver seamless real-time communication by chaining together several high-latency services over real-time communication protocols such as WebRTC and WebSockets. Several challenges stand in the way of this goal:
– Cumulative latency across the multi-step pipeline: Real-time translation requires a sequential chain of steps (speech → text → translation → speech). Each step adds its own processing time, which, combined with network latency, produces a significant cumulative delay.
– Maintaining context and turn-taking in real-time: The system must accurately determine when one speaker has finished and the other has started (Voice Activity Detection, or VAD) to correctly segment audio for translation. Misidentification of speech boundaries (due to accents, background noise, or short pauses) causes the system to translate incomplete thoughts or interrupt the current speaker, leading to mistranslations and a severely disruptive conversational flow.
– High operational cost and resource management: The core process relies on computationally intensive, per-use external services (for speech recognition, translation, and text-to-speech) and highly concurrent infrastructure. Running these demanding services constantly for real-time translation leads to unpredictably high external service costs, significant internal resource consumption, and challenges in efficiently scaling the stateless processing backend.
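To make the cumulative-latency problem concrete, the TypeScript sketch below models the time until the listener hears the first translated audio as the speech that must be buffered before processing starts, plus the sum of per-stage delays and the network hop. All millisecond figures are hypothetical placeholders chosen for illustration, not measurements from Translingo, and `timeToFirstAudioMs` is an illustrative name rather than part of any library.

```typescript
// Hypothetical latency budget: every figure below is an illustrative
// placeholder, not a measurement from the Translingo deployment.
interface Stage {
  name: string; // pipeline step, kept for readability
  ms: number;   // processing time this step adds before its output is available
}

// Time from the moment the speaker starts talking until the listener hears
// the first translated audio: the pipeline first buffers `bufferedAudioMs`
// of speech, then each stage and the network hop add their latency in sequence.
function timeToFirstAudioMs(bufferedAudioMs: number, stages: Stage[], networkMs: number): number {
  const processingMs = stages.reduce((total, stage) => total + stage.ms, 0);
  return bufferedAudioMs + processingMs + networkMs;
}

// Turn-based: wait for a whole 6 s utterance, then run ASR, MT and TTS on all of it.
const turnBased = timeToFirstAudioMs(6000, [
  { name: "asr", ms: 800 },
  { name: "translate", ms: 200 },
  { name: "tts-first-audio", ms: 400 },
], 150);

// Chunked streaming: only a 1 s chunk is buffered, and each stage handles less audio.
const chunked = timeToFirstAudioMs(1000, [
  { name: "asr", ms: 300 },
  { name: "translate", ms: 150 },
  { name: "tts-first-audio", ms: 250 },
], 150);

console.log(`turn-based: ${turnBased} ms, chunked: ${chunked} ms`);
// prints "turn-based: 7550 ms, chunked: 1850 ms"
```

Under these assumed numbers, buffering a whole speaker turn dominates the delay, which is why shrinking the amount of audio that must be collected before the first stage can start matters more than shaving a few milliseconds off any single service.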
To overcome these challenges, Hola Tech built the solution around three key components:
– Streaming, chunk-based translation: Instead of waiting for a full sentence or speaker turn, the pipeline segments the incoming audio stream (WebRTC/Twilio) into chunks of 500 ms to 1 second. A streaming ASR solution (or an optimized, low-latency Whisper deployment) transcribes these chunks, translation runs on the partial results, and streaming TTS (ElevenLabs) starts playing the translated audio immediately. Overlapping the processing steps in this way drastically reduces perceived latency.
– Advanced Voice Activity Detection (VAD): A dedicated, finely tuned VAD microservice at the NodeJS/WebSocket layer supplements the ASR module. It analyzes the incoming WebRTC audio stream to detect where each speaker starts and stops, so that transcription (Whisper) runs only while speech is active and the translation step (Google Translate API) receives complete, coherent speaker segments. This prevents interruptions and improves translation accuracy.
– AWS Spot Instances and container orchestration for inference: Self-hosted ASR and TTS models (or their optimized open-source variants) are packaged in Docker containers and deployed on AWS ECS/EKS using Spot Instances for significant cost reduction, with capacity scaled dynamically according to real-time call volume. This replaces constant, expensive per-use API calls with internal compute that scales with demand.
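The chunking and VAD components can be sketched together as a small segmenter that sits in front of the ASR stage. This is a minimal sketch under stated assumptions: a plain RMS-energy threshold with a silence "hangover" stands in for the project's tuned VAD, whose internals are not described; incoming frames are assumed to be fixed-duration mono PCM `Float32Array`s with samples in [-1, 1]; and `VadSegmenter`, `VadOptions`, and `SegmentEvent` are hypothetical names, not an existing library API.

```typescript
// Hypothetical sketch: an energy-based VAD with a silence hangover, standing in
// for the tuned VAD service described above. Frames are mono PCM in [-1, 1].
type SegmentEvent =
  | { kind: "chunk"; samples: Float32Array } // partial chunk for streaming ASR
  | { kind: "end"; samples: Float32Array };  // last chunk of a speaker turn

interface VadOptions {
  frameMs: number;      // duration of each incoming frame
  rmsThreshold: number; // frames with RMS above this are treated as speech
  hangoverMs: number;   // trailing silence needed before a turn is closed
  maxChunkMs: number;   // flush a partial chunk at this length (e.g. 500-1000 ms)
}

// Root-mean-square energy of one frame.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
  return Math.sqrt(sumSquares / frame.length);
}

class VadSegmenter {
  private buffer: Float32Array[] = [];
  private inSpeech = false;
  private silenceMs = 0;
  private bufferedMs = 0;

  constructor(private readonly opts: VadOptions) {}

  // Feed one frame; returns zero or more events for the downstream ASR stage.
  push(frame: Float32Array): SegmentEvent[] {
    const events: SegmentEvent[] = [];
    const isSpeech = rms(frame) > this.opts.rmsThreshold;

    if (!this.inSpeech) {
      if (!isSpeech) return events; // silence before onset is dropped; ASR never sees it
      this.inSpeech = true;         // speech onset detected
    }

    this.buffer.push(frame);
    this.bufferedMs += this.opts.frameMs;
    this.silenceMs = isSpeech ? 0 : this.silenceMs + this.opts.frameMs;

    if (this.silenceMs >= this.opts.hangoverMs) {
      // Enough trailing silence: close the turn (the hangover frames are included).
      events.push({ kind: "end", samples: this.drain() });
      this.inSpeech = false;
      this.silenceMs = 0;
    } else if (this.bufferedMs >= this.opts.maxChunkMs) {
      // Long turn: hand a partial chunk downstream without waiting for the pause.
      events.push({ kind: "chunk", samples: this.drain() });
    }
    return events;
  }

  // Concatenate buffered frames into one array and reset the buffer.
  private drain(): Float32Array {
    const total = this.buffer.reduce((n, f) => n + f.length, 0);
    const out = new Float32Array(total);
    let offset = 0;
    for (const f of this.buffer) {
      out.set(f, offset);
      offset += f.length;
    }
    this.buffer = [];
    this.bufferedMs = 0;
    return out;
  }
}
```

A `chunk` event is emitted every `maxChunkMs` of buffered audio during long turns, mirroring the 500 ms to 1 second chunking, and an `end` event closes the turn once `hangoverMs` of trailing silence has accumulated. A fixed energy threshold is sensitive to background noise, the very failure mode listed among the challenges, so in practice a tuned or model-based detector would replace `rms` while keeping the same segmentation logic.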
Key results:
– Supported simultaneous real-time translation for over 25 languages and major dialects
– Processed over 1.5 million minutes of translated audio since the platform’s launch
The implemented system met Translingo Call Translator’s requirements for real-time performance, conversational flow, and language coverage. By lowering the friction of international communication, the platform has attracted over 50 major enterprise clients in customer support and cross-border sales. Businesses and individuals use it to hold multilingual calls and message exchanges, and the more than 1.5 million minutes of translated audio show that Translingo delivers a fast, reliable answer to global language barriers.
Sui YOLO Fun is positioned as Sui’s premier rug-resistant fun token launchpad, aiming to redefine community-driven engagement within the Sui blockchain ecosystem. It provides a secure, transparent platform for launching meme coins, where participants can rally around assets like “$ARTHUR”, with a focus on mitigating the common risks of speculative ventures.
PreBuild is a software development company that aims to streamline the software planning and development process for businesses.
DentalCall AI Reception is an AI voice receptionist for dental clinics that handles calls, books appointments, and answers common queries. The platform acts as an intelligent extension of the front desk, ensuring every patient call is answered professionally while clinical staff focus on delivering high-quality dental care.