DeepL, known for translating text, now wants to translate your voice

DeepL, a translation company best known for its text tools, released a speech-to-speech translation system today that covers use cases such as meetings, mobile and web chats, and group discussions for front-line staff through custom apps. The company is also releasing an API that allows external developers and businesses to build on DeepL’s technology for customized use cases, such as call centers.

“After spending so many years translating text, voice was a natural step for us,” DeepL CEO Jarek Kutylowski told TechCrunch in an interview. “We’ve come a long way when it comes to translating text and translating documents. But we thought there was no good real-time voice translation product.”

Kutylowski said the challenges in creating a real-time translation product center on finding a balance between minimizing latency — the delay between the person speaking and the playback of the translated audio — and maintaining accurate results.

DeepL is releasing add-ons for platforms like Zoom and Microsoft Teams, where listeners can hear real-time translation while others speak in their native languages, or follow real-time translated text on the screen. The program is currently in early access, and the company is inviting organizations to join the waiting list. The company also has a mobile and web-based chat product that can be used in person or remotely.

DeepL also supports group discussions in settings such as training sessions or workshops, where participants can join by scanning a QR code.

DeepL said its speech-to-speech technology can also learn and apply custom vocabulary, such as industry-specific terms and company and personal names.

Kutylowski said AI is reshaping what customer service will look like in the coming years. He noted that the translation layer helps companies provide support in languages where trained staff are scarce and expensive to hire.

The company said it controls the entire speech-to-speech stack. However, the current system converts speech to text, applies translation, and then converts the result back to speech. DeepL believes that because it has been working on text translation for years, it has an edge in translation quality. Going forward, the company wants to develop an end-to-end voice translation model that skips the text step entirely.

DeepL faces competition from several well-funded startups working in adjacent corners of the space. Sanas, which last year raised $65 million from Quadrille Capital and Teleperformance, uses AI to change a speaker’s accent in real time, a tool aimed primarily at call center agents.

Dubai-based Camb.AI specializes in speech synthesis and translation for media and entertainment companies such as Amazon Web Services, helping them dub and localize video content at scale.

Palabra, backed by Reddit co-founder Alexis Ohanian’s Seven Seven Six, is building a real-time speech translation engine designed to preserve both the meaning and the speaker’s original voice, putting it in direct competition with what DeepL is currently building.
