A leading customer service software provider is seeking an experienced Senior Voice AI Agent Engineer in Berlin. This role focuses on innovating voice applications, integrating AI models, and optimizing real-time interactions. The ideal candidate will have strong expertise in voice AI, Python, and deployment to cloud platforms. This position offers a hybrid work model combining onsite collaboration and remote flexibility.
The Agentic Tribe is revolutionizing the chatbot and voice assistance landscape with Gen3, a cutting-edge AI Agent system that is goal-oriented, dynamic, and truly conversational. Gen3 is capable of reasoning, planning, and adapting to user needs in real time, delivering personalized experiences through a multi-agent architecture and advanced language models.
We are seeking a passionate and experienced Senior Voice AI Agent Engineer to join our team. You will innovate at the forefront of conversational AI, engineering intelligent, autonomous agents that can listen, understand, and speak with human-like fluidity.
You will build the cognitive architecture for voice applications, creating systems that can reason, plan, and execute complex tasks through seamless, low-latency spoken dialogue. A key part of your role will be to effectively communicate complex technical concepts to both technical and non-technical stakeholders.
Design and develop robust, stateful, and scalable voice-first AI agents in Python that are optimized for real-time voice interactions and handle turn-taking, interruptions, and low-latency responses.
Integrate real-time Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD) services to create a seamless conversational flow.
Connect voice agents with enterprise systems, databases, and third-party APIs to create end-to-end automated workflows initiated and managed through voice.
Establish and own the evaluations for voice agent performance and behavior, iterating to improve performance, reliability, and user experience.
Build end-to-end conversational flows with reasoning, planning, and dynamic tool use beyond pre-scripted experiences.
Work cross-functionally with product managers, ML scientists, and engineers to understand user needs and voice interaction goals.
Implement fallback, recovery, and error-handling strategies for noisy audio input or speech recognition inaccuracies.
Define and track voice-specific evaluation metrics (e.g., word error rate, latency, conversational naturalness).
Develop observability tools and guardrails to monitor performance, ensure safety, and handle edge cases in spoken interactions.
Document development, architecture decisions, and research findings to share knowledge across the team.
LLM-oriented system design: Experience building multi-step, tool-using agents (e.g., LangChain, AutoGen). Familiarity with prompt engineering, context management, and reasoning strategies like Chain-of-Thought and ReAct.
Voice AI expertise: Experience building low-latency, streaming voice applications. Expertise in integrating and managing real-time STT/TTS models and APIs. Proficiency with Voice Activity Detection (VAD), noise suppression, and interruption logic.
Experience with integrating third-party voice AI APIs, including STT and TTS services from providers like OpenAI, Deepgram, ElevenLabs, etc.
Understanding of latency, timing, and streaming audio constraints.
Tool integration and APIs: Comfortable connecting agents to external APIs, tools, and databases in secure environments.
RAG (Retrieval-Augmented Generation): Building pipelines with vector stores, chunking strategies, and hybrid retrieval.
Evaluation and Observability: Implementing and using monitoring tools and evaluation frameworks (e.g., Braintrust) to score AI agents.
Safety and Reliability: Familiarity with prompt injection defense, guardrails, and failover logic.
Performance optimization: Token budgeting and latency management using caching, model routing, etc.
Programming and deployment: Expert in Python, FastAPI, and LLM SDKs. Experience deploying AI apps to cloud platforms (AWS, GCP, Azure) using CI/CD best practices.
M.S. or Ph.D. in Computer Science, NLP, Machine Learning, or a related field.
Background in spoken dialogue systems or conversational UX design.
Familiarity with real-time streaming architecture (e.g., WebRTC, gRPC, socket.io).
Multilingual ASR/TTS pipeline experience.
Zendesk builds software for better customer relationships. It empowers organizations to improve customer engagement and understand their customers. Zendesk products are easy to use and implement, enabling rapid innovation and scalable growth.
More than 100,000 paid customer accounts in over 150 countries use Zendesk products. Based in San Francisco, Zendesk has operations worldwide.
Interested in knowing what we do in the community? Check out the Zendesk Neighbor Foundation to learn more about our community engagement efforts.
Zendesk is an equal opportunity employer. We are committed to fostering diversity, inclusion, and belonging in the workplace. We assess applicants without regard to race, color, religion, national origin, age, sex, gender, gender identity, gender expression, sexual orientation, marital status, medical condition, disability, veteran status, or any other protected characteristic.
By submitting your application, you agree that Zendesk may collect your personal data for recruiting and related purposes. Zendesk's Candidate Privacy Notice explains what data may be processed, where it is processed, the purposes of processing, and the rights you have regarding your information.
Hybrid: This role combines onsite collaboration with remote work flexibility. The specific in-office schedule is determined by the hiring manager.