A cutting-edge AI technology firm in the United States is seeking a seasoned Sr. Software Engineer to architect and build intelligent backend infrastructure for autonomous AI research agents. The role involves designing orchestration systems and ensuring reliable execution of complex multi-step research workflows. Ideal candidates have a strong background in backend systems, AI/ML expertise, and experience with distributed systems architecture.
As a Sr. Software Engineer at Keru, you'll architect and build the intelligent backend infrastructure that powers our autonomous AI research agents. You'll design the core orchestration systems that coordinate multiple specialized AI agents, manage complex multi-step research workflows, and ensure reliable execution of mission-critical financial analysis tasks.
You'll live our core principle of 'Forward-Deployed with Product DNA' - building agent orchestration systems that directly solve real customer problems while maintaining the fault tolerance and observability needed to handle billions of dollars in investment decisions. Your systems will power AI agents that autonomously navigate complex research tasks, from SEC filing analysis to multi-source data synthesis, all while maintaining full audit trails and human oversight capabilities.
This role is ideal for engineers who want to build foundational agentic infrastructure at the intersection of AI and finance, where robust system architecture enables autonomous agents to augment human decision-making at enterprise scale.
Agent Orchestration & Workflow Engineering: Design and implement sophisticated agent coordination systems that manage complex, multi-step research workflows. Build state machines, task queues, and execution engines that coordinate specialized AI agents across diverse data sources and analysis tasks.
Multi-Agent System Architecture: Architect scalable systems for agent communication, task delegation, and result synthesis. Implement patterns for agent specialization, load balancing, and dynamic resource allocation across research workflows.
Autonomous Task Execution: Build robust execution frameworks that handle long-running, multi-phase research tasks with automatic retry logic, error recovery, and graceful degradation. Ensure agents can autonomously navigate complex decision trees while maintaining human oversight capabilities.
AI Model Integration & Management: Integrate and orchestrate multiple language models (GPT, Claude, specialized financial models) with intelligent routing, fallback mechanisms, and cost optimization. Build abstraction layers that allow seamless model swapping and A/B testing.
Real-Time Data Pipeline Architecture: Design high-throughput data ingestion systems that process streaming financial data, SEC filings, news feeds, and alternative datasets. Build event-driven architectures that trigger agent workflows based on real-time market events.
Agent Memory & Context Management: Implement sophisticated memory systems that allow agents to maintain context across long research sessions, learn from past interactions, and build upon previous analysis. Design vector databases and knowledge graphs that support intelligent information retrieval.
Enterprise Integration & API Design: Build robust APIs and integration layers that connect agent systems with enterprise financial platforms (Bloomberg Terminal, CapIQ, internal trading systems). Implement secure, scalable interfaces for human-agent collaboration.
Observability & Monitoring: Design comprehensive monitoring and logging systems for agent behavior, workflow execution, and system performance. Build dashboards that provide real-time visibility into agent decision-making processes and research progress.
Quality Assurance & Validation: Implement automated testing frameworks for agent behavior, including unit tests for individual agent functions and integration tests for complex multi-agent workflows. Build validation systems that ensure research quality and accuracy.
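To give a flavor of the work, the retry logic and model fallback mentioned in the responsibilities above can be sketched in a few lines of Python. This is only an illustrative sketch, not Keru's implementation: `call_with_fallback`, `AllModelsFailed`, and the idea of representing a model as a plain callable are assumptions made for the example, and a production system would add timeouts, narrower error handling, and audit logging.

```python
import time
from typing import Callable, Sequence, Tuple

# Hypothetical type: a "model" is any callable that maps a prompt to a reply.
ModelFn = Callable[[str], str]


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain has exhausted its attempts."""


def call_with_fallback(
    models: Sequence[Tuple[str, ModelFn]],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try each model in order; retry with exponential backoff, then fall back.

    Returns (model_name, reply) from the first model that succeeds.
    """
    errors = []
    for name, fn in models:
        for attempt in range(max_attempts):
            try:
                return name, fn(prompt)
            except Exception as exc:  # a real system would catch narrower errors
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                # Back off before the next attempt on the same model.
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
    raise AllModelsFailed("; ".join(errors))


if __name__ == "__main__":
    def flaky_primary(prompt: str) -> str:
        raise TimeoutError("primary unavailable")

    def stable_backup(prompt: str) -> str:
        return f"summary of: {prompt}"

    name, reply = call_with_fallback(
        [("primary", flaky_primary), ("backup", stable_backup)],
        "10-K risk factors",
        sleep=lambda _: None,  # skip real waiting in the demo
    )
    print(name, reply)
```

Passing `sleep` as a parameter keeps the backoff testable without real delays, which matters once workflows like this are covered by automated tests.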
A future technical startup founder with deep expertise in AI systems
5+ years of backend systems experience with a proven track record of building production-scale distributed systems
Agent Framework Experience: Previous experience building multi-agent systems, autonomous workflows, or AI orchestration platforms. Familiarity with agent frameworks (LangChain, AutoGen, CrewAI) and agentic design patterns.
AI/ML Systems Expertise: Extensive experience building production AI/ML systems, including model serving, inference optimization, and AI workflow orchestration. Deep understanding of transformer architectures, prompt engineering, and LLM integration patterns.
Distributed Systems Architecture: Strong background in designing fault-tolerant distributed systems, message queues, event-driven architectures, and microservices. Experience with system design patterns for high availability and horizontal scaling.
Advanced Backend Technologies: Expert-level proficiency in Python, Node.js, or Rust. Deep experience with databases (PostgreSQL, Redis), message brokers (Kafka, RabbitMQ), and cloud platforms (AWS, GCP). Experience with container orchestration (Kubernetes, Docker).
Real-Time Processing: Experience building streaming data pipelines, event-driven systems, and real-time analytics platforms. Understanding of data consistency patterns and eventual consistency in distributed systems.
Vector Databases & Embeddings: Hands-on experience with vector databases (Pinecone, Weaviate, Chroma), embedding models, and semantic search systems. Understanding of RAG patterns and knowledge retrieval architectures.
Performance Engineering: Deep experience optimizing system performance, including database query optimization, caching strategies, and distributed system bottleneck analysis. Experience with load testing and capacity planning.
DevOps & Production Excellence: Strong experience with CI/CD pipelines, infrastructure as code, monitoring systems (Datadog, Grafana), and production incident management.
Technical Leadership: Experience mentoring engineers, making architectural decisions, and driving technical strategy. Comfortable leading complex technical projects and communicating with stakeholders.
You’ll be directly mentored by engineers who built Palantir’s Forward Deployed Engineering organization. Expect:
Weekly 1:1s with senior engineers who deployed enterprise platforms at Fortune 500 companies.
Real-time coaching on client presentations and technical architecture decisions.
Clear growth path toward leading client relationships and technical strategy.
Learn by deploying: no theoretical exercises, just production systems at tier-one firms.
At Keru.ai, mentorship transforms strong engineers into exceptional client-facing technical leaders.
Backend: Python, Node.js, Rust, PostgreSQL, Redis
AI/ML: OpenAI GPT, Anthropic Claude, LangChain, Vector Databases
Infrastructure: AWS, Docker, Kubernetes, Kafka, Apache Airflow
Monitoring: Datadog, Sentry, Grafana, OpenTelemetry
Tools: Git, GitHub Actions, Terraform