AI Engineer - Infrastructure

Traversal

New York (NY)

On-site

USD 150,000 - 300,000

Full time

15 days ago

Job summary

A leading AI infrastructure company in New York is seeking an AI Infrastructure Engineer to design and maintain scalable AI-powered backend systems. The ideal candidate has at least 3 years of experience in distributed systems, strong debugging skills, and proficiency in cloud infrastructure. This full-time role offers a competitive salary range of $150,000–$300,000, along with equity and benefits. Join a collaborative team in a dynamic work environment.

Benefits

Health insurance
Startup equity
Flexible time off
In-office snacks

Qualifications

  • 3+ years of experience at fast-paced companies with a high bar for technical excellence.
  • Direct experience building distributed systems using tools like Kafka, Postgres, S3, or similar.
  • Proven ability to own projects end-to-end, from system design to implementation and long-term maintenance.

Responsibilities

  • Lead the design of scalable, reliable infrastructure systems that support observability and AI agents.
  • Own and manage cloud infrastructure (AWS, GCP, or similar), including hybrid/on-prem deployments.
  • Build and manage observability tools to ensure rapid incident response and issue resolution.

Skills

Distributed systems design
Cloud infrastructure management
Strong debugging skills
CI/CD pipelines
Experience with Rust

Tools

Terraform
Kafka
Postgres
S3

Job description

Traversal is the AI Site Reliability Engineer (SRE) for the enterprise—trusted by some of the world’s largest companies to troubleshoot, remediate, and prevent complex production incidents. Our mission is to free engineers from endless firefighting and enable them to focus on creative, high-impact work.

Our roots are deeply embedded in AI research. We’re building the premier AI agent lab for the enterprise, assembling a talented team from MIT, Harvard, Berkeley, and top industry players to tackle one of the hardest problems for AI to solve. Without the entire team, none of this would be possible.

The Role

As an AI Infrastructure Engineer, you’ll play a key technical role in our AI-powered infrastructure team, designing, scaling, and maintaining the backend systems that power our platform. You’ll be responsible for both cloud and on-prem deployments, ensuring our systems are performant, resilient, and ready for large-scale AI operations. You will collaborate with engineers across the organization, providing expertise in building robust infrastructure that supports our AI agents and observability tools. The role requires hands-on experience with cloud infrastructure, distributed systems, and AI model integration in production environments.

Responsibilities

  • System Design & Architecture: Lead the design of scalable, reliable infrastructure systems that support observability and AI agents.
  • Cloud Infrastructure Management: Own and manage cloud infrastructure (AWS, GCP, or similar), including hybrid/on-prem deployments for AI-driven observability.
  • Data Pipeline Caching: Collaborate with data engineers to implement caching mechanisms that accelerate data retrieval and processing.
  • Automation & CI/CD: Build and maintain CI/CD pipelines and infrastructure-as-code to ensure high availability, performance, and security.
  • Scalability & Performance: Design systems that scale with platform growth and customer demand, ensuring high performance across cloud and on-prem environments.
  • Collaboration: Work closely with backend, frontend, AI, and product teams to ensure infrastructure supports seamless integrations and high-performance experiences.
  • Incident Management: Build and manage observability tools to ensure system resilience, rapid incident response, and issue resolution. We should be able to use Traversal on Traversal!
  • Mentorship: Provide technical leadership and mentorship to junior engineers, fostering a collaborative, learning-focused culture.

Requirements

  • 3+ years of experience at fast-paced companies with a high bar for technical excellence.
  • Direct experience building distributed systems using tools like Kafka, Postgres, S3, or similar.
  • Proven ability to own projects end-to-end, from system design to implementation and long-term maintenance.
  • Strong debugging skills across cloud infrastructure and networking layers.
  • Familiarity with production systems: instrumentation, provisioning, bug fixes, and reliability improvements.
  • Experience with Rust.

Nice to Have

  • Experience making complex software systems observable using logs, metrics, and traces.
  • Familiarity with Python-based ecosystems.
  • Background in large-scale, complex, data-driven applications.
  • Experience provisioning and managing infrastructure using Terraform, Pulumi, or other IaC tools.
  • Familiarity with AI or LLM-powered products.

Compensation

We offer competitive compensation, startup equity, health insurance, and additional benefits. The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus equity and benefits. Our salary ranges are based on location, level, and role. Individual compensation is determined by experience, skills, and job-related knowledge.

Why You Should Join Us

We’ll make sure you’re fully supported with health insurance, a great tech setup, flexible time off, and plenty of in-office snacks. We offer competitive salary and equity packages, and give thoughtful consideration to every hire on our small, high-impact team.

Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We have a collaborative, hard-working culture and are energized by building the future of AI-powered software maintenance.

Working here means owning meaningful parts of the product, having the flexibility to move fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.