The Role: Architecting the Cognitive Platform
We are seeking a foundational AI Systems Engineer to design, build, and own the core infrastructure that powers our entire company.
This is a unique, high-leverage role. You are not just supporting one product; you are building the single, unified platform that must serve two critical functions:
- Our SaaS Platform: A scalable, multi-tenant, low-latency infrastructure that delivers our Causal AI models to paying customers through robust APIs and a UI.
- Our AI Research Platform: A high-performance, flexible environment that empowers our researchers to conduct massive "self-play" simulations, execute "hero training runs" on vast, multimodal datasets, and rapidly prototype new models.
Your work will be the backbone that connects our most advanced research with real-world, high-stakes industrial data.
What You Will Do
- Build the Core Cloud-Native Platform: Design, build, and manage our entire infrastructure from the ground up on Kubernetes (K8s), using Infrastructure as Code (Terraform, Pulumi) for everything (see the IaC sketch after this list).
- Engineer the SaaS Delivery Architecture: Implement the multi-tenant, secure, and highly available service architecture for our customer-facing APIs, including API gateways, a service mesh, observability, and logging.
- Create the MLOps/Research Engine: Build the internal AI/ML platform. This includes managing data versioning (DVC, Pachyderm), orchestrating on-demand GPU/TPU-heavy training workloads, and providing researchers with feature stores and a "self-service" environment for experimentation.
- Master Real-Time Data & Orchestration: Engineer the high-throughput, real-time data ingestion pipelines (e.g., Kafka, Pulsar, Spark Streaming) required to model "network cascades" and "perishable inventory" in sectors like aviation and logistics (see the streaming sketch after this list).
- Own Complex Dataflow (DAGs): Design, implement, and manage the dataflow orchestration (e.g., Airflow, Dagster, Prefect) that powers both our production ETL/ELT and our multi-stage AI simulation and training loops (see the DAG sketch after this list).
- Champion CI/CD & GitOps: Own and enforce a rigorous CI/CD and GitOps discipline. You will be responsible for building the automated pipelines that enable our "relentless shipping" culture, allowing us to deploy to production safely, multiple times a day.
- Unify the Data Layer: Design and manage our central data lakehouse (e.g., Databricks, Snowflake) to act as the "single source of truth," serving real-time analytics for our SaaS platform and batch workloads for AI research (see the lakehouse sketch below).
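To make these responsibilities concrete, here is a minimal IaC sketch in the spirit of the first bullet, assuming Pulumi's Python SDK with the `pulumi_kubernetes` provider; the namespace name, quota value, and resource names are illustrative, not our actual configuration.

```python
"""Minimal Pulumi sketch: a dedicated namespace and GPU quota for
research workloads, isolated from the SaaS tier. Names are illustrative."""
import pulumi
import pulumi_kubernetes as k8s

# Namespace isolating research training jobs from customer-facing services.
research_ns = k8s.core.v1.Namespace(
    "research",
    metadata={"name": "research"},
)

# Cap GPU requests so hero training runs can't starve the SaaS workloads.
gpu_quota = k8s.core.v1.ResourceQuota(
    "research-gpu-quota",
    metadata={"namespace": research_ns.metadata["name"]},
    spec={"hard": {"requests.nvidia.com/gpu": "64"}},  # illustrative cap
)

pulumi.export("research_namespace", research_ns.metadata["name"])
```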
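For the real-time ingestion bullet, a sketch of the consumer side of such a pipeline, assuming the confluent-kafka Python client; the broker address, consumer group, and "flight-events" topic are hypothetical placeholders.

```python
"""Minimal sketch of a real-time ingestion consumer feeding a
network-cascade model. Topic and broker names are placeholders."""
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "group.id": "cascade-modeler",          # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["flight-events"])       # hypothetical topic name

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        event = json.loads(msg.value())
        # Downstream: feed the event into the cascade / inventory model.
        print(event)
finally:
    consumer.close()
```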
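For the dataflow orchestration bullet, a minimal Airflow DAG sketch (assuming Airflow 2.4+); the DAG id, task names, and daily schedule are illustrative, not our actual pipeline.

```python
"""Minimal Airflow sketch of a multi-stage simulation-and-training loop.
All ids and the schedule are illustrative."""
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("pull raw events from the lakehouse")

def simulate():
    print("run a self-play simulation batch")

def train():
    print("launch a GPU training job on the simulation output")

with DAG(
    dag_id="simulation_training_loop",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # requires Airflow 2.4+
    catchup=False,
) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_simulate = PythonOperator(task_id="simulate", python_callable=simulate)
    t_train = PythonOperator(task_id="train", python_callable=train)

    # Each stage depends on the previous one.
    t_ingest >> t_simulate >> t_train
```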
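And for the unified data layer, a sketch of how one lakehouse table might serve both sides of the platform, assuming PySpark with Delta Lake configured; the table paths and column names are hypothetical.

```python
"""Sketch: one lakehouse table backing both SaaS analytics and
research training batches. Paths and columns are placeholders."""
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-sketch").getOrCreate()

# Single source of truth: the shared events table.
events = spark.read.format("delta").load("s3://lakehouse/events")  # placeholder path

# SaaS side: an aggregate powering a customer-facing dashboard.
daily_counts = events.groupBy("customer_id", "event_date").count()

# Research side: the same data exported as a batch for a training run.
daily_counts.write.mode("overwrite").parquet("s3://lakehouse/exports/daily")  # placeholder
```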
Ideal Candidate Profile
- A "Full-Stack" Infrastructure Engineer: You are a systems-level thinker who is equally comfortable in the domains of cloud-native infrastructure (K8s, Networking), data engineering (Kafka, Spark), and MLOps (GPU workloads, orchestration).
- Deep Cloud-Native Expertise: You have several years of hands-on production experience with Kubernetes, Terraform (or other IaC), and a major cloud provider (AWS/GCP/Azure).
- CI/CD & Automation Fanatic: You live and breathe automation. You have extensive experience building and maintaining robust CI/CD pipelines (e.g., GitLab CI, Jenkins, ArgoCD) and believe GitOps is the standard.
- SaaS & MLOps Fluency: You have ideally built platforms that serve both external B2B customers (with SLAs, security, and multi-tenancy) and internal R&D teams (with needs for flexibility, speed, and massive compute).
- A "Relentless Shipper" (Startup Mentality): You are a pragmatic, proactive builder who thrives in a fast-paced startup environment. You understand that "done" is better than "perfect" and are comfortable with tight release schedules and high ownership.
- Technical Polyglot: You possess deep expertise in Python and/or Go, shell scripting, and the modern data stack (SQL, orchestration tools, streaming platforms).