Senior AI Inference Engineer

Jobgether

Ciudad de México

Remote

MXN 1,467,000 - 2,202,000

Full-time

Posted today

Job summary

Jobgether, on behalf of a partner company, is seeking a Senior AI Inference Engineer to lead AI inference system design for major clients in Media, Entertainment, Gaming, and Sports. Candidates must have strong expertise in Python and AI/ML systems, plus experience with Kubernetes. This fully remote role offers a competitive compensation package and the chance to work on high-impact projects with cutting-edge technologies.

Benefits

Competitive compensation package
Fully remote work
Exposure to high-impact projects
Professional growth opportunities
Inclusive work environment

Qualifications

  • Extensive professional experience designing and shipping AI/ML systems in production.
  • Proven track record of taking AI/ML models from prototype to robust services.
  • Hands-on experience with computer vision or multi-modal inputs.

Responsibilities

  • Architect and implement AI inference services using Python.
  • Design autonomous AI agents for multi-modal inputs.
  • Deploy AI services on Kubernetes ensuring reliability.

Skills

Python expertise
AI/ML systems design
Kubernetes experience
Computer vision
Multi-modal inputs
NVIDIA GPU optimization
Vision Language Models
Cloud-native architectures

Tools

FFmpeg
GStreamer
AWS

Job description

Overview

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior AI Inference Engineer in Latin America.

In this role, you will lead the design and deployment of advanced AI inference systems for high-profile clients in Media, Entertainment, Gaming, and Sports. You will be responsible for translating complex, ambiguous business problems into robust, real-time AI architectures capable of interpreting and reasoning about video and multi-modal content. Working across the full project lifecycle—from early discovery and pre-sales to architecture, implementation, and optimization—you will partner with technical teams and clients to deliver scalable, high-performance solutions on modern GPU and cloud infrastructure. This position requires hands-on expertise, innovation, and the ability to communicate complex technical concepts clearly to diverse stakeholders.

Accountabilities
  • Architect, implement, and optimize end-to-end AI inference services and agentic pipelines using Python.
  • Design autonomous AI agents that can interpret, reason about, and act on video and multi-modal inputs.
  • Integrate Vision Language Models (e.g., GPT-4o, Gemini Pro Vision, LLaVA) into production-grade workflows.
  • Utilize LLM/agent orchestration frameworks (LangGraph, AutoGen, Semantic Kernel, etc.) to manage complex visual AI tasks.
  • Deploy and operate AI services on Kubernetes or similar platforms, ensuring reliability and scalability under heavy workloads.
  • Architect distributed systems on AWS, balancing performance, cost, and resilience.
  • Optimize workloads for modern NVIDIA GPU architectures (Ampere, Hopper, Blackwell) focusing on real-time, high-throughput media applications.
  • Produce clear architecture diagrams and technical documentation for both technical and non-technical audiences.
  • Provide technical leadership and guidance to project teams to ensure fidelity to architectural designs and solution goals.
  • (Optional) Work with video tooling such as FFmpeg, GStreamer, NVENC/NVDEC, and modern codecs, or deploy AI to edge/hybrid environments.

Requirements
  • Extensive professional experience designing and shipping AI/ML systems in production, with strong Python expertise.
  • Proven track record of taking AI/ML models from prototype to robust, low-latency inference services.
  • Hands‑on experience building agentic systems, especially with computer vision or multi‑modal inputs.
  • Familiarity with Vision Language Model integration and orchestration frameworks for multi‑modal tasks.
  • Strong practical experience with Kubernetes and cloud‑native distributed architectures (AWS preferred).
  • Knowledge of modern NVIDIA GPU architectures and optimization techniques.
  • Product‑oriented mindset: able to align technical solutions with business outcomes and ROI.
  • Excellent communication skills for collaborating with technical teams, clients, and C‑level stakeholders.
  • Self‑starter, able to work independently in ambiguous or rapidly evolving environments.
  • Nice‑to‑have: experience with FFmpeg, GStreamer, NVENC/NVDEC, OpenShift, NVIDIA Holoscan, Mojo, or AI deployment on edge/hybrid/on‑prem environments.

Benefits
  • Competitive compensation package.
  • Fully remote work within North or South America.
  • Exposure to high‑impact projects with leading global clients in Media, Entertainment, Gaming, and Sports.
  • Opportunity to work with cutting‑edge AI technologies and modern GPU/cloud infrastructure.
  • Professional growth through complex, real‑world problem solving.
  • Inclusive and diverse work environment.