Overview
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior AI Inference Engineer in Latin America.
In this role, you will lead the design and deployment of advanced AI inference systems for high-profile clients in Media, Entertainment, Gaming, and Sports. You will be responsible for translating complex, ambiguous business problems into robust, real-time AI architectures capable of interpreting and reasoning about video and multi-modal content. Working across the full project lifecycle—from early discovery and pre-sales to architecture, implementation, and optimization—you will partner with technical teams and clients to deliver scalable, high-performance solutions on modern GPU and cloud infrastructure. This position requires hands-on expertise, innovation, and the ability to communicate complex technical concepts clearly to diverse stakeholders.
Accountabilities
- Architect, implement, and optimize end-to-end AI inference services and agentic pipelines using Python.
- Design autonomous AI agents that can interpret, reason about, and act on video and multi-modal inputs.
- Integrate Vision Language Models (e.g., GPT-4o, Gemini Pro Vision, LLaVA) into production-grade workflows.
- Utilize LLM/agent orchestration frameworks (LangGraph, AutoGen, Semantic Kernel, etc.) to manage complex visual AI tasks.
- Deploy and operate AI services on Kubernetes or similar platforms, ensuring reliability and scalability under heavy workloads.
- Architect distributed systems on AWS, balancing performance, cost, and resilience.
- Optimize workloads for modern NVIDIA GPU architectures (Ampere, Hopper, Blackwell), focusing on real-time, high-throughput media applications.
- Produce clear architecture diagrams and technical documentation for both technical and non-technical audiences.
- Provide technical leadership and guidance to project teams to ensure fidelity to architectural designs and solution goals.
- (Optional) Work with video tooling such as FFmpeg, GStreamer, NVENC/NVDEC, and modern codecs, or deploy AI to edge/hybrid environments.
Requirements
- Extensive professional experience designing and shipping AI/ML systems in production, with strong Python expertise.
- Proven track record of taking AI/ML models from prototype to robust, low-latency production inference services.
- Hands‑on experience building agentic systems, especially with computer vision or multi‑modal inputs.
- Familiarity with Vision Language Model integration and orchestration frameworks for multi‑modal tasks.
- Strong practical experience with Kubernetes and cloud‑native distributed architectures (AWS preferred).
- Knowledge of modern NVIDIA GPU architectures and optimization techniques.
- Product‑oriented mindset: able to align technical solutions with business outcomes and ROI.
- Excellent communication skills for collaborating with technical teams, clients, and C‑level stakeholders.
- Self‑starter, able to work independently in ambiguous or rapidly evolving environments.
- Nice‑to‑have: experience with FFmpeg, GStreamer, NVENC/NVDEC, OpenShift, NVIDIA Holoscan, Mojo, or AI deployment on edge/hybrid/on‑prem environments.
Benefits
- Competitive compensation package.
- Fully remote work within North or South America.
- Exposure to high‑impact projects with leading global clients in Media, Entertainment, Gaming, and Sports.
- Opportunity to work with cutting‑edge AI technologies and modern GPU/cloud infrastructure.
- Professional growth through complex, real‑world problem solving.
- Inclusive and diverse work environment.