MLOps Engineer (LLM Serving and Infrastructure)

CloudWalk

São Paulo

On-site

BRL 120,000 - 180,000

Full-time

30+ days ago

Job summary

Join a pioneering team at an innovative firm as an MLOps Engineer, where you will operationalize cutting-edge AI technologies. Your role will focus on deploying and managing Large Language Models (LLMs) using Kubernetes and Terraform, optimizing computing infrastructure, and collaborating with R&D to transition concepts into scalable systems. This is an exciting opportunity to work with advanced technologies and contribute to the future of AI. If you are passionate about MLOps and eager to make a significant impact, this position is perfect for you. Dare to innovate and join a team that values creativity and excellence.

Qualifications

  • Solid experience with DevOps and deploying machine learning models.
  • Expertise in network optimization and parallel computing is crucial.

Responsibilities

  • Deploy and manage LLMs using Kubernetes and Terraform.
  • Optimize computing infrastructure for enhanced GPU utilization.

Skills

MLOps
Kubernetes
Terraform
Cloud Computing
GPU Utilization
Parallel Computing
Network Optimization
CI/CD Pipelines
Git
Bash Scripting

Education

Bachelor's Degree in Computer Science or related field

Tools

Hugging Face's Accelerate
PyTorch

Job description

Join the CloudWalk Wolfpack as an MLOps Engineer.

Your Mission:

At CloudWalk, we're at the cutting edge of AI, pioneering the use of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to drive innovation. As an MLOps Engineer, you will play a critical role in operationalizing the visionary work of our LLM Data Scientists. Your expertise will ensure the smooth deployment, efficient management, and scalable performance of LLMs across our extensive infrastructure. Your contributions will turn advanced AI research into scalable, high-performance solutions, with a particular focus on optimizing network communication and parallel processing capabilities.

What You’ll Do:

  1. Deploy and Manage LLMs: Employ Kubernetes, Terraform, and cloud services to deploy and scale LLMs efficiently, ensuring their adaptability to high-demand scenarios.
  2. Optimize Computing Infrastructure: Focus on enhancing GPU utilization, distributed training, bandwidth efficiency between machines, and VPC connections to maximize system performance.
  3. Leverage Cutting-Edge Technologies: Utilize libraries such as Hugging Face's Accelerate and PyTorch's torchrun to facilitate parallel training across multiple machines in a cluster, optimizing our AI models' training and inference processes.
  4. Collaborate on Innovation: Partner with our R&D team to transition LLM and RAG technologies from conceptual stages to scalable, production-ready systems.
  5. Monitor and Improve System Performance: Implement advanced monitoring and logging practices to ensure system reliability and performance, continuously seeking improvements.
  6. Stay Updated on Industry Advances: Actively pursue the latest developments in MLOps, cloud computing, and AI technologies to implement innovative solutions and maintain our infrastructure's leading edge.

Technologies You Will Work With:

  1. Kubernetes, Terraform, and cloud computing platforms for scalable AI model deployment.
  2. CI/CD pipelines, Git for version control, and Bash scripting for operational efficiency.
  3. Hugging Face's Accelerate and PyTorch's torchrun for parallel training and optimization across multiple machines.
  4. Network infrastructure, where a comprehensive understanding is essential to optimize bandwidth and secure VPC connections.

What We Expect From You:

  1. Technical Mastery: Solid experience with DevOps, cloud infrastructure, and deploying machine learning models. Expertise in network optimization and parallel computing is crucial.
  2. Problem-Solving Mindset: The ability to navigate complex challenges, strategically manage resources, and improve system efficiency.
  3. Collaborative Approach: Strong communication skills and the ability to contribute effectively within a dynamic, interdisciplinary team.
  4. Lifelong Learner: A commitment to continuous learning, staying abreast of the latest technological advancements, and applying innovative solutions.

Why CloudWalk?

By joining CloudWalk, you become part of a team that's reshaping the future with technological innovations. We cherish creativity, teamwork, and a dedication to excellence. Here, your work contributes to a mission of driving forward technological advancements.

Dare to innovate, dare to impact, dare to join the Wolfpack. Apply now!
