Senior Principal ML Engineer

Tradebyte Software GmbH

Berlin

On-site

EUR 80,000 - 110,000

Full-time

13 days ago

Summary

A leading e-commerce company is seeking a Senior Principal ML Engineer to optimize its MLOps infrastructure and innovate with advanced language models. This strategic Berlin-based role is key to improving scientists' productivity and turning research ideas into tangible solutions, within a collaborative environment that values diversity and innovation.

Benefits

50% discount on Zalando products
Quality IT equipment
Flexible working hours
Relocation assistance
Access to a variety of health and wellness programs

Qualifications

  • 6+ years of MLOps experience on AWS.
  • Expertise in optimizing cloud infrastructure, particularly GPUs.
  • Ability to secure and automate ML workflows.

Responsibilities

  • Architect MLOps infrastructure to maximize productivity.
  • Evaluate and integrate technologies for the ML stack.
  • Implement advanced visualization for researchers.

Skills

MLOps leadership
Passion for empowering scientists
HPC & GPU optimization
Experience with containerization and orchestration
Infrastructure as Code (IaC) and automation
Communication & collaboration
Problem solving

Tools

Docker
AWS
Terraform
Kubernetes
MLflow

Job description

Senior Principal ML Engineer
Location: Berlin
Time type: Full time
Posted: 2 days ago
End date: June 30, 2025 (15 days left to apply)
Job requisition ID: 2720714

ABOUT THE TEAM

Zalando Research is at the forefront of driving innovation in fashion e-commerce. We are a dynamic and diverse group of scientists and ML engineers dedicated to solving complex challenges through cutting-edge machine learning and AI. Our work directly impacts the experience of millions of Zalando customers and empowers internal teams with state-of-the-art tools and capabilities. We foster a collaborative environment where ambitious research ideas are transformed into impactful solutions. As a Senior Principal MLOps Engineer, you will play a pivotal role in supercharging the productivity and innovation potential of our applied scientists by architecting and delivering world-class MLOps infrastructure. You will be the most senior engineer in this area. As we raise the bar in ML research, we are evolving our infrastructure to eliminate friction and empower our scientists to focus on pushing forward the science—by making MLOps seamless, reproducible, and future-proof.

WHERE YOUR EXPERTISE IS NEEDED
  • Ensure Persistent, Secure and Reproducible R&D Environments: Tackle bottlenecks and improve scalability, resilience, and cost-effectiveness in distributed training workloads across our research teams. Guarantee scientists can resume work across sessions and share exact research setups, enabling robust experiment tracking and ease of collaboration.

  • Curate the R&D ML Stack: Evaluate, select, and integrate best-in-class technologies for our end-to-end R&D ML stack, ensuring our scientists have access to the most powerful tools, all while hardening the security of our cloud setup.

  • Enable Advanced Visualization: Implement and manage streamlined setup processes for 3D GPU-backed remote desktops in the cloud with persistent storage and seamless RDP/VNC experiences, providing scientists with powerful interactive research environments backed by the latest GPUs.

  • Innovate with LLMs: Stay at the cutting edge of Large Language Model (LLM) advancements and spearhead their integration into the Applied Scientists' UX.

WHAT WE ARE LOOKING FOR
  • Proven MLOps Leadership: Extensive experience (6+ years) in designing, building, and maintaining scalable, reliable, and performant MLOps infrastructure, particularly on AWS with a strong focus on GPU-accelerated compute clusters.

  • Passion for Empowering Scientists: Always looking for ways to save users’ time, eliminate skill barriers, and amplify scientific impact.

  • HPC & GPU Optimization Expert: Deep understanding of HPC architectures, job scheduling, GPU utilization, and cost optimization strategies in a cloud environment.

  • Containerization, Orchestration & Technology Expert: Strong hands-on experience with Docker, EC2, AMI(s), EFS, Lustre, S3, JupyterHub, SQL, Superset, Databricks, SageMaker, Slurm, Ray, Kubeflow, Kubernetes (EKS), Nix, Devbox, and other containerization, environment isolation and orchestration technologies for ML workloads.

  • Infrastructure as Code (IaC) and automation-first mindset: Proficiency with IaC tools like CloudFormation, CRDs, and Terraform to automate infrastructure provisioning and management, along with strong skills in CI/CD.

  • Champion of Reproducibility: A passion for building systems that ensure experimental reproducibility, environment consistency, and end-to-end automation of ML workflows. Experience with tools like MLflow, Weights & Biases, or similar for tracking, sharing, and deployment. You’re able to provide both ephemeral and persisted ML environments depending on the use case.

  • Excellent Communicator & Collaborator: Ability to articulate complex technical concepts clearly to diverse audiences and work effectively with research scientists, engineers, heads, directors and product managers to understand their needs and drive solutions.

  • Able to understand ML-related scientific challenges and translate them into ergonomic, reliable MLOps solutions for diverse user groups.

  • Problem Solver & Strategic Thinker: A proactive approach to identifying pain points, devising innovative solutions, and thinking strategically about the long-term evolution of the MLOps landscape at Zalando Research.

PERKS AT WORK

Culture of trust, empowerment and constructive feedback, open source commitment, meetups, game nights, 70+ internal technical and fun guilds, knowledge sharing through tech talks, internal tech academy and blogs, product demos, parties & events.

Competitive salary, employee share shop, 40% Zalando shopping discount, discounts from external partners, centrally located offices, public transport discounts, municipality services, great IT equipment, flexible working times, additional holidays and volunteering time off, free beverages and fruits, diverse sports and health offerings.

Extensive onboarding, mentoring and personal development opportunities and an international team of experts.

Relocation assistance for internationals, PME family service and parent & child rooms* (*available in selected locations)

We celebrate diversity and are committed to building teams that represent a variety of backgrounds, perspectives and skills. All employment is decided on the basis of qualifications, merit and business need.

It’s the perfect time to join Zalando on our journey, from being a pioneer in the world of e-commerce, to the Starting Point for Fashion in Europe. We connect customers, brands, and partners across 25 markets.

Help us drive digital and sustainable solutions for fashion, logistics, advertising and research, bringing head-to-toe fashion to 50 million active customers through a team of diverse skill-sets, cultural backgrounds, and interests.
