Lead Machine Learning Infrastructure Engineer - Infrastructure & Data

Tbwa Chiat/Day Inc

United States

Remote

USD 100,000 - 160,000

Full time

30+ days ago

Job summary

An innovative firm is seeking a Lead Machine Learning Infrastructure Engineer to design and maintain scalable systems for machine learning initiatives. In this pivotal role, you'll collaborate with cross-functional teams to build state-of-the-art platforms that enhance the development and deployment of machine learning models. Join a remote-first organization that values trust, innovation, and excellence, while providing exceptional benefits like comprehensive medical coverage, unlimited paid time off, and a 401(k) plan. If you're passionate about shaping the future of work and thrive in a dynamic environment, this opportunity is perfect for you.

Benefits

Comprehensive medical insurance
Unlimited paid time off
401(k) plan with matching contributions
12 weeks of paid parental leave
Employee Stock Purchase Plan

Qualifications

  • Strong expertise in scalable ML infrastructure and distributed systems.
  • Experience with cloud-based ML platforms and programming languages.

Responsibilities

  • Design and optimize distributed systems for large-scale ML workflows.
  • Develop frameworks and tools for the ML lifecycle from data to deployment.

Skills

Machine Learning Infrastructure Design
Distributed Systems
Cloud-based ML Platforms
Python
Java
Scala
Problem-solving
Collaboration

Job description

Upwork ($UPWK) is the world’s work marketplace. We serve everyone from one-person startups to large, Fortune 100 enterprises with a powerful, trust-driven platform that enables companies and talent to work together in new ways that unlock their potential.

Last year, more than $3.8 billion of work was done through Upwork by skilled professionals who are gaining more control by finding work they are passionate about and innovating their careers.

The Machine Learning Infrastructure & Data team is responsible for architecting and building the foundational systems and tools that enable efficient development, deployment, and management of machine learning models at scale.

As a Lead Machine Learning Infrastructure Engineer, you will be pivotal in designing, developing, and maintaining robust and scalable infrastructure components to support Upwork’s machine learning initiatives. You will work closely with cross-functional teams—including machine learning researchers, data scientists, and software engineers—to build state-of-the-art platforms and tools that accelerate the development and deployment of machine learning models.

Responsibilities:
  • Design, implement, and optimize distributed systems and infrastructure components to support large-scale machine learning workflows, including data ingestion, feature engineering, model training, and serving.
  • Develop and maintain frameworks, libraries, and tools that streamline the end-to-end machine learning lifecycle, from data preparation and experimentation to model deployment and monitoring.
  • Architect and implement highly available, fault-tolerant, and secure systems that meet the performance and scalability requirements of production machine learning workloads.
  • Collaborate with machine learning researchers and data scientists to understand their requirements and translate them into scalable and efficient software solutions.
  • Stay current with advancements in machine learning infrastructure, distributed computing, and cloud technologies, integrating them into our platform to drive innovation.
  • Mentor junior engineers, conduct code reviews, and uphold engineering best practices to ensure the delivery of high-quality software solutions.

What it takes to catch our eye:
  • Strong technical expertise in designing and building scalable ML infrastructure.
  • Experience with distributed systems and cloud-based ML platforms.
  • Proficiency in programming languages such as Python, Java, or Scala.
  • Deep understanding of ML workflows, including data pipelines, model training, and deployment.
  • Passion for innovation and eagerness to implement the latest advancements in ML infrastructure.
  • Strong problem-solving skills and ability to optimize complex systems for performance and reliability.
  • Collaborative mindset with excellent communication skills to work across teams.
  • Ability to thrive in a fast-paced, dynamic environment with evolving technical challenges.

Come change how the world works.

At Upwork, you’ll shape talent solutions for how the world works today. We are a remote-first organization working together to create exciting remote work opportunities for a global community of professionals. While we have a physical office in Palo Alto, we currently hire full-time employees in 21 states in the United States.

At the core of our vibrant culture are shared values that form the foundation of our organization. These values revolve around trust, risk-taking, customer focus, and excellence. Our overarching mission is to create economic opportunities so that people have better lives. We foster an environment where individuals are encouraged to bring their authentic selves to work, nurturing personal and professional growth through development opportunities, mentorship programs, and participation in Upwork Belonging Communities.

We take pride in providing exceptional benefits to our employees. These include comprehensive medical insurance coverage for both you and your family, unlimited paid time off, a 401(k) plan with matching contributions, 12 weeks of paid parental leave, and an Employee Stock Purchase Plan.

Upwork is proudly committed to recruiting and retaining a diverse and inclusive workforce. As an Equal Opportunity Employer, we never discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical condition), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

Additionally, a criminal background check may be run on a candidate after a conditional offer of employment is made. Qualified applicants with arrest or conviction records will be considered in accordance with applicable law, including the California Fair Chance Act and local Fair Chance ordinances.
