Member of Technical Staff - Data Infrastructure Engineer (DevOps|SRE|Platform Engineering|MLOps)

Microsoft

New York (NY)

Hybrid (3 days per week in office)

USD 119,800 - 234,700

Full time

6 days ago

Job summary

Microsoft seeks a passionate Data Infrastructure Engineer to build mission-critical AI platform components for data pipelines and human-AI interactions. The role requires proficiency in distributed systems and DevOps practices, along with a collaborative spirit that helps improve productivity across teams.

Qualifications

  • 4+ years of experience in data infrastructure, DevOps, SRE, or MLOps.
  • 3+ years managing and scaling distributed systems.
  • Experience with Kubernetes and containerized applications.

Responsibilities

  • Design, build, and maintain scalable data infrastructure.
  • Implement DevOps and SRE best practices within data workflows.
  • Collaborate with teams to deliver robust CI/CD pipelines.

Skills

Automation
Observability
Distributed Systems
Scripting
Collaboration
Problem Solving

Education

Bachelor’s degree in Computer Science
Master’s degree in Computer Science

Tools

Kubernetes
Helm
Terraform
Bicep
Python
Bash
PowerShell
Azure
AWS
GCP

Job description

As Microsoft continues to lead the frontier of artificial intelligence, we are seeking passionate and driven engineers to solve some of the most challenging and impactful AI problems of our time. Our vision is bold: to build intelligent systems across agents, applications, services, and infrastructure — and to make this intelligence universally accessible for consumers, businesses, and developers alike.

Microsoft AI (MAI) is looking for an experienced Data Infrastructure Engineer to join the team behind personal AI and Copilot systems. We are building mission-critical platform components that drive data pipelines, enable seamless human-AI interactions, and power the evolution of intelligent systems. This role blends platform engineering, DevOps/SRE practices, and MLOps to support large-scale data workflows and AI model development.

You’ll bring technical depth, a passion for automation and observability, fluency in distributed systems, and the creativity to architect solutions that scale. Just as importantly, you’ll bring empathy, a collaborative spirit, and a growth mindset to support a world-class engineering culture.

_This position is based in New York, NY or Redmond, WA, with an in-office requirement of 3 days per week._

**Responsibilities**

+ Design, build, and maintain scalable, reliable, and observable data and ML infrastructure that powers mission-critical AI applications.

+ Implement DevOps and SRE best practices, including automated deployments, service monitoring, and incident response.

+ Develop self-service tooling and workflows that streamline developer and researcher productivity.

+ Create robust CI/CD pipelines and automate infrastructure provisioning using Infrastructure as Code (Bicep, Terraform, ARM).

+ Collaborate closely with AI researchers, platform engineers, and application developers to deliver seamless and secure data workflows.

+ Participate in technical design reviews and contribute to maintaining a clean, secure, and well-documented codebase.

+ Proactively identify and resolve bottlenecks and inefficiencies in data pipelines and infrastructure.

+ Embody and promote Microsoft’s culture and values of respect, integrity, accountability, and inclusion.

**Qualifications**

**Required Qualifications:**

+ Bachelor’s degree in Computer Science, Mathematics, or a related field AND 4+ years of experience in a data infrastructure, DevOps, SRE, or MLOps role supporting high-volume, low-latency data systems

+ OR Master’s degree in Computer Science, Mathematics, or a related field AND 3+ years of experience in a data infrastructure, DevOps, SRE, or MLOps role supporting high-volume, low-latency data systems

+ OR equivalent experience.

+ 3+ years of experience managing and scaling distributed systems, from bare metal to Kubernetes, including deep knowledge across the full stack (UI, middleware, platform services).

+ 2+ years building and deploying containerized applications with Kubernetes and Helm/Kustomize.

+ Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell, with proven experience automating operational tasks, including health checks, alerting, and observability for data and ML systems.

+ Demonstrated success troubleshooting and supporting critical production systems, and managing CI/CD pipelines and release automation.

**Preferred Qualifications:**

+ Experience with Azure, AWS, or GCP and cloud-native data infrastructure.

+ Hands-on experience with modern data storage and processing technologies, including relational and NoSQL databases, key-value stores, Spark compute engines, distributed file systems such as HDFS and ADLS Gen2, and messaging systems such as Event Hubs, Kafka, and RabbitMQ.

+ Experience collaborating with Data Engineers, Data Scientists, ML Engineers, and Networking and Security teams.

+ Familiarity with modern web stacks (TypeScript, Node.js, React, PHP) is a plus.

+ Understanding of MLOps principles: model training pipelines, artifact versioning, and experiment tracking.

+ Familiarity with agentic workflows, deep learning, or AI frameworks is an advantage.

+ Practical experience using LLMs (e.g., GPT-based models) in daily workflows, such as automating documentation, code generation, code review, or operational intelligence.

+ Demonstrated understanding of prompt engineering techniques to effectively design, optimize, and evaluate interactions with large language models (LLMs).

+ Ability to resolve complex performance and scalability issues across services and infrastructure layers.

+ Interpersonal and communication skills, with a passion for continuous learning and mentorship.

+ Experience applying LLMs to accelerate DevOps tasks, enhance incident response, or streamline cross-functional collaboration is a strong plus.

Data Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area; in those locations, the base pay range for this role is USD $158,400 - $258,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until June 9, 2025.

#MicrosoftAI #Copilot

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).

Similar jobs

Member of Technical Staff - Data Infrastructure Engineer (DevOps|SRE|Platform Engineering|MLOps)

Microsoft

New York

On-site

USD 158,000 - 258,000

2 days ago