ML Data Infrastructure Engineer
iitjobs
United States
Remote
USD 90,000 - 150,000
Full time
Job summary
An established industry player is seeking a skilled data engineer to design and implement scalable data processing pipelines for machine learning. This role involves building and maintaining feature stores, developing data quality monitoring frameworks, and creating systems for dataset versioning and lineage tracking. The ideal candidate will have extensive experience with GCP's data infrastructure and be proficient in Python and SQL. Join a dynamic team where your contributions will significantly impact data-driven projects and innovations in machine learning.
Qualifications
- 7+ years of software engineering experience with a focus on data infrastructure.
- Expertise in GCP's data and ML infrastructure including BigQuery and Dataflow.
Responsibilities
- Design and implement scalable data processing pipelines for ML training.
- Build and maintain feature stores for batch and real-time features.
Skills
Software Engineering
Data Infrastructure
GCP (Google Cloud Platform)
Python
SQL
Data Processing Frameworks (Spark, Beam, Flink)
Data Quality Monitoring
Data Pipeline Orchestration (Airflow, Dagster)
Tools
BigQuery
Dataflow
Cloud Storage
Vertex AI Feature Store
Cloud Composer
Dataproc
Kafka
Kinesis
Responsibilities:
- Design and implement scalable data processing pipelines for ML training and validation
- Build and maintain feature stores with support for both batch and real-time features
- Develop data quality monitoring, validation, and testing frameworks
- Create systems for dataset versioning, lineage tracking, and reproducibility
- Implement automated data documentation and discovery tools
- Design efficient data storage and access patterns for ML workloads
- Partner with data scientists to optimize data preparation workflows
Technical Requirements:
- 7+ years of software engineering experience, with 3+ years in data infrastructure
- Strong expertise in GCP's data and ML infrastructure:
  - BigQuery for data warehousing
  - Dataflow for data processing
  - Cloud Storage for data lakes
  - Vertex AI Feature Store
  - Cloud Composer (managed Airflow)
  - Dataproc for Spark workloads
- Deep expertise in data processing frameworks (Spark, Beam, Flink)
- Experience with feature stores (Feast, Tecton) and data versioning tools
- Proficiency in Python and SQL
- Experience with data quality and testing frameworks
- Knowledge of data pipeline orchestration (Airflow, Dagster)
Nice to Have:
- Experience with streaming systems (Pub/Sub, Kafka, Kinesis)
- Experience with GCP-specific security and IAM best practices
- Knowledge of Cloud Logging and Cloud Monitoring for data pipelines
- Familiarity with Cloud Build and Cloud Deploy for CI/CD
- Knowledge of ML metadata management systems
- Familiarity with data governance and security requirements
- Experience with dbt or similar data transformation tools