A global software company is looking for a Senior/Principal Software Engineer specializing in data engineering and machine learning. You will design core data systems and work on high-impact data solutions. The ideal candidate has over 5 years of experience, proficiency in programming (preferably Python), and a strong background in cloud platforms and backend development. This opportunity involves working closely with ML engineers and product teams to create scalable solutions.
Position Overview
Join us to design the core data systems powering both traditional machine learning and cutting-edge generative AI/LLM workflows. As a Senior/Principal Software Engineer, you’ll specialize in one of two tracks:
Data & Feature Store Infrastructure: Build scalable backend systems for data ingestion, batch/streaming ETL pipelines, feature stores, vector-enabled APIs, and data compliance
Labeling & Human Feedback Systems: Design multimodal annotation platforms (text, image, audio, video, 3D), develop RLHF workflows (instruction tuning, output ranking), and drive LLM-assisted labeling innovations
You’ll work closely with ML engineers, MLOps, and product teams to deliver high-impact data and labeling solutions at scale. Reporting to the Head of AI & ML Platform, you’ll turn AI research into production-ready features that create real customer value.
Minimum Qualifications
5+ years of experience in data engineering, ML platform, or backend development roles
Proficiency in at least one modern programming language (Python preferred)
Experience developing and operating distributed backend APIs and SDKs
Experience working with cloud platforms (AWS, GCP, or Azure), containers (Docker/Kubernetes), and infrastructure-as-code tools (e.g., Terraform)
Plus experience in one of the following specializations:
Feature Store Track (experience with at least TWO of the following):
Hands-on experience with feature store frameworks (e.g., SageMaker Feature Store, Feast, Tecton, Hopsworks), or operating vector database systems for serving LLM use cases
Experience with batch and/or streaming data pipelines (e.g., Kafka, Flink, Spark, Ray) and orchestration tools (e.g., Airflow, Argo Workflow)
Demonstrated experience in at least one of the following data areas: data catalog, data validation, versioning, lineage, or security/compliance
Labeling Track (experience with at least ONE of the following):
Proven hands-on experience with labeling platforms (e.g., GroundTruth, Label Studio)
Experience with RLHF/instruction tuning or annotation workflow development
Preferred Qualifications
Experience with LLM pipelines, including embeddings, retrieval-augmented generation (RAG), or prompt engineering
Familiarity with labeling copilot tools, active learning, or managing hybrid annotation teams
Experience with knowledge graphs or semantic data modeling