
Machine Learning Engineer, Causal Discovery

Rocket Lab

United States

Remote

USD 133,000 - 186,000

Full time

Today

Job summary

A leading company in AI solutions is seeking a Machine Learning Engineer to enhance drug discovery processes by developing causal inference systems. The role entails robust collaboration across scientific disciplines and offers a competitive salary range along with various benefits including stock options and professional growth opportunities.

Benefits

Medical/dental/vision benefits
Generous learning opportunities
401(k) plans
PTO (summer and winter breaks)
Stock options depending on employment type

Qualifications

  • Ph.D. required; 3-5 years of experience in causal modeling or graph algorithms.
  • Experience processing and curating multi-modal data.
  • Strong collaboration and communication skills essential.

Responsibilities

  • Develop scalable systems for causal reasoning in biological contexts.
  • Implement algorithms for causal understanding.
  • Communicate complex ideas to internal and external stakeholders.

Skills

Probabilistic or causal modeling
Large-scale graph algorithms
Graph neural networks
Causal inference

Education

Ph.D. in Computer Science, High-Performance Computing, or related field

Tools

High-Performance Computing environments
AWS
Google Cloud Platform (GCP)

Job description

About SandboxAQ

SandboxAQ is a high-growth company delivering AI solutions that address some of the world's greatest challenges. The company’s Large Quantitative Models (LQMs) power advances in life sciences, financial services, navigation, cybersecurity, and other sectors.

We are a global team that is tech-focused and includes experts in AI, chemistry, cybersecurity, physics, mathematics, medicine, engineering, and other specialties. The company emerged from Alphabet Inc. as an independent, growth capital-backed company in 2022, funded by leading investors and supported by a braintrust of industry leaders.

At SandboxAQ, we’ve cultivated an environment that encourages creativity, collaboration, and impact. By investing deeply in our people, we’re building a thriving, global workforce poised to tackle the world's epic challenges. Join us to advance your career in pursuit of an inspiring mission, in a community of like-minded people who value entrepreneurialism, ownership, and transformative impact.

About the Team

SandboxAQ’s AI Simulation team is advancing the frontiers of drug and materials discovery by integrating physics-based simulations with cutting-edge AI. We are looking for an experienced and innovative Machine Learning Engineer to drive causal inference capabilities across complex biological systems using multi-modal datasets—including omics data, clinical information, and physics-based simulations.

In this role, you will design and build causal machine learning systems that enable a deeper understanding of biological mechanisms and accelerate scientific discovery. You will bring expertise in probabilistic graphical models, large-scale graph algorithms, and deep learning techniques for causal discovery, and collaborate closely within a high-performing, interdisciplinary team of drug discovery scientists, computational chemists, physicists, AI researchers, bioinformaticians, and software engineers.

Key Responsibilities

  • Develop robust, scalable software systems that enable large-scale causal reasoning
  • Design and implement algorithms to advance understanding of causality in complex biological systems
  • Apply advanced graph-based reasoning techniques—including Graph Neural Networks, Probabilistic Graphical Models, and LLMs—for querying and inference over large-scale causal biomedical knowledge graphs constructed from simulation, omics data, and literature
  • Identify, ingest, and curate relevant data sources. Own data quality control, validation, and integration workflows
  • Research and prototype novel bioinformatics and deep learning approaches to interpret human genetic variants, gene regulation mechanisms, gene expression dynamics, and disease pathways using diverse multimodal data (e.g., clinical phenotypes, medical records, multi-omics, single-cell data, proteomics, genomics)
  • Communicate complex ideas effectively across audiences, including internal collaborators, external stakeholders, and clients—tailoring technical depth as needed
  • Contribute to the scientific community through patent filings, peer-reviewed publications, white papers, and conference presentations

Basic Qualifications

  • Ph.D. in Computer Science, High-Performance Computing, or a related field
  • 3–5 years of hands-on experience, preferably in the private sector, working on one or more of the following:
    • Probabilistic or causal modeling
    • Large-scale graph algorithms
    • Graph neural networks
  • Experience in processing and curating multi-modal data—including large-scale omics, clinical datasets, and scientific literature
  • Proficiency in running analyses and training machine learning or deep learning models in high-performance computing (HPC) environments, particularly those using GPUs
  • Strong collaboration mindset, with the ability to identify problems and communicate technical concepts clearly to both technical and non-technical stakeholders
  • Demonstrated ability to dive deep into technically complex problems and a track record of driving initiatives through to completion

Preferred Qualifications

  • Familiarity with advanced AI concepts, including:
    • Generative AI (LLMs, Biological Foundation Models)
    • Probabilistic Graphical Models (e.g., Bayesian Networks, Markov Networks, deep learning extensions)
    • Causal inference (e.g., do-calculus, recent developments in causal discovery)
  • Experience with cloud platforms such as Google Cloud Platform (GCP) or AWS for data storage and compute
  • Working knowledge of graph databases and graph data structures
  • Basic understanding of molecular biology concepts, particularly the central dogma (DNA, RNA, protein), and related high-throughput technologies such as RNA-seq, epigenomics, single-cell and spatial omics
  • Strong publication record in peer-reviewed venues (e.g., NeurIPS, ICML, ICLR, CVPR, ECCV, ICCV)
  • Willingness to travel up to 25% for conferences, customer engagements, team offsites, or internal meetings

Details

  • Location: Remote (USA, Canada)

The US base salary range for this full-time position is expected to be $133k-$186k per year. Our salary ranges are determined by role and level. Within the range, individual pay is determined by factors including job-related skills, experience, and relevant education or training. This role may be eligible for annual discretionary bonuses and equity.

SandboxAQ welcomes all.
We are committed to creating an inclusive culture where we have zero tolerance for discrimination. We invest in our employees' personal and professional growth. Once you work with us, you can’t go back to normalcy because great breakthroughs come from great teams and we are the best in AI and quantum technology.
We offer competitive salaries, stock options depending on employment type, generous learning opportunities, medical/dental/vision, family planning/fertility, PTO (summer and winter breaks), financial wellness resources, 401(k) plans, and more.
Equal Employment Opportunity: All qualified applicants will receive consideration regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status.
Accommodations: We provide reasonable accommodations for individuals with disabilities in job application procedures for open roles. If you need such an accommodation, please let a member of our Recruiting team know.