Senior Bioinformatics & Machine-Learning Data Scientist

Grafton Biosciences

South San Francisco (CA)

On-site

USD 140,000 - 230,000

Full time

3 days ago

Job summary

Grafton Biosciences seeks a Senior Bioinformatics & Machine-Learning Data Scientist to join their innovative team. The role involves designing and implementing bioinformatics pipelines, applying advanced machine learning models, and collaborating with biological experts to drive novel therapeutics. Competitive compensation and the opportunity to influence groundbreaking therapeutic designs await passionate candidates.

Benefits

Comprehensive health, dental, and vision coverage.
Competitive compensation.
Opportunity to shape new therapeutic designs.

Qualifications

  • Experience designing and maintaining bioinformatics pipelines.
  • Fluency in Python and statistical languages (R, Julia).
  • Mastery of distributed computing and machine learning.

Responsibilities

  • Design and optimize bioinformatics pipelines using tools like Nextflow.
  • Develop and apply machine learning models for biological data.
  • Collaborate with biologists to transform data insights into hypotheses.

Skills

Machine Learning
Bioinformatics
Data Engineering
Large-Scale Data Expertise
Statistics

Education

Ph.D. or Master’s in Bioinformatics, Computational Biology, Computer Science, Biostatistics, or a related field

Tools

Python
Spark
AWS

Job description

Senior Bioinformatics & Machine-Learning Data Scientist

This range is provided by Grafton Biosciences. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range

$140,000.00/yr - $230,000.00/yr

About Us:

Grafton Biosciences is a stealth-mode, San Francisco-based biotech startup focused on solving disease through groundbreaking innovations in early detection and therapeutics. We are combining breakthroughs in synthetic biology, machine learning, and manufacturing to fundamentally extend healthy human lifespans. We’re looking for passionate team members who want to shape the future.

Role: Senior Bioinformatics & Machine-Learning Data Scientist

We are seeking a highly specialized scientist who thrives at the intersection of machine learning, bioinformatics, and data engineering. Your mission will be to build and adapt cutting-edge analytical pipelines that transform petabyte-scale multi-omics data into actionable biological insight. From raw sequencing reads to integrated molecular models, you will design the engines that fuel our discovery platform. The ideal candidate is not just a data scientist, but a computational biologist who can architect scalable infrastructure, craft sophisticated models, and collaborate seamlessly with wet-lab teams to drive novel therapeutics.

Key Responsibilities

  • Design, implement, and optimize end-to-end bioinformatics pipelines (e.g., Nextflow, Snakemake, Cromwell) for high-throughput genomics, transcriptomics, epigenomics, and single-cell assays.
  • Develop and apply advanced machine-learning / statistical models (including deep learning, probabilistic graphical models, and transformer-based architectures) to uncover biomarkers, predict functional effects, and stratify patient populations.
  • Engineer distributed data architectures (Spark, Dask, cloud object stores, GPU clusters) that enable rapid querying and analysis of terabyte- to petabyte-scale datasets.
  • Curate, harmonize, and QC diverse public and proprietary datasets, establishing robust data schemas, metadata standards, and version-controlled repositories.
  • Integrate multi-omics layers (DNA, RNA, protein, spatial, clinical) into unified representations that power target discovery and mechanism-of-action studies.
  • Collaborate deeply with experimental biologists and chemists to translate computational insights into testable hypotheses and guide iterative experimental design.
  • Stay at the forefront of the field by tracking breakthroughs in large-scale data analytics, generative biology, and cloud-native bioinformatics—and rapidly prototyping relevant approaches.

Qualifications

To address the specific needs of this role, candidates must demonstrate experience in the following core areas. Applications without this experience will not be considered:

  • Must-Have: Large-Scale Biological Data Expertise. Proven experience designing and maintaining pipelines for complex, high-volume biological datasets (e.g., whole-genome sequencing, single-cell RNA-seq, spatial transcriptomics, proteomics). You understand both the algorithms and the underlying biology.
  • Must-Have: Scalable Machine Learning & Data Engineering. Hands-on mastery of distributed computing (Spark, Ray, or similar) and cloud platforms (AWS or Azure). Demonstrated ability to train, tune, and serve large models on heterogeneous biological data.

Essential Qualifications

  • Ph.D. or Master’s in Bioinformatics, Computational Biology, Computer Science, Biostatistics, or related field with a strong focus on biological applications.
  • Fluency in Python (preferred) and one or more statistical languages (R, Julia).
  • Experience with workflow managers, containerization (Docker/Singularity), and CI/CD for reproducible science.
  • Solid grounding in statistics, experimental design, and multi-omics data integration.
  • Proficiency with relational and NoSQL databases; experience with graph databases is a plus.
  • Clear communication skills and a collaborative mindset; you can translate data insights into biological impact.

Preferred Qualifications

  • Big Plus: Expertise in dimensionality-reduction and visualization of massive high-dimensional datasets (e.g., UMAP, t-SNE, tensor decomposition).
  • Familiarity with reinforcement learning or generative models for biological sequence design.
  • Experience contributing to or maintaining open-source bioinformatics software.
  • Publications in top-tier ML or computational biology venues (e.g., NeurIPS, ICML, ICLR, ISMB, Cell Systems, Nature Methods).
  • Background in knowledge-graph construction or network biology.

What We Offer

  • Competitive compensation.
  • Comprehensive health, dental, and vision coverage.
  • The opportunity to define a new data-driven therapeutic design paradigm—and see your work progress toward the clinic.

Screening Questions

If you are a particularly good fit for this role, please email careers@graftonbio.com with responses to the following questions. The email subject should be: Bioinformatician - [Your Last Name].

(1) In ≤400 words, walk us through a project where you engineered an end-to-end pipeline that processed terabyte-scale biological data (genomics, single-cell, proteomics, etc.). Please cover:

  • What scientific question or decision did the pipeline enable?
  • Dataset: rough size, modality mix, and the toughest quality-control challenges you had to solve.
  • Pipeline architecture: Workflow manager(s) (e.g., Nextflow, Snakemake, Cromwell), container/CI setup, and the storage/compute stack (cloud services, distributed file systems, GPU/CPU mix).
  • Your personal contributions (key pieces of code or optimizations you authored).
  • Outcome & validation: Quantitative performance metrics (runtime, cost, accuracy) and the downstream biological insight or product milestone it unlocked.

We’re looking for evidence that you can own both the scientific rationale and the engineering required to make large-scale data analysis reliable and reproducible.

(2) Provide concise bullet points (≤400 words total) detailing one instance where you trained or served a large machine-learning model on heterogeneous biological data in a cloud or distributed environment. Please cover:

  • Type of model (e.g., transformer, GNN, probabilistic model) and the biological prediction or discovery goal.
  • Frameworks used (Spark, Ray, Dask, Horovod, PyTorch DDP, etc.), cluster size, and how you handled memory, scheduling, or GPU utilization.
  • How you organized, partitioned, and tracked multi-omics inputs across iterations.
  • Training/serving speed-ups, cost savings, or accuracy improvements achieved (include numbers).
  • How the model was integrated into downstream analyses, visualization dashboards, or experimental decision-making.

We want to see concrete evidence that you can push large models through cloud-scale infrastructure and connect their outputs back to actionable biology.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Research
