Senior Data Engineer ADMET

Berlin
Remote
EUR 60,000 - 100,000
2 days ago
Job description

At Apheris, we power federated data networks in life sciences to address the data bottleneck in training highly performant ML models. Publicly available molecular datasets are insufficient to train high-quality ML models that meet industry requirements. Our product hosts networks where biopharma organizations collaboratively train higher quality models on their combined data. The Apheris product is a set of drug discovery applications enriched with proprietary data from network participants. Our federated computing infrastructure, with built-in governance and privacy controls, ensures that data IP and ownership always stay with the data custodians.

As we focus more on ADMET (absorption, distribution, metabolism, excretion, and toxicity) in our drug discovery efforts, we are seeking a Senior Data Engineer to help us build advanced ADMET models. This hands-on, high-impact role involves advancing the application of foundational models to drug discovery problems. You will work closely with our ADMET team and serve as the technical authority on data preparation, harmonization, and pipelines in this domain.

You should have deep expertise in data infrastructure and preparation, with domain knowledge in pharmacokinetics and toxicity, especially related to ADMET modeling. Understanding the application of these models within industrial drug discovery workflows is essential.

If you want to be part of a mission-driven team building cutting-edge AI systems for life sciences and have the expertise to leverage domain-specific data, this role is for you.

What you will do

  1. ADMET Data Pipeline Development: Design, build, and maintain scalable pipelines for ingesting, processing, and harmonizing diverse ADMET datasets from public sources (e.g., ChEMBL, PubChem) and proprietary assays.
  2. Data Harmonization: Standardize heterogeneous ADMET data formats (e.g., in vitro assays, in silico predictions) across network participants so the data is ready for modeling.
  3. Model-Ready Dataset Curation: Preprocess raw ADMET data (e.g., normalize units, handle missing values) to support ML model training for endpoints like bioavailability, hERG inhibition, or CYP450 interactions.
  4. Data Quality Assurance: Implement and automate validation checks to ensure data integrity.
  5. Cross-Functional Integration: Collaborate with computational chemists to optimize data structures for AI-driven ADMET models (e.g., graph-based representations for metabolic pathways).
  6. Stakeholder Collaboration: Work with customers and academic partners to define data preprocessing, selection, and benchmarking strategies for novel training tasks involving ADMET data, including harmonizing assay data from different sources.
  7. Strategic and Mentorship Roles: Guide team members on complex ADMET data preparation, influence data infrastructure strategies, and contribute to publications or open-source projects.

What we expect from you

  1. Within 3 months: Develop a deep understanding of the Apheris product and how it applies to current ADMET use cases. Take ownership of an ADMET data preparation stream, build relationships with leadership, and develop a roadmap for a high-value use case.
  2. Within 12 months: Lead multiple ADMET data efforts, demonstrate improvements in model performance and impact, mentor colleagues, and set strategic directions.

Qualifications

  • Background in computational chemistry, cheminformatics, computational biology, bioinformatics, data engineering, or computer science with experience in preparing data for ML in drug discovery.
  • Deep experience with pharma/biotech ADMET data pipelines and assay protocols.
  • Comfort navigating complex technical landscapes and driving modeling plans.
  • Understanding of how ADMET data and models are used in drug discovery.
  • Experience in federated learning, privacy-preserving ML, or secure model training.
  • Experience benchmarking predictive models, working with ML/MLOps at scale, and contributing to open-source tooling.
  • Hands-on experience with ADMET assays and DMPK stakeholders.
  • Ability to guide technical directions in fast-paced, research-oriented environments.

What we offer

  • Competitive compensation, including early-stage virtual share options.
  • Remote-first work environment.
  • Benefits including wellbeing and mental health budgets, work-from-home and coworking stipends, and learning budgets.
  • Team events, quarterly meetups, and a diverse, mission-driven team.
  • Opportunities for personal and professional growth.

About Apheris

Apheris enables federated life sciences data networks, addressing the difficulty of accessing proprietary data that is restricted by IP and privacy concerns. Our platform allows organizations to collaboratively train high-quality ML models on combined data, with a current focus on structural biology and ADMET.

Logistics

Interview process:

  • Initial Screening: A video call to explore fit and answer questions.
  • Deep Dive: An assessment of your skills and knowledge with a domain expert.
  • Final Interview: Up to three hours with founders and future coworkers.

Required Experience: Senior IC

Key Skills

Apache Hive, S3, Hadoop, Redshift, Spark, AWS, Apache Pig, NoSQL, Big Data, Data Warehouse, Kafka, Scala

Employment Type: Full-Time

Experience: Years

Vacancy: 1

Location: Berlin, Germany