Senior Data Engineer ADMET

Apheris

Berlin

Remote

EUR 60,000 - 100,000

Full-time

Posted today

Summary

Join a forward-thinking company dedicated to revolutionizing drug discovery through advanced AI systems. As a Senior Data Engineer, you will play a pivotal role in developing ADMET models, ensuring data integrity, and collaborating with a diverse team of experts. This hands-on position offers the chance to work with cutting-edge technology and make a significant impact in the life sciences sector. With a remote-first work environment and a focus on personal and professional growth, this opportunity is perfect for those passionate about leveraging data to drive innovation in drug discovery.

Benefits

Competitive compensation
Early-stage virtual share options
Wellbeing and mental health budgets
Work-from-home stipends
Learning budgets
Team events
Quarterly meetups
Opportunities for personal growth

Qualifications

  • Deep expertise in data infrastructure and preparation for drug discovery.
  • Experience with ADMET data pipelines and assay protocols.
  • Ability to guide technical directions in research-oriented environments.

Responsibilities

  • Design and maintain scalable pipelines for ADMET datasets.
  • Standardize ADMET data formats for modeling readiness.
  • Collaborate with stakeholders to define data preprocessing strategies.

Skills

ADMET Data Pipeline Development
Data Harmonization
Model-Ready Dataset Curation
Data Quality Assurance
Stakeholder Collaboration
Experience with ADMET assays
Federated Learning
Benchmarking Predictive Models

Education

Background in Computational Chemistry
Experience in Data Engineering
Experience in Bioinformatics

Tools

Apache Hive
AWS
Hadoop
Spark
Kafka
NoSQL
Redshift
Scala

Job description

At Apheris, we power federated data networks in life sciences to address the data bottleneck in training highly performant ML models. Publicly available molecular datasets are insufficient to train high-quality ML models that meet industry requirements. Our product hosts networks where biopharma organizations collaboratively train higher quality models on their combined data. The Apheris product is a set of drug discovery applications enriched with proprietary data from network participants. Our federated computing infrastructure, with built-in governance and privacy controls, ensures that data IP and ownership always stay with the data custodians.

As we increase our focus on ADMET (absorption, distribution, metabolism, excretion, and toxicity) in our drug discovery efforts, we are seeking a Senior Data Engineer to help us build advanced ADMET models. This hands-on, high-impact role involves advancing the application of foundation models to drug discovery problems. You will work closely with our ADMET team and serve as the technical authority on data preparation, harmonization, and pipelines in this domain.

You should have deep expertise in data infrastructure and preparation, with domain knowledge in pharmacokinetics and toxicity, especially related to ADMET modeling. Understanding the application of these models within industrial drug discovery workflows is essential.

If you want to be part of a mission-driven team building cutting-edge AI systems for life sciences and have the expertise to leverage domain-specific data, this role is for you.

What you will do

  1. ADMET Data Pipeline Development: Design, build, and maintain scalable pipelines for ingesting, processing, and harmonizing diverse ADMET datasets from public sources (e.g., ChEMBL, PubChem) and proprietary assays.
  2. Data Harmonization: Standardize heterogeneous ADMET data formats (e.g., in vitro assays, in silico predictions) across network participants to enable modeling readiness.
  3. Model-Ready Dataset Curation: Preprocess raw ADMET data (e.g., normalize units, handle missing values) to support ML model training for endpoints like bioavailability, hERG inhibition, or CYP450 interactions.
  4. Data Quality Assurance: Implement and automate validation checks to ensure data integrity.
  5. Cross-Functional Integration: Collaborate with computational chemists to optimize data structures for AI-driven ADMET models (e.g., graph-based representations for metabolic pathways).
  6. Stakeholder Collaboration: Work with customers and academic partners to define data preprocessing, selection, and benchmarking strategies for novel training tasks involving ADMET data, including harmonizing assay data from different sources.
  7. Strategic and Mentorship Roles: Guide team members on complex ADMET data preparation, influence data infrastructure strategies, and contribute to publications or open-source projects.

What we expect from you

  1. Within 3 months: Develop a deep understanding of the Apheris product and how it applies to current ADMET use cases. Take ownership of an ADMET data preparation stream, build relationships with leadership, and develop a roadmap for a high-value use case.
  2. Within 12 months: Lead multiple ADMET data efforts, demonstrate improvements in model performance and impact, mentor colleagues, and set strategic directions.

Qualifications

  • Background in computational chemistry, cheminformatics, computational biology, bioinformatics, data engineering, or computer science with experience in preparing data for ML in drug discovery.
  • Deep experience with pharma/biotech ADMET data pipelines and assay protocols.
  • Comfort navigating complex technical landscapes and driving modeling plans.
  • Understanding of how ADMET data and models are used in drug discovery.
  • Experience in federated learning, privacy-preserving ML, or secure model training.
  • Experience benchmarking predictive models, working with ML/MLOps at scale, and contributing to open-source tooling.
  • Hands-on experience with ADMET assays and DMPK stakeholders.
  • Ability to guide technical directions in fast-paced, research-oriented environments.

What we offer

  • Competitive compensation, including early-stage virtual share options.
  • Remote-first work environment.
  • Benefits including wellbeing and mental health budgets, work-from-home and coworking stipends, and learning budgets.
  • Team events, quarterly meetups, and a diverse, mission-driven team.
  • Opportunities for personal and professional growth.

About Apheris

Apheris enables federated data networks in life sciences, addressing the difficulty of accessing proprietary data that is restricted by IP and privacy concerns. Our platform allows organizations to collaboratively train high-quality ML models on their combined data, with a current focus on structural biology and ADMET.

Logistics

Interview process:

  • Initial Screening: A video call to explore fit and answer questions.
  • Deep Dive: An assessment of your skills and knowledge with a domain expert.
  • Final Interview: Up to three hours with founders and future coworkers.

Required Experience: Senior IC

Key Skills

Apache Hive, S3, Hadoop, Redshift, Spark, AWS, Apache Pig, NoSQL, Big Data, Data Warehouse, Kafka, Scala

Employment Type: Full-Time

Vacancy: 1

Location: Berlin, Germany
