
Senior Reinforcement Learning Builder

Neutralis S.R.L

Milan

On-site

EUR 60,000 - 100,000

Full-time

Today

Job description

A cutting-edge technology company in Milan seeks a Senior Research Engineer to lead Reinforcement Learning and controls initiatives for industrial systems, mentoring a cohort of MSc/PhD students. Candidates need a strong background in offline RL, Python, and clear stakeholder communication. The role centers on building robust pipelines for data and policy learning in energy applications, with explicit ownership of evaluation and safety.

Benefits

Competitive package with equity
Conference and equipment budget
Opportunity to solve hard problems in the energy sector

Skills

  • Track record shipping RL/controls for physical systems.
  • Deep hands-on skill in offline RL and model-based RL/MPC.
  • Strong engineering skills in Python and PyTorch or JAX.
  • Rigor around evaluation and safety.
  • Ability to lead and mentor a research-engineering team.
  • Clear writing and stakeholder communication.
  • Degree in CS/EE/ME/Controls or equivalent experience.

Responsibilities

  • Own the RL/control roadmap.
  • Build the pipeline for data curation and policy learning.
  • Ship reproducible research to production.
  • Lead and mentor a cohort of MSc/PhD students.
  • Partner with domain experts.
  • Define fallback controllers and quantify risk.
  • Collaborate across the stack.
  • Communicate and present results.

Knowledge

RL/controls for physical systems
offline RL
Python
PyTorch or JAX
evaluation and safety
lead and mentor
clear writing

Education

Degree in CS/EE/ME/Controls or equivalent experience

Tools

MLflow
W&B
FastAPI
PostgreSQL

Job description
Overview

Neutralis is building the learning brain for industrial heat‑pump plants. We fuse model‑based RL with digital twins and strict safety constraints to turn messy plant telemetry into better decisions, hour by hour. This is paper‑to‑plant work with real impact on energy, reliability, and decarbonization.

Challenge

Industrial plants are complex, safety‑critical, and non‑stationary. Off‑policy data, partial observability, actuator limits, drift, and human‑in‑the‑loop operations make naïve RL fail fast. Your mission is to own a safe, reproducible path from data to control: offline → simulated → shadow → live, with guardrails at every step.
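
As a minimal sketch of that offline → simulated → shadow → live path (stage names, metric keys, and thresholds below are assumptions for illustration, not Neutralis's actual criteria), a candidate policy advances one stage only when every gate for its current stage passes:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    OFFLINE = 0
    SIMULATED = 1
    SHADOW = 2
    LIVE = 3


@dataclass
class Gate:
    """A named check a candidate policy must pass before leaving a stage."""
    name: str
    check: Callable[[Dict[str, float]], bool]


def try_promote(stage: Stage, metrics: Dict[str, float],
                gates: Dict[Stage, List[Gate]]) -> Stage:
    """Advance one stage only if every gate for the current stage passes."""
    if any(not g.check(metrics) for g in gates.get(stage, [])):
        return stage
    order = list(Stage)
    return order[min(order.index(stage) + 1, len(order) - 1)]


# Illustrative gates; the thresholds are placeholders, not acceptance criteria.
gates = {
    Stage.OFFLINE: [Gate("ope_beats_baseline",
                         lambda m: m["ope_value"] > m["baseline_value"])],
    Stage.SIMULATED: [Gate("no_constraint_violations",
                           lambda m: m["sim_violations"] == 0)],
    Stage.SHADOW: [Gate("agrees_with_operators",
                        lambda m: m["action_agreement"] >= 0.9)],
}

print(try_promote(Stage.OFFLINE, {"ope_value": 1.2, "baseline_value": 1.0}, gates))
# -> Stage.SIMULATED
```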

Responsibilities
  • Own the RL/control roadmap: architect offline RL + model‑based control with a digital twin in the loop; define safety envelopes and verification gates.
  • Build the pipeline: data curation, policy learning, simulation/gym environments, evaluation harnesses, and promotion criteria from sim to plant.
  • Ship reproducible research to production: baselines, ablations, and clear experiment tracking; transform results into services/APIs.
  • Lead and mentor a 15–20 person cohort of MSc/PhD thesis students and research engineers; set standards for code, experiments, and writing.
  • Partner with domain experts (HVAC/OT/BMS) on constraints, actuation limits, failure modes, and alarm triage.
  • Land safety: define fallback controllers, interlocks, and shadow‑mode strategies; quantify risk and uncertainty (a minimal sketch follows this list).
  • Collaborate across the stack with our FastAPI services, time‑series store, and observability/ML Ops.
  • Communicate: write crisp technical notes, contribute to publications where useful, and present results to partners.
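
The safety bullet above can be pictured with a small, hypothetical wrapper: if the learned policy proposes an action outside a validated envelope, or a non-finite one, a conservative rule-based controller takes over. The class name, bounds, and placeholder policies are assumptions for illustration only, not part of Neutralis's stack.

```python
import numpy as np


class EnvelopeGuard:
    """Wrap a learned policy; fall back to a rule-based controller whenever
    the proposed action is non-finite or leaves the validated envelope."""

    def __init__(self, policy, fallback, action_low, action_high):
        self.policy = policy
        self.fallback = fallback
        self.low = np.asarray(action_low, dtype=float)
        self.high = np.asarray(action_high, dtype=float)

    def act(self, obs):
        action = np.asarray(self.policy(obs), dtype=float)
        if (not np.all(np.isfinite(action))
                or np.any(action < self.low) or np.any(action > self.high)):
            return np.asarray(self.fallback(obs), dtype=float)
        return action


# Placeholder policies, just to exercise the guard.
learned = lambda obs: [1.4 * obs[0]]                   # stand-in for the RL policy
rule_based = lambda obs: [min(max(obs[0], 0.0), 1.0)]  # conservative baseline
guard = EnvelopeGuard(learned, rule_based, action_low=[0.0], action_high=[1.0])
print(guard.act([0.9]))  # learned action 1.26 exceeds the envelope -> fallback [0.9]
```
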
What you’ll bring
  • Track record shipping RL/controls for physical systems (energy, robotics, process, automotive, etc.).
  • Deep hands‑on skill in offline RL (e.g., CQL/IQL/TD3‑BC) and model‑based RL/MPC; comfort with system identification and constrained optimization (a toy example follows this list).
  • Strong engineering in Python and PyTorch or JAX; experience with experiment tracking (MLflow/W&B), containers, and CI.
  • Rigor around evaluation and safety: distribution shift, uncertainty, guardrails, fallback policies.
  • Ability to lead, mentor, and scale a research‑engineering team.
  • Clear writing and stakeholder communication.
  • Degree in CS/EE/ME/Controls or equivalent experience.
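
For concreteness on the offline RL point above, here is a toy PyTorch sketch of the discrete-action CQL(H) regularizer added to a one-step TD loss. The network size, alpha weight, and discrete action space are arbitrary simplifications (an industrial plant would typically need continuous actions and a richer setup); it sketches the technique, not project code.

```python
import torch
import torch.nn as nn


class QNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)


def cql_loss(q_net, target_q_net, batch, gamma=0.99, alpha=1.0):
    """One-step TD loss plus the discrete-action CQL(H) regularizer:
    alpha * E[logsumexp_a Q(s, a) - Q(s, a_data)]."""
    obs, act, rew, next_obs, done = batch
    q = q_net(obs)                                    # (batch, n_actions)
    q_taken = q.gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rew + gamma * (1.0 - done) * target_q_net(next_obs).max(dim=1).values
    td = nn.functional.mse_loss(q_taken, target)
    conservative = (torch.logsumexp(q, dim=1) - q_taken).mean()
    return td + alpha * conservative


# Shape-only smoke test on random data (not a training run).
q_net, target_net = QNet(4, 3), QNet(4, 3)
batch = (torch.randn(8, 4), torch.randint(0, 3, (8,)), torch.randn(8),
         torch.randn(8, 4), torch.zeros(8))
print(cql_loss(q_net, target_net, batch))
```
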
Nice to have
  • Familiarity with OT/BMS/historians (OPC UA, Modbus, BACnet, PI), time‑series modeling, anomaly detection.
  • Experience with digital twins/simulation, domain randomization, and sim‑to‑real transfer.
  • MLOps in AWS; FastAPI, PostgreSQL + a time‑series DB (a minimal endpoint sketch follows this list).
  • Italian language skills.
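
As a rough illustration of the service side mentioned above (FastAPI appears in both the responsibilities and nice-to-have lists), a recommendation endpoint might look like the following; the route, schema fields, and placeholder setpoint logic are invented for illustration and are not Neutralis's API.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="setpoint-recommendation-demo")


class PlantObservation(BaseModel):
    supply_temp_c: float
    return_temp_c: float
    outdoor_temp_c: float


class Recommendation(BaseModel):
    setpoint_c: float
    source: str  # "policy" or "fallback"


@app.post("/recommend", response_model=Recommendation)
def recommend(obs: PlantObservation) -> Recommendation:
    # Placeholder logic: a real service would query the learned policy here and
    # defer to a rule-based controller outside its validated envelope.
    setpoint = min(max(obs.return_temp_c + 5.0, 30.0), 55.0)
    return Recommendation(setpoint_c=setpoint, source="fallback")

# Run locally (assuming this file is saved as demo_service.py):
#   uvicorn demo_service:app --reload
```
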
Why Neutralis
  • Hard problems, real plants: your work moves real energy, not just a leaderboard.
  • Ownership: technical stewardship from first principles to deployment.
  • Talent platform: lead a serious thesis cohort and shape a next‑gen team.
  • Impact: measurable COP uplift, energy savings, reliability gains.
  • Compensation: competitive package with meaningful equity; conference and equipment budget.
Location & working model

On‑site in Milan (primary). Some flexibility for exceptional candidates. Occasional visits to partner sites.

What success looks like (6–12 months)
  • A documented, reproducible RL pipeline from data → policy → evaluation → shadow.
  • Benchmarked policies that outperform baselines in sim and shadow with clear safety margins.
  • A mentored student cohort delivering publishable experiments and production‑ready components.
  • Accepted path to controlled live trials with partners.
How to apply

Apply on LinkedIn or send a short note with "RL — Senior", a link to work you’re proud of (GitHub/Google Scholar/website), and availability. DMs welcome.

Neutralis is an equal‑opportunity employer. We value clarity, safety, and results over pedigree. If you’ve shipped control systems that matter, we want to hear from you.
