Data Scientist

FIRMUS METAL INTERNATIONAL PTE. LTD.

Singapore

On-site

SGD 80,000 - 110,000

Full time

Yesterday

Job summary

An AI solutions technology company is looking for a Data Scientist to optimize workload performance and energy consumption. The successful candidate will build predictive models and implement anomaly detection, using time-series data to drive insights for energy analytics. This full-time position offers the opportunity to make a significant impact on sustainable AI infrastructure, working in a diverse, inclusive environment based in Singapore or Australia.

Qualifications

  • 5–7 years of data science experience focused on time-series analysis, anomaly detection, or operational data.
  • Expert in Python and R, capable of end-to-end analysis and productionizing models.
  • Strong foundation in statistical techniques including hypothesis testing and uncertainty quantification.

Responsibilities

  • Analyze GPU and facility time-series data to identify patterns and leading indicators.
  • Build predictive models for power demand and resource contention.
  • Implement anomaly detection models for real-time monitoring.

Skills

Time-series analysis
Anomaly detection
Operational data analysis
Predictive modeling
Python
R
Statistical analysis
Data quality practices

Tools

ARIMA
Prophet
TensorFlow
PyTorch
Grafana
Prometheus

Job description
Role Summary

The Data Scientist will turn GPU, facility, and data center (DC) telemetry and operational data into predictive models, patterns, and insights that help Firmus AI Factory users optimize workload performance and energy consumption. You'll build the anomaly detection, forecasting, and efficiency scoring that differentiate the platform. Your models and insights power the energy analytics that are a key competitive advantage for Firmus FactoryOS and drive customer cost optimization decisions.

Key Responsibilities
  • Analyze GPU and facility time-series data: identify patterns, leading indicators of degradation, thermal stress, power throttling.
  • Build predictive models: forecast power demand, detect anomalies, predict resource contention, recommend optimal batch sizes.
  • Quantify energy consumption per workload: energy (kWh or joules) per training job, energy per token for inference, energy vs. performance curves.
  • Build AI workload profiles that correlate energy consumption with workload type and stage of work.
  • Build energy efficiency scoring: rate jobs/clusters/tenants on efficiency (e.g., “this cluster runs at 40% Model FLOPs Utilization (MFU); optimal is 65%”).
  • Implement anomaly detection models (Isolation Forest, autoencoders, statistical) for real‑time cluster monitoring.
  • Implement event correlation: when anomalies are detected, correlate with telemetry events to suggest root causes.
  • Create incident copilot features: anomaly detected → summarize relevant telemetry → suggest likely causes and actions.
  • Build RAG evaluation metrics: retrieval accuracy (NDCG, MRR), reranking quality, end‑to‑end answer quality.
  • Implement continuous monitoring for model drift; retrain models as patterns evolve.
  • Productionize models into pipelines: batch prediction, real‑time scoring, metric updates.
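
For illustration only, the energy quantification and efficiency scoring responsibilities above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Firmus's method: the function names are hypothetical, power readings are assumed to be evenly spaced samples in watts, and the figures in the usage note are made up.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_joules(power_samples_w, interval_s):
    """Approximate energy by summing evenly spaced power samples (rectangle rule)."""
    return sum(power_samples_w) * interval_s


def energy_kwh(power_samples_w, interval_s):
    """Energy of a job in kWh, from power samples in watts."""
    return energy_joules(power_samples_w, interval_s) / JOULES_PER_KWH


def joules_per_token(power_samples_w, interval_s, tokens):
    """Energy per generated token for an inference workload."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return energy_joules(power_samples_w, interval_s) / tokens


def mfu(achieved_flops_per_s, peak_flops_per_s):
    """Model FLOPs Utilization: achieved throughput as a fraction of hardware peak."""
    return achieved_flops_per_s / peak_flops_per_s


def efficiency_gap(observed_mfu, target_mfu):
    """Gap between a target MFU and the observed MFU (positive means headroom)."""
    return target_mfu - observed_mfu
```

As a usage example, a job drawing a constant 1,000 W sampled once per second for an hour gives `energy_kwh([1000.0] * 3600, 1.0)` of 1 kWh, and a cluster at 40% MFU against a 65% target has an `efficiency_gap` of about 0.25.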
Skills and Experience
  • 5–7 years of data science experience focused on time‑series analysis, anomaly detection, or operational data.
  • Proficiency with ARIMA, Prophet, state‑space models, autoencoders, or deep learning for time‑series forecasting.
  • Strong statistical foundation: hypothesis testing, confidence intervals, uncertainty quantification.
  • Expert‑level Python/R (pandas, scikit‑learn, PyTorch/TensorFlow, Jupyter); able to build end‑to‑end analyses and productionize models.
  • Hands‑on data quality practices: handle missing data, sensor noise, outliers, validation before modelling.
  • Experience with Prometheus, Grafana, or observability platforms for accessing operational metrics.
  • Comfort with anomaly detection frameworks (Isolation Forest, LOF, autoencoders) and event correlation.
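
As a rough illustration of the "statistical" end of the anomaly detection work mentioned above, here is a minimal Python sketch of a trailing-window z-score detector. It is a simple baseline chosen for brevity, not one of the frameworks named in this posting (Isolation Forest, LOF, autoencoders); the function name, window size, and threshold are assumptions.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    history = deque(maxlen=window)  # holds only the previous `window` points
    flagged = []
    for i, x in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat window has sigma == 0; skip scoring in that case.
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                flagged.append(i)
        history.append(x)
    return flagged


# Hypothetical GPU temperature readings with one spike at index 6.
temps = [70, 71, 69, 70, 71, 70, 95, 70, 71]
print(rolling_zscore_anomalies(temps))  # [6]
```

Note that the spike itself enters the trailing window and inflates the standard deviation for the next few points, which is one reason a production detector would likely need something more robust than this baseline.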
Key Competencies
  • Time‑Series Mastery: deeply understands seasonality, trend, noise, stationarity, forecasting trade‑offs.
  • Production Mindset: not just Jupyter notebooks; thinks about model deployment, retraining, monitoring in production.
  • Communication: explains findings to both technical engineers and non‑technical operators/customers clearly.
  • Rigor: validates models on hold‑out test sets; reports false‑positive rates, detection latency, uncertainty.
  • Curiosity: asks “why” questions; doesn’t just fit models; understands the business impact.
Success Metrics
  • Actionable detection (not noise): anomalies are detected quickly with acceptable false positives and strong operator confidence.
  • Forecasting & planning accuracy improves: forecasts are accurate enough to inform capacity and energy planning decisions.
  • Measured efficiency impact: insights drive reductions in waste and/or cost (GPU‑hours, energy‑per‑workload) where adopted.
  • Telemetry trust & completeness stays high: data quality supports billing/ops/optimization decisions reliably.
  • RCA acceleration via analytics: analytics shorten investigations for repeat incident classes and reduce time‑to‑identify‑likely‑cause.
Location & Reporting
  • Singapore or Australia (Launceston, TAS or Sydney, NSW)
  • Reporting to Head of AI & Applications
Employment Basis

Full‑time

Diversity

At Firmus, we are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.

Join us in our mission to revolutionize the AI industry through sustainable practices and cutting‑edge engineering. Apply now to be part of shaping the future of sustainable AI infrastructure.
