Overview
Mistplay is the #1 loyalty app for mobile gamers. Our community of millions of engaged mobile gamers comes to Mistplay to discover new games and earn rewards. Gamers are rewarded for the time and money they spend in games and can redeem those rewards for gift cards. Mistplay is on a mission to be the best way to play mobile games for everyone, everywhere!
Missions and Strengths
- Reporting to the Director of the Data and Machine Learning Platform, the Senior Machine Learning Platform Engineer will play a key role in researching and developing AI solutions to complex business problems. They will work closely with a cross-functional team to identify areas for improvement and design scalable solutions.
- Design, build, and operate standardized training-to-serving pipelines covering artifact management, environment provisioning, packaging, deployment, and rollback for SageMaker endpoints.
- Own real-time and batch inference on SageMaker, plus serverless inference where appropriate, with blue/green and canary strategies, autoscaling policies, and cost controls.
- Implement ultra-low-latency serving patterns with Redis/Valkey (feature caching, online retrieval, request-scoped state, response caching, rate limiting).
- Provision and manage ML/data infrastructure with Terraform (SageMaker endpoints, ECR/ECS/EKS resources, networking/VPC, ElastiCache/Valkey clusters, observability stacks, secrets, and IAM).
- Build platform abstractions and golden paths: Airflow DAG templates, CLI/SDKs, cookie-cutter repositories, and CI/CD pipelines.
- Establish model lifecycle governance: registries, approval workflows, promotion policies, and integrated lineage and audit trails.
- Deliver end-to-end observability: data/feature freshness checks, drift and quality monitoring, performance/latency SLOs, dashboards, tracing and alerting, incident response, and postmortems.
- Collaborate with the Security, SRE, and Data Engineering teams on private networking and least-privilege IAM.
- Evaluate and integrate platform tooling, and lead migrations with sound change management and minimal disruption.
What you’ll do
- Design, build, and operate standardized training-to-serving pipelines with Airflow, covering artifact management, environment provisioning, packaging, deployment, and rollback for SageMaker endpoints.
- Own real-time and batch inference on SageMaker: multi-model endpoints, serverless inference where appropriate, blue/green and canary strategies, autoscaling policies, and cost controls (spot strategies, instance right-sizing).
- Implement ultra-low-latency serving patterns with Redis/Valkey: feature caching, online feature retrieval, request-scoped state, model response caching, and rate limiting/backpressure for bursty traffic.
- Provision and manage ML/data infrastructure with Terraform: SageMaker endpoints/configs, ECR/ECS/EKS resources, networking/VPC endpoints, ElastiCache/Valkey clusters, observability stacks, secrets, and IAM.
- Build platform abstractions and golden paths: Airflow DAG templates, CLI/SDKs, cookie-cutter repos, and CI/CD pipelines that take models from notebooks to production predictably.
- Establish and run model lifecycle governance: model/feature registries, approval workflows, promotion policies, lineage, and audit trails integrated with Airflow runs and Terraform state.
- Implement end-to-end observability: data/feature freshness checks, drift/quality gates, model performance/latency SLOs, infra health dashboards, tracing, and alerting—plus incident response and postmortems.
- Partner with Security, SRE, and Data Engineering on private networking, policy-as-code, PII handling, least-privilege IAM, and cost-efficient architectures across environments.
- Evaluate, integrate, and rationalize platform tooling; lead migrations with clear change management and minimal downtime.
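To make the "rate limiting/backpressure for bursty traffic" item concrete, here is a minimal single-process sketch of a token-bucket limiter in Python. `TokenBucket` and `try_acquire` are hypothetical names for illustration, not part of any Mistplay codebase or library API. The bucket absorbs bursts up to `capacity` and sheds sustained load above `rate`, which is the kind of signal a serving layer could turn into backpressure (for example an HTTP 429).

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket rate limiter (illustrative sketch only).

    Tokens refill continuously at `rate` per second, capped at `capacity`.
    Each admitted request spends `cost` tokens; when the bucket is empty,
    the request is rejected so the caller can shed load or retry later.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # sustained tokens per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable clock for deterministic tests
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # signal backpressure to the caller


if __name__ == "__main__":
    t = [0.0]  # fake clock so the demo is deterministic
    bucket = TokenBucket(rate=2.0, capacity=3.0, clock=lambda: t[0])
    print([bucket.try_acquire() for _ in range(5)])  # burst: 3 admitted, 2 rejected
    t[0] = 1.0  # one second later, 2 tokens have refilled
    print([bucket.try_acquire() for _ in range(3)])  # 2 admitted, 1 rejected
```

This version keeps state in process memory; a fleet of serving replicas would need shared state (for example in Redis/Valkey) for the limit to apply globally, which this sketch does not attempt.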
What you’ll bring
- 5+ years building and operating production-grade ML/data platforms with a focus on serving, reliability, and developer experience.
- Strong software engineering skills in Python, Go, or Java; experience building resilient services, APIs, and automation tooling with high test coverage.
- Deep experience with AWS SageMaker inference: endpoint configuration, containerization, model packaging, autoscaling, serverless vs. real-time trade-offs, multi-model endpoints (MME), and A/B and canary releases.
- Expertise using Redis/Valkey as an online feature store in ML serving contexts.
- Proven Terraform experience managing ML and data infra end-to-end: modules, workspaces, drift detection, change reviews, and safe rollbacks; familiarity with GitOps patterns.
- Airflow orchestration at scale: dependency modeling, sensors, retries, SLAs, backfills, DAG factories, and integrations with registries, artifact stores, and Terraform pipelines.
- Familiarity with ML frameworks (scikit-learn, XGBoost, PyTorch, TensorFlow) from a platform-integration perspective to support diverse runtimes and containers.
- Observability for ML workflows: metrics/logs/traces, performance profiling, capacity planning, cost monitoring, and runbooks.
- Excellent communication and cross-functional collaboration with Data Science, Data Engineering, DevOps, and Backend.
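As an illustration of Redis/Valkey used as an online feature store, here is a minimal cache-aside sketch in Python. It assumes a client exposing `get(name)` and `setex(name, time, value)`, the same methods redis-py provides; `get_features`, `InMemoryCache`, the `features:<id>` key scheme, and the 300-second TTL are hypothetical choices for illustration, and `InMemoryCache` only stands in for a real server so the example runs locally.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


def get_features(client: Any, entity_id: str,
                 load_from_store: Callable[[str], dict],
                 ttl_seconds: int = 300) -> dict:
    """Cache-aside lookup (hypothetical helper).

    `client` is assumed to offer redis-py style get(name) and
    setex(name, time, value). On a miss, features are read from the
    slower backing store and written back with a TTL.
    """
    key = f"features:{entity_id}"  # hypothetical key scheme
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)  # fast path: cache hit
    features = load_from_store(entity_id)  # slow path: backing store
    client.setex(key, ttl_seconds, json.dumps(features))
    return features


class InMemoryCache:
    """Tiny stand-in for Redis/Valkey so the sketch runs without a server."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, bytes]] = {}
        self._clock = clock

    def get(self, name: str) -> Optional[bytes]:
        entry = self._data.get(name)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(name, None)  # drop expired entries lazily
            return None
        return entry[1]

    def setex(self, name: str, ttl: int, value: str) -> bool:
        self._data[name] = (self._clock() + ttl, value.encode())
        return True


if __name__ == "__main__":
    calls = []

    def load(entity_id: str) -> dict:  # hypothetical backing-store read
        calls.append(entity_id)
        return {"sessions_7d": 12, "spend_30d": 4.99}

    cache = InMemoryCache()
    get_features(cache, "user-42", load)
    get_features(cache, "user-42", load)
    print(len(calls))  # 1: the second lookup is served from the cache
```

Cache-aside keeps the backing store as the source of truth and lets the TTL bound staleness; a write-through design would be an alternative when features must be fresh immediately after an update.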
Why Mistplay?
We strive to make our work environment as inviting and fun as possible! Working at Mistplay comes with a whole array of perks that we've adopted both virtually and in person: team lunches, game nights, company-wide events, and much more. Our culture is deeply rooted in growth and upheld by a team of smart, dynamic, and enthusiastic people. We use data to constantly learn, improve, and adapt. We foster an environment where everyone is encouraged to share their ideas, push boundaries, take calculated risks, and watch their visions come to life.