EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are seeking an accomplished Chief AI Platform Engineer to lead the development and management of advanced systems that empower stream-aligned teams to deliver secure, scalable, and high-performing AI-driven solutions.
In this role, you will establish strategic direction for the platform, shape cross-functional collaboration, and drive production-grade deployment of machine learning models at scale. Additionally, you will spearhead innovation by evaluating and integrating emerging technologies.
Responsibilities
- Define and execute the architectural strategy for scalable backend systems leveraging MLflow, Kubernetes, Databricks, and Docker
- Lead the development of high-performance APIs tailored for advanced data processing and seamless machine learning model integration
- Set the vision for large-scale operationalization and deployment of machine learning models, leveraging MLflow and best practices
- Establish organizational strategies to deliver scalable, robust, and reusable cloud-based infrastructures
- Drive the creation of sophisticated automated cloud configuration workflows optimized for large-scale environments
- Oversee cross-domain optimizations to enhance systems, processes, and tools for advanced functionality
- Provide authoritative guidance on cloud architecture in areas such as automation, orchestration, security, resilience, and operability
- Ensure continuous alignment of cloud platform initiatives with long-term business priorities and evolving technological demands
- Guide executive-level discussions and build consensus among diverse technical and business stakeholders for impactful delivery outcomes
- Cultivate organizational talent by mentoring mid-level engineers and data scientists in leveraging complex frameworks and tools
- Lead evaluations and integration of cutting-edge products, services, and technologies for enhanced platform capability
- Encourage collaboration and inspire innovation by setting the tone in ceremonies, strategic discussions, and retrospectives
- Act as the highest escalation authority during major incident responses, influencing long-term improvements to system reliability
- Shape mitigation strategies for risks and navigate technical complexities while driving communication of outcomes across stakeholders
- Maintain keen foresight on technological trends to anticipate industry shifts and inform critical strategy development
Requirements
- Bachelor's degree in Computer Science, Software Engineering, or a related field; a Master's degree is preferred
- 7+ years of experience architecting and managing highly available, scalable infrastructure, solutions, and services in complex environments
- At least 2 years of experience in a technical leadership role directing engineering or platform teams
- Proven mastery of leading cloud (IaaS, PaaS, SaaS) services and solutions across major providers
- Expert-level proficiency in programming languages such as Python and modern JVM languages including Java, Scala, or Kotlin
- Deep expertise in all stages of machine learning workflows, including advanced algorithms, frameworks like TensorFlow, PyTorch, or scikit-learn, and production-scale deployment
- Extensive experience in distributed data processing frameworks such as Apache Spark, with strong proficiency in Delta Lake and Parquet formats
- Expert knowledge of agile methodologies and enterprise-level CI/CD processes using tools like GitLab, Terraform, or similar platforms
- Established track record of architecting and managing machine learning models at scale within mission-critical production environments
- Advanced competency in data structures, algorithmic problem-solving, and principles of enterprise-scale system design
- Exceptional proficiency in notebook-based workflows using tools like Jupyter or Databricks for large-scale data experimentation
- AWS Certified Solutions Architect Professional or comparable elite-level cloud architecture certifications
- Excellent analytical and strategic communication skills with the ability to influence diverse audiences, from technical talent to senior executives
- Fluent English proficiency at a minimum of C1 level to drive collaboration across global, multidisciplinary teams
Nice to have
- Proven hands-on experience across the extended AWS ecosystem, including S3, EC2, RDS, EMR, Redshift, Glue, SageMaker, Lambda, DynamoDB, and CloudWatch
- Expert familiarity with Apache Parquet and data lake transformation workflows utilizing next-generation solutions
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn