We are currently seeking a Senior Cloud Platform Engineer to join an innovative and fast-growing software development start-up. The ideal candidate will bring a strong background in data engineering combined with extensive experience in cloud infrastructure. This role is critical to shaping and scaling the company's cloud architecture as it continues to build cutting-edge solutions.
Responsibilities:
- Cloud Infrastructure Management: Administer and fine-tune AWS environments to ensure optimal performance, cost-efficiency, and scalability across core services.
- Cross-Functional Collaboration: Partner with engineering, product, and business stakeholders to architect infrastructure solutions aligned with strategic goals and seamlessly integrated into current and future products.
- Infrastructure as Code & CI/CD: Implement and manage cloud resources using Terraform while optimizing CI/CD pipelines to streamline deployments and improve development workflows.
- System Monitoring & Reliability: Enhance observability by establishing robust monitoring, alerting, and diagnostic tools that support early issue detection and uninterrupted service availability.
- Cost Optimization: Monitor cloud usage and implement strategies to control costs without compromising system performance or availability.
- Generative AI Infrastructure: Lead the design and deployment of secure, scalable infrastructure to support Generative AI applications in collaboration with software and AI engineering teams.
- Data Integration: Build and maintain secure data pipelines between the platform and external customer systems, including Power BI and Azure-based data solutions.
- Cloud Operations Best Practices: Define and enforce operational standards, including on-call procedures, incident response, and disaster recovery planning.
- Engineering Productivity: Promote the adoption of AI-enhanced development tools (e.g., GitHub Copilot) to accelerate team output and code quality.
- Strategic Communication: Deliver technical insights and strategic recommendations to senior leadership and stakeholders to support business decision-making.
- Team Engagement: Actively contribute to team planning, knowledge-sharing sessions, and collaborative initiatives. Foster a proactive, self-driven team culture.
Requirements:
- 4+ years of experience in Cloud Infrastructure, Site Reliability Engineering (SRE), or Platform Engineering.
- Strong expertise in architecting solutions using AWS services, with experience in EC2, Fargate, Kinesis, and others.
- Experience with ClickHouse, Amazon Redshift, or similar data warehouse platforms. Knowledge of designing and optimizing data pipelines.
- Strong expertise with Terraform and infrastructure as code (IaC).
- Experience with CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI/CD, ArgoCD).
- Familiarity with deployment strategies (canary, blue/green).
- Proficiency in monitoring and observability tools (CloudWatch, OpenTelemetry, Prometheus, Datadog).
- Hands-on experience with containerization and orchestration (Docker, Kubernetes).
- Deep understanding of incident management and alerting best practices.
- Strong background in cloud cost optimization.
- Passion for building massively scalable data platforms.
- Bonus: cross-cloud experience (especially between AWS and Azure) and ML and GenAI Ops.
- Soft Skills: Excellent problem-solving skills, customer obsession, and the ability to communicate complex technical concepts to both technical and non-technical stakeholders.
Seniority level
Mid-Senior level
Industries
Software Development, Manufacturing, and Retail