Cloud Engineer

ShyftLabs

Toronto

Hybrid

CAD 100,000 - 150,000

Full time

30+ days ago

Job summary

A leading company in data products is seeking a Senior Cloud Engineer to oversee cloud infrastructure for GenAI applications. This role focuses on building robust platforms, ensuring security, and optimizing performance. They offer a hybrid work model alongside competitive salary and benefits, fostering employee growth.

Benefits

Strong healthcare insurance
Extensive learning and development resources

Qualifications

  • 5+ years AWS experience with expertise in core services.
  • Experience designing disaster recovery strategies.
  • Strong skills in Terraform and containerization.

Responsibilities

  • Design, provision, and maintain cloud resources across AWS.
  • Implement comprehensive disaster recovery strategies.
  • Build and maintain a self-service platform for GenAI applications.

Skills

AWS
Terraform
Data governance
Containerization
Monitoring tools
Scripting languages
Disaster Recovery
Event-driven architectures

Education

Bachelor's degree in Computer Science or related field

Tools

AWS services
Databricks
CloudWatch
Kubernetes

Job description

Position Overview:

ShyftLabs is seeking a highly skilled Cloud Engineer (Senior, Data Platforms) to join our team and lead the design, implementation, and management of cloud infrastructure for our innovative GenAI applications. This role will be instrumental in building a robust platform that enables rapid experimentation and deployment while maintaining enterprise-grade security and reliability.

ShyftLabs is a growing data product company, founded in early 2020, that works primarily with Fortune 500 companies. We deliver digital solutions that help accelerate business growth across a range of industries by focusing on creating value through innovation.


Job Responsibilities:
Cloud Infrastructure Management
  • Design, provision, and maintain cloud resources across AWS (primary), with capabilities to work in Azure and Google Cloud environments
  • Manage end-to-end infrastructure for full-stack GenAI applications including:
  • Database systems (Aurora, RDS, DynamoDB, DocumentDB, etc.)
  • Security groups and IAM policies
  • VPC architecture and network design
  • Container orchestration (ECS, EKS, Lambda)
  • Storage solutions (S3, EFS, etc.)
  • CDN configuration (CloudFront)
  • DNS management (Route53)
  • Load balancing and auto-scaling
Data & AI Platforms
  • Design feature stores, vector stores, data ingestion frameworks, and lakehouse architectures
  • Manage data governance, lineage, masking, and access controls around data products
Serverless Architecture
  • Design and implement serverless solutions using AWS Lambda, API Gateway, and EventBridge
  • Optimize serverless applications for performance, cost, and scalability
  • Implement event-driven architectures and asynchronous processing patterns
  • Manage serverless deployment pipelines and monitoring
Disaster Recovery & High Availability
  • Architect and implement comprehensive disaster recovery strategies
  • Design multi-region failover capabilities with automated recovery procedures
  • Meet RTO/RPO requirements through backup strategies and replication
  • Build auto-failover mechanisms using Route53 health checks and failover routing
  • Create and maintain disaster recovery runbooks and testing procedures
  • Ensure data durability through cross-region replication and backup strategies
Platform Development
  • Build and maintain a self-service platform enabling rapid experimentation and testing of GenAI applications
  • Implement Infrastructure as Code (IaC) using Terraform for consistent and repeatable deployments
  • Create streamlined CI/CD pipelines that support local-to-dev-to-prod workflows
  • Design systems that minimize deployment time and maximize developer productivity
  • Establish quick feedback loops between development and deployment
Monitoring & Operations
  • Implement comprehensive monitoring, observability, and alerting solutions
  • Set up logging aggregation and analysis tools
  • Ensure high availability and disaster recovery capabilities
  • Optimize cloud costs while maintaining performance
DevOps Excellence
  • Champion DevOps best practices across the organization
  • Automate infrastructure provisioning and application deployment
  • Implement security best practices and compliance requirements
  • Create documentation and runbooks for operational procedures
Basic Qualifications:
Technical Skills
  • 5+ years of hands-on experience with AWS services
  • 2+ years of hands-on experience with Databricks
  • Expert-level knowledge of AWS core services (EC2, VPC, IAM, S3, RDS, Lambda, ECS/EKS)
  • Expert-level knowledge of Databricks capabilities
  • Familiarity with SageMaker, Bedrock, or Anthropic/Claude API integration
  • Strong proficiency with Terraform for infrastructure automation
  • Demonstrated experience with containerization (Docker, Kubernetes)
  • Solid understanding of networking concepts (subnets, routing, security groups, VPN)
  • Experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline)
  • Proficiency in scripting languages (Python, Bash, PowerShell)
Serverless & Event-Driven Architecture
  • Extensive experience with AWS Lambda, API Gateway, ECS, Step Functions
  • Knowledge of serverless frameworks (SAM, Serverless Framework)
  • Experience with event-driven patterns using SNS, SQS, EventBridge
  • Understanding of serverless best practices and optimization techniques
Disaster Recovery & Business Continuity
  • Proven experience designing and implementing DR strategies in AWS
  • Expertise in multi-region architectures and data replication
  • Experience with AWS backup services and cross-region failover
  • Knowledge of RTO/RPO planning and implementation
  • Hands-on experience with Route53 health checks and failover routing policies
Cloud Platform Experience
  • Primary: AWS (extensive experience required)
  • Secondary: Azure and Google Cloud Platform (working knowledge)
  • Multi-cloud architecture understanding
Monitoring & Observability
  • Experience with monitoring tools (CloudWatch, Datadog, Prometheus, Grafana)
  • Log management systems (ELK stack, Splunk, CloudWatch Logs)
  • APM tools and distributed tracing
Preferred Qualifications
  • AWS certifications (Solutions Architect, DevOps Engineer)
  • Databricks Certifications
  • Experience with open-source LLMs, embedding models, and RAG-based applications
  • Experience with chaos engineering and resilience testing
  • Knowledge of security frameworks and compliance (SOC2, HIPAA, PCI)
  • Experience implementing complex build systems for mono-repo micro-services architectures
  • Background in building developer platforms or internal tools
  • Experience with Infrastructure as Code testing frameworks

We are proud to offer a competitive salary alongside strong healthcare insurance and a benefits package. The role is preferably hybrid, with two days per week in the office and flexibility for client engagement needs. We pride ourselves on the growth of our employees, offering extensive learning and development resources.

ShyftLabs is an equal-opportunity employer committed to creating a safe, diverse and inclusive environment. We encourage qualified applicants of all backgrounds including ethnicity, religion, disability status, gender identity, sexual orientation, family status, age, nationality, and education levels to apply. If you are contacted for an interview and require accommodation during the interviewing process, please let us know.
