Site Reliability Engineer (SRE)

BigGeo

Calgary

On-site

CAD 80,000 - 120,000

Full time

30+ days ago

Job summary

Join a pioneering team at a forward-thinking company that is revolutionizing geospatial intelligence. As a Site Reliability Engineer, you will play a crucial role in ensuring the stability and efficiency of our advanced platform. This position offers the opportunity to work in a dynamic, data-driven environment, where your contributions will directly impact innovative solutions that address major global challenges. Embrace a modern work schedule that prioritizes balance and well-being while collaborating with talented professionals to redefine how industries access and interpret geospatial data. If you're passionate about driving technological advancements and making a difference, this role is perfect for you.

Qualifications

5+ years in a DevOps/Site Reliability role with complex project experience.
Proficient in CI/CD pipeline design for large-scale systems.
Strong leadership skills to guide teams towards common goals.

Responsibilities

Implement and manage CI/CD pipelines for software updates.
Automate infrastructure provisioning and configuration.
Ensure reliability and availability of services through proactive measures.

Skills

CI/CD pipelines

Containerization

Infrastructure as Code

Cloud technologies

Problem-solving

Scripting (Python, Bash)

Root cause analysis

Monitoring and alerting

Education

Bachelor's degree in Computer Science

Technical degree in a related field

Tools

Docker

Kubernetes

Terraform

Ansible

Helm Charts

Azure

AWS

GCP

Prometheus

Grafana

Employers often ask why you'd be a good fit to work for them. At BigGeo, we prefer to start by showing why we’re a good fit for you.

Why You’d Want to Work at BigGeo:

Be part of a pioneering team driving the future of geospatial intelligence.
Work in an innovative, data-driven environment that values creativity and rapid problem-solving.
Experience firsthand how your contributions shape cutting-edge technologies and serve critical industries globally.
Embrace a modern “self-care” work schedule that emphasizes balance and well-being.
Shape products that solve major global challenges, from urban planning to environmental conservation.

About BigGeo:

BigGeo is at the forefront of geospatial data intelligence, creating transformative solutions that turn location-based data into actionable insights across industries. Our advanced platform brings geospatial analysis, real-time data processing, and 3D visualization to life, empowering industries to unlock deeper insights and make informed decisions.

Our company has assembled a dynamic, forward-thinking team across all commercial and technology pillars, united by the mission to redefine how people access and interpret their geospatial data. We make it possible for individuals and businesses alike to unlock the full potential of their data, enabling them to extract valuable insights from massive datasets. With a work environment that thrives on cutting-edge innovation, BigGeo isn’t just a tech company; we’re revolutionizing how the world understands and interacts with data.

Role Overview:

We are seeking a skilled Site Reliability Engineer (SRE) to join our team, focused on ensuring the stability, scalability, and efficiency of our platform. The ideal candidate will have a deep understanding of CI/CD pipelines, containerization, infrastructure as code, and cloud technologies. This role is essential in automating our infrastructure, maintaining high availability, and enabling fast, safe, and reliable software delivery.

Key Responsibilities:

Implementing and managing CI/CD pipelines for automating the build, test, and deployment processes, ensuring safe and efficient release of software updates.
Collaborating with development teams to ensure smooth integration of code changes into the pipeline.
Implementing and maintaining infrastructure as code (IaC) practices using tools like Helm Charts, Terraform, or Ansible to manage infrastructure changes.
Automating infrastructure provisioning and configuration to support scalability and reliability.
Building and maintaining Docker containers for applications and services.
Orchestrating container deployments using Kubernetes, Docker Compose, and other relevant technologies.
Ensuring the reliability and availability of services by proactively identifying and mitigating potential issues and responding to incidents.
Participating in on-call rotations to respond to critical incidents and minimize downtime.
Conducting post-incident reviews to identify root causes and prevent future occurrences.
Developing and testing disaster recovery plans and procedures to minimize data loss and downtime in case of failures.
Maintaining documentation for infrastructure, processes, and procedures to facilitate knowledge sharing and team collaboration.
Taking ownership of complex technical issues and coordinating resolutions across teams.
Defining and enforcing best practices in areas such as performance and reliability.

Key Requirements:

Bachelor's or technical degree in computer science or a related field.
Extensive experience (5+ years) in a DevOps/Site Reliability role, demonstrating a track record of successfully leading and implementing complex projects.
In-depth knowledge of advanced DevOps/SRE concepts, methodologies, and best practices.
Proven experience in designing and implementing CI/CD pipelines for large-scale, distributed systems.
Strong leadership skills with the ability to influence and guide team members towards achieving common goals.
Experience with architectural design and planning for highly available and scalable systems.
Proficient in conducting root cause analysis and implementing preventive measures.
Advanced knowledge of cloud computing platforms (e.g. Azure, GCP, AWS) and containerization and orchestration technologies (e.g. Docker, Docker Compose, Kubernetes).
Strong programming and scripting skills (e.g. TypeScript, Rust, Python, Bash).
Experience with database administration (e.g. MySQL, PostgreSQL).
Proficient in infrastructure as code (IaC) tools and practices (e.g. Helm Charts, Terraform, Ansible).
Expertise with monitoring and alerting tools (e.g. Jaeger, Loki, OpenTelemetry, Prometheus, Grafana).
Excellent problem-solving and troubleshooting skills.