Senior Production Operations Engineer

Index Exchange

London

On-site

GBP 60,000 - 100,000

Full time

30+ days ago

Job summary

Join a forward-thinking company at the forefront of ad tech innovation. This role offers an exciting opportunity to work with a diverse team dedicated to maintaining the operational stability of complex global systems. You will leverage your technical expertise in Linux, networking, and cloud infrastructure to ensure high performance and reliability. As part of a collaborative environment, you'll drive continuous improvement and support the company's mission to fund the open web. If you're passionate about technology and eager to make a significant impact, this position is perfect for you.

Benefits

Comprehensive health and life insurance

Paid Time Off

Flexible work schedules

Company contribution to Provident Fund

Stock options plan

Paid Parental Leave

Monthly internet stipend

Quarterly Wellness allowance

Volunteer paid day off

Annual virtual company retreats

Qualifications

6-8 years of experience in DevOps or similar roles.
In-depth knowledge of private-cloud infrastructure and automation frameworks.

Responsibilities

Maintain operational stability of on-premises and cloud environments.
Respond to incidents and optimize system performance.

Skills

Linux Operating Environment

Networking Fundamentals

Cloud Infrastructure Management

Automation Scripting (Go, Python, Bash, Perl)

Incident Response

System Performance Optimization

Communication Skills

Analytical Thinking

Education

Bachelor's Degree in Computer Science or related field

Tools

Ansible

Terraform

Docker

Kubernetes

Prometheus

Grafana

ELK Stack

Kafka

We shaped the earliest forms of ad tech, and we’re looking for the technical expertise to help shape its future. Our customers have unique problems that can only be solved at internet scale, and that’s where the technical skills of our team make a real difference.

Our exchange handles over 350 billion requests every day (for comparison, Google serves an estimated 9 billion searches a day), all running in our own global data centers. Every member of our technology team has an enormous amount of autonomy in building and managing our systems to support and enable our growing level of scale. Through the transparency of our technology, dedication to innovation and integrity, and long-standing customer relationships, we lead through change.

What’s it like to work at Index?

We have more than 550 Indexers around the globe dedicated to building a safe and transparent marketplace that provides a trusted experience for consumers.

Index is an exciting and fast-paced place to work. We’re built on our values of change, support, learning and teaching, trust, and intention. We pride ourselves on our independence and openness, not only in our technology, but in our teams, too. Our diverse and inclusive culture celebrates how we can leverage our unique differences to help drive Index forward.

Our culture of success is truly supportive and collaborative. In working together across our teams, we’re continually investing in the people and technology to solve the industry’s most complex problems. As we extend the promise of ad tech to every channel, we’re looking for talented engineers to help move Index, and the industry, forward.

Are you ready to join the programmatic evolution?

Index Exchange funds the open web. Content and journalism across the internet are funded through advertising, and we are the engine that helps to make that happen transparently, safely and efficiently. Handling hundreds of billions of auctions per day within milliseconds requires an intense understanding of the exchange and the ecosystem that we live in.

Our business is growing significantly every year and is poised to grow even faster. Our people and our platforms are the foundation and enabler of that growth. We are significantly expanding our technology teams, and are looking for technologists with a passion for high performance software development, and a drive to deliver software products and platforms that enable and empower industries at a global scale.

About the Team:

The global Production Operations group is integral to ensuring the operational stability and reliability of our worldwide 24x7 on-premises and cloud environments. As the first line of defense, this team owns operations engineering, collaborating closely with IT, SRE, Network, and Data engineering teams, and with key stakeholders across business, product, and software engineering teams. We play a crucial role in maintaining systems health, responding to incidents, and optimizing the performance, efficiency, and stability of complex global systems.

Here's what you'll be doing:

The ideal engineer is someone who possesses a solid understanding of systems, network and hardware fundamentals and can quickly learn and get up to speed on the operations behind complex global systems.

Environment Stewardship

Maintain oversight on internal metrics, including the health, security, and performance of on-premises & hybrid-cloud network and systems infrastructure environments.
Execute timely and effective incident response, identifying and mitigating issues to minimize downtime.
Respond to alerts within our established SLOs and assist in incident triage, ensuring that the right teams are engaged to address issues promptly.
Help ensure that system backups, disaster recovery plans, and security protocols are in place and maintained.

Support, Collaboration, and Reporting

Serve as a point-of-contact team for operational issues, providing both internal and external teams with technical support and ensuring each issue remains in the team's custody until resolution.
Collaborate with product and software engineering teams to relay operational insights and requirements.

Automation, Tooling & Research

Continuously identify opportunities for optimization and present findings to technical leads and management.
Research and implement improvements enhancing systems performance and scalability.
Continuously research and embrace technological advancements and industry best practices to deliver exceptional service.
Actively identify and mitigate risks and escalate them so the team can proactively address present or anticipated operational challenges.
Develop, implement, and maintain automation frameworks streamlining operational processes, reducing time spent on manual tasks.
Identify catalysts for future optimization, including provisioning techniques, deployment optimization, ancillary services, pipelines, Ansible playbooks, power usage, and bandwidth.

Documentation and Knowledge Sharing

Draft comprehensive documentation for system configurations, processes, and incident resolution procedures.
Participate in knowledge sharing within the team and, with support on content and delivery, provide cross-training to other relevant departments.
Create and maintain runbooks and technical documentation, in addition to being familiar with internal and external escalation pathways.

24x7x365

You will be joining a globally distributed team that maintains 24x7 coverage. As a member of this team and broader group, you may occasionally be required to work weekends, holidays, and after hours to respond to high-urgency or emergency events outside of your local time zone.

Here's what you need:

Technical Expertise

In-depth understanding of the Linux operating environment: kernel tuning, network stack tuning, system observability & instrumentation, and security & access management.
Solid understanding of layer 2-7 networking fundamentals and the relationship between servers & services, and the transit of their packets through network hardware.
In-depth experience engineering and maintaining a private-cloud infrastructure: Bare-metal, vSphere, KVM, Kubernetes.
Experience with tools like Ansible, Terraform, Docker, Kafka, Nexus.
Experience with observability platforms: InfluxDB, Prometheus, ELK, Jaeger, Grafana, Nagios, Zabbix.
Familiarity with Big Data tools: Hadoop, HDFS, Spark, HBase.
Ability to write code in Go, Python, Bash, or Perl for automation.

Work Experience

6-8 years of proven experience in one of the following or similar roles: DevOps Engineer, Linux System Administrator, Site Reliability Engineer (SRE).
Built or maintained a private-cloud infrastructure running CentOS/Rocky Linux on a mix of bare-metal, virtualization, and containerization.
Managed public cloud environments such as AWS, GCP, and Azure, and their federation into on-premises environments.
Life-cycle management of bare-metal servers such as Dell and Supermicro in globally distributed data centers (e.g., break-fix, firmware updates).
Built or maintained on-premises and cloud Kubernetes clusters: Kubeadm, Kind, EKS, GKE.
Built or operated automation & orchestration frameworks for deployment & maintenance pipelines, using tools such as Kafka, StackStorm, Ansible, ArgoCD, and Terraform to push out code or configuration updates and build new infrastructure systems.

Soft Skills

Communication: Clear and effective communication within and across teams. While we place a huge premium on technical skill, we value your ability to work with others.
Curiosity: Things can (and will) break for different reasons; your curiosity will help drive you to identify and fix issues.
Alertness: We can never predict when things will go wrong, so it is your job to be vigilant and prepared to respond when they do.
Analytical Thinking: Monitor and analyze activity, and collaborate with other departments to maintain technical defense.
Reliability: Prioritize the reliability of our systems, ensuring our exchange customers can trust in our services 24x7.
Continuous Improvement: Embrace a culture of continuous learning and innovation.
Customer-Centricity: Committed to providing the best possible experience for our customers.
Accountability: Take ownership of responsibilities and hold yourself accountable for work quality.

Why You’ll Love Working Here:

Company paid comprehensive health and life insurance plans
Paid Time off and flexible work schedules
Company contribution to Provident Fund
Participation in our company Stock options plan
Company paid Parental Leave
Monthly internet stipend
Quarterly Wellness allowance
Community engagement opportunities and donation-matching program
Volunteer paid day off
Annual virtual company retreats and community-led team events
A workplace that supports a diverse, equitable, and inclusive environment

Equal employment opportunity

At Index Exchange, we believe that successful products are built by teams just as diverse as the audience who uses them. We are committed to equal employment opportunities and celebrate diversity of race, color, religion, sex, national origin, sexual orientation, age, disability, gender identity or expression, or veteran status. We welcome individuals with grit, passion, and humility to join us.

Accessibility for applicants with disabilities

We are committed to providing access and reasonable accommodations for applicants with disabilities. Please let us know if you need accommodations.

Index Everywhere, Index Anywhere

Our headquarters are in Toronto, with major offices worldwide. All our technology positions are open to remote and virtual work.
