System and Platform Operations Manager

PUBLICIS LIMITED

London

Hybrid

GBP 45,000 - 90,000

Full time

30+ days ago

Job summary

Join a forward-thinking company as a System and Platform Operations Manager, where your technical leadership will drive the reliability and stability of production systems. This role offers an exciting opportunity to work in the digital marketing space, managing a team dedicated to providing exceptional customer support and operational excellence. You'll collaborate with various teams to implement proactive solutions and enhance system performance, all while fostering a culture of continuous improvement. If you're passionate about technology and eager to make a significant impact, this position is perfect for you.

Benefits

Competitive Compensation
Great Benefits Package
Career Advancement Opportunities
Hybrid Working Opportunities
Inclusive and Diverse Workforce
Office in Iconic Television Centre

Qualifications

  • 5+ years in Site Reliability focused roles with strong technical skills.
  • Experience with containerization and infrastructure as code.
  • Proficient in scripting and monitoring tools for system reliability.

Responsibilities

  • Manage operations for production systems ensuring reliability and stability.
  • Lead incident management and customer support processes.
  • Collaborate with engineering teams to enhance system performance.

Skills

Site Reliability Engineering
Containerization (Docker, Kubernetes)
Infrastructure as Code (Terraform)
Scripting Languages (Java, Golang, Python, Bash)
Monitoring Tools (DataDog, Prometheus, Grafana)
Database Management (PostgreSQL, Bigtable)
API and Microservices Architecture
ITIL and DevOps Knowledge
Communication Skills
Change Management

Education

Bachelor's Degree or Equivalent
Google Cloud Architect or Engineer Certification

Tools

Terraform
Docker
Kubernetes
DataDog
Prometheus
Grafana

Job description

Overview

How You’ll Make an Impact

A subsidiary of Publicis Groupe, Epsilon is a leading provider of multi-channel marketing services, technologies, and database solutions. We do more than collect and store data, and we might be the most important Internet company you’ve never heard of. Join our team for your chance to work in the digital marketing space and solve meaningful problems on a massive scale—and have fun doing it.

The System and Platform Operations Manager is a technical leadership role responsible for the support, reliability and stability of Epsilon Retail Media production systems, environments and offerings. The team owns the reliability vision for the company, driving continuous improvement through a combination of development and operations initiatives as well as process excellence. The role and its team have solid-line responsibility for operations, including the deployment, management, monitoring, reporting, troubleshooting, and repair of production systems. Core to the success of the role is providing a premium customer support experience, focused on a “center of excellence” that allows for a full-service delivery support cycle.

This role is responsible for managing the Platform Operations team centralized within a single geo-region, orchestrating regional teamwork, providing both technical and professional support, and championing the company values. The Platform Operations Engineer works closely with the Engineering team to ensure ongoing system stability and supports the Technical Account Managers from an environment perspective.

The Platform Operations team is responsible for supporting all retailers once they are live. Critically important is how this team collaborates and liaises with other teams such as Customer Support, Technical Account Management, Engineering and Customer Success teams.

What you’ll do:

  • Operational Practices
      • Establish and manage operational practices, and ensure we design, implement and operate a support model that is fit for purpose for our future.
      • Implement proactive solutions for incident and problem detection, response, remediation and continuous improvement.
      • Own the operational integrity of all production environments.
  • Production Monitoring and Operational Reporting
      • Adopt a “Measure Everything” approach to ensure that internal service level objectives and customer service level agreements are exceeded, including executive-level reporting on operational health metrics such as SLAs, incident resolution, performance, availability, reliability and capacity.
  • Customer Support & Incident Management
      • Own incident management processes and on-call response.
      • Take ownership of complex issues related to performance, reliability, and scalability, and lead the resolution of serious incidents and events, including communications with customers and wider stakeholders.
  • Change Management
      • Uphold processes and procedures to manage change across production platforms.
      • Provide insight and expertise on how customers will perceive changes and their impact, to drive change management and communication within customer organizations.
      • Empower the Delivery teams to release new products, features, updates and fixes quickly, while ensuring platforms remain reliable and stable.
  • System Reliability
      • Work with the wider Engineering, Product, Delivery and Security teams to ensure that appropriate attention is given to production/system reliability.
      • Establish Operational Practices in conjunction with the Product and Engineering teams (e.g. understanding how product feature development could affect the system’s overall reliability and performance).
      • Provide delivery status information on System Reliability initiatives to the IT Leadership Team and additional stakeholders, and ensure proper communication of changes to agreed milestones and of challenges, risks and blockers that may affect the outcome or agreed completion dates (with proactive suggestions to resolve them).
  • IT Service Management
      • Execute Service Management processes including Change, Configuration, Service Level, Performance, Incident and Problem Management to deliver a high level of support and system availability.
      • Leverage industry standards and best practices for improving service levels and performance.
      • Uphold Customer Support standards in line with Service Level Agreements.
      • Ensure SLAs and KPIs are met to the best of your ability, with particular focus on first-level response times, escalation paths and resolution times.
      • Uphold the IT Service and Support workflow, with a particular focus on ensuring a best-in-class customer experience.
      • Deliver support and service solutions for the Group in line with industry best practice.
      • Work as a team to ensure all SLAs and practices are well defined, documented and consistently applied and adhered to, in order to provide premium customer support services.
  • Organizational Capability
      • Identify the capabilities needed to meet the current and emerging business needs of a significant function.
      • Evaluate current capabilities, identify gaps, and prioritize development activities.
      • Embed personal development and the fulfillment of personal potential in the culture of the organization.
      • Build capabilities elsewhere in the organization through mentoring and other informal methods.
  • Technical Developments, Process Improvement and Simplification
      • Discuss and recommend more complex or innovative technical developments to improve the quality of software and supporting infrastructure to better meet users' needs.
      • As subject matter expert on the team, maintain understanding of current technology, database management, reliability practices, and future trends through ongoing education, conference attendance and industry press.
      • Ensure all processes and procedures are documented for ease of continuous improvement activities.
      • Proactively identify new opportunities to drive improvements and simplification of our overall technology solutions.
  • Personal Capability Building
      • Develop own capabilities by participating in assessment and development planning activities as well as formal and informal training and coaching; gain or maintain external professional accreditation where relevant to improve performance and fulfill personal potential. Maintain an in-depth understanding of technology, external regulation, and industry best practices through ongoing education, attending conferences, and reading specialist media.

Who You Are

  • What you’ll bring with you:
      • At least 5 years of hands-on experience in Site Reliability focused positions.
      • Strong knowledge of containerization technologies (Docker, Kubernetes).
      • Experience with infrastructure as code (Terraform).
      • Solid understanding of networking, security, and system architecture.
      • Proficient in programming and scripting languages (Java, Golang, Python, Bash, or similar).
      • Experience with monitoring and observability tools (DataDog, Prometheus, Grafana).
      • Knowledge of database management systems (PostgreSQL, Bigtable).
      • Understanding of API and microservices architecture.
      • Strong people leadership skills, with at least a year leading and driving high-performance technical teams.
      • Experience with operations teams within enterprise environments, with knowledge of DevOps, ITIL, Cloud Services, IT Infrastructure and Operations; supporting and maintaining production and development environments; and building cloud services that are secure, reliable, scalable and observable.
      • Experience implementing and managing Logging, Monitoring and Alerting frameworks.
      • Knowledge and experience of establishing deployment and automation pipelines.
      • Expertise with ITSM principles from previous positions held.
      • Excellent verbal and written communication skills, with the ability to talk about technology intelligently and passionately to all levels of an organization, including developers, architects and senior management (technical and non-technical).
      • Past experience establishing support strategies for SaaS or cloud-based backends, with a particular focus on APM deployment (such as Dynatrace or other monitoring tools).
      • Experience with establishing Service Delivery strategies that align to new ways of working, including Agile.
      • Understanding of international requirements relating to data/information security.
      • Experience in the design, development and management of commercial technology contracts, technical service level agreements, and KPIs.
      • Experience of establishing and delivering IT support services in a high availability (HA) environment such as 24/7 operations.
  • Why you might stand out from other talent:
      • Google Cloud Architect or Engineer certification preferred.
      • Certificates in relevant database management systems, the programming languages and scripting tools referenced above, or similarly related subject matter.
      • Bachelor’s degree or equivalent.

Additional Information

When You Join Us, We’ll Create Something EPIC Together

Epsilon is a global data, technology and services company that powers the marketing and advertising ecosystem. For decades, we’ve provided marketers from the world’s leading brands the data, technology and services they need to engage consumers with 1 View, 1 Vision and 1 Voice. 1 View of their universe of potential buyers. 1 Vision for engaging each individual. And 1 Voice to harmonize engagement across paid, owned and earned channels.

Epsilon’s comprehensive portfolio of capabilities across our suite of digital media, messaging and loyalty solutions bridges the divide between marketing and advertising technology. We process 400+ billion consumer actions each day using advanced AI and hold many patents on proprietary technology, including real-time modeling languages and consumer privacy advancements. Thanks to the work of every employee, Epsilon has been consistently recognized as industry-leading by Forrester, Adweek and the MRC. Epsilon is a global company with more than 9,000 employees around the world.

Epsilon has a core set of 5 values that define our culture and guide us to create value for our clients, our people and consumers. We are seeking candidates that align with our company values, demonstrate them and make them meaningful in their day-to-day work:

  • Act with integrity. We are transparent and have the courage to do the right thing.
  • Work together to win together. We believe collaboration is the catalyst that unlocks our full potential.
  • Innovate with purpose. We shape the market with big ideas that drive big outcomes.
  • Respect all voices. We embrace differences and foster a culture of connection and belonging.
  • Empower with accountability. We trust each other to own and deliver on common goals.

Because You Matter

We know that we have some of the brightest and most talented employees in the world, and we believe in rewarding them accordingly. If you work here, expect competitive compensation, a great benefits package and endless opportunities to advance your career.

We offer hybrid working opportunities, with our office space located in the iconic Television Centre, White City.

As part of our dedication to enhancing our inclusive and diverse workforce, Epsilon is committed to equal access to opportunity for people without regard to race, age, sex, disability, neurodiversity, sexual orientation, gender identity, pregnancy and maternity, marriage and civil partnership or religion or belief. We are committed to providing reasonable adjustments for candidates in our application process.
