Service Delivery Lead (6 Month FTC)

Prolific

Greater London

On-site

GBP 60,000 - 80,000

Full time

Yesterday

Job summary

Prolific, an AI company in Greater London, is seeking an experienced Service Delivery Lead on a six‑month fixed‑term contract to improve operational effectiveness and manage incident response. Responsibilities include collaborating with teams to ensure smooth operations, managing vendor relationships and improving service‑management processes. The ideal candidate will have strong problem‑solving skills, DevOps experience and the ability to explain complex concepts clearly. This full‑time role offers a competitive salary and a mission‑driven work culture.

Benefits

Competitive salary
Remote working options
Ethical research opportunities

Qualifications

  • Experience leading incidents from detection to resolution.
  • Experience in a DevOps environment.
  • Experience in continuous-improvement processes.

Responsibilities

  • Build relationships with teams to gain operational insights.
  • Manage third-party vendor onboarding and relationships.
  • Lead incident management and coordinate communication.
  • Monitor and analyze API performance metrics.
  • Ensure compliance with regulatory standards.

Skills

Problem-solving
Critical thinking
DevOps experience
Excellent communication skills
Collaboration

Tools

Datadog
Rootly

Job description

Prolific is not just another player in the AI space: we are the architects of the human data infrastructure that’s reshaping the landscape of AI. In a world where foundational AI technologies are increasingly commoditised, it’s the quality and diversity of human‑generated data that truly differentiates products and models.

What you’ll be doing in the role
  • Building relationships with teams across Prolific, from Engineering and Product to Operations and Support, to gain insight into how the business operates.
  • Supporting the onboarding of new third‑party vendors within Engineering, ensuring appropriate governance and controls are in place.
  • Maintaining and managing existing third‑party relationships, including performance monitoring, contract oversight and cost analysis to ensure value and alignment with business objectives.
  • Overseeing all newly onboarded vendors from a risk and business impact perspective. For vendors identified as critical to the business, taking ownership of developing and maintaining appropriate disaster recovery and continuity plans with business owners.
  • Configuring and maintaining our incident‑management tool, including implementing new features and continuously driving innovation to improve outcomes for both the business and customers.
  • Helping to provide guidance on service‑management capabilities across Prolific.
  • Acting as an incident lead within the business, responsible for facilitating incident bridges, driving effective mitigation to reduce customer impact and ensuring clear communication to our customers and stakeholders.
  • Collaborating with Engineering Managers to ensure post‑incident analysis and reviews are undertaken, root causes identified and lessons learned.
  • Helping teams understand their failure modes and the resulting impact on the business and customers, as defined by our incident‑management priority matrix.
  • Overseeing the end‑to‑end maintenance process to ensure upgrades and rollouts are executed smoothly with minimal customer disruption, including assessing risks, implementing migration plans and communicating effectively with customers.
  • Coaching our Engineering Managers in how to organise and run chaos days with our teams so that they can learn how their services are affected during downtime and identify opportunities to build resilience while improving our observability.
  • Taking ownership of key service‑management processes and driving their improvement.
  • Supporting engineering teams outside standard working hours by acting as incident lead on an on‑call rotation.
  • Managing the on‑call function, including financial calculations, configuring out‑of‑hours tooling and ensuring engineers have the necessary guidance, runbooks and alerts to support effective incident response.
  • Contributing to the design and implementation of service‑management reporting based on metrics such as availability, SLAs and SLOs.
  • Acting as the lead for service transitions, overseeing critical changes to the customer journey, delivering pre‑mortems to identify risks and coordinating rollout plans with key stakeholders.
  • Proactively monitoring for under‑performing or unreliable API endpoints, assessing their impact on customers and producing strategic health‑check reviews with recommendations for platform usage; flagging under‑performing endpoints to product managers so the relevant teams can prioritise investigation and resolution.
  • Collecting and analysing key incident‑management and engineering‑development metrics each week for the ExCo team.
  • Acting as the Service Reliability representative in the quarterly security forum, summarising incident‑management trends, behaviours and insights to our leadership team.
  • Managing key policies to ensure ongoing compliance with Cyber Essentials, SOC 2 and ISO 27001, regularly gathering evidence and maintaining audit readiness to meet regulatory requirements.
  • Facilitating weekly refinement sessions with the Service Reliability team to ensure initiative deadlines are met, acting as a proactive problem‑solver who provides clear guidance and removes blockers to keep the team on track and focused.
  • Facilitating the planning and prioritisation of quarterly initiatives in partnership with the Service Reliability Lead, acting as their counterpart to align on goals and execution.

What you’ll bring to the role
  • Strong problem‑solving and critical‑thinking skills for identifying and addressing challenges within Service Reliability.
  • Experience leading incidents from initial detection through to managing communication and improvement actions with individual teams.
  • Experience of working within a DevOps environment.
  • Experience designing and implementing continuous‑improvement activity.
  • A keen interest in operational metrics and in using them to drive improvements within Service Reliability.
  • Ability to collaborate with others and build strong relationships with key stakeholders.
  • Excellent written and verbal communication skills, capable of articulating complex concepts clearly to both technical and non‑technical audiences.

Even better if you have
  • Experience using Datadog for monitoring and observability.
  • Experience using incident‑management tooling, ideally Rootly.

Why Prolific is a great place to work

We’ve built a unique platform that connects researchers and companies with a global pool of participants, enabling the collection of high‑quality, ethically sourced human behavioural data and feedback. This data is the cornerstone of developing more accurate, nuanced and aligned AI systems.

We believe that the next leap in AI capabilities won’t come solely from scaling existing models, but from integrating diverse human perspectives and behaviours into AI development. By providing this crucial human‑data infrastructure, Prolific is positioning itself at the forefront of the next wave of AI innovation, reflecting the breadth and best of humanity.

Working for us will place you at the forefront of AI innovation, providing access to our unique human‑data platform and opportunities for groundbreaking research. Join us to enjoy a competitive salary, benefits and remote working within our impactful, mission‑driven culture.

Required Experience: Contract

Employment Type: Full Time

Vacancy: 1