
Observability Site Reliability Engineer

DRW Holdings, LLC.

London

On-site

GBP 50,000 - 90,000

Full time

30+ days ago


Job summary

An established industry player is seeking a dedicated professional to join their Observability team. This role involves providing critical support for logging, metrics, and tracing tools, ensuring their effective deployment and availability. You will tackle production system incidents, automate administrative tasks, and collaborate with various teams to enhance observability practices. If you thrive in a fast-paced environment and have a passion for technology, this opportunity offers the chance to grow your skills while contributing to innovative solutions that drive the firm's success.

Qualifications

5+ years of experience with logging and monitoring tools.
Coding experience to automate repetitive tasks is essential.

Responsibilities

Provide support for applications and troubleshoot incidents.
Develop automation for administrative tasks and improve workflows.

Skills

Logging and Monitoring Tools

Automation Coding

CI/CD Systems

Troubleshooting

Communication Skills

Teamwork

Tools

Splunk

Grafana

Prometheus

Kubernetes

Jsonnet

Git

DRW is a diversified trading firm with over three decades of experience bringing sophisticated technology and exceptional people together to operate in markets around the world. We value autonomy and the ability to quickly pivot to capture opportunities, so we operate using our own capital and trade at our own risk.

Headquartered in Chicago with offices throughout the U.S., Canada, Europe, and Asia, we trade a variety of asset classes including Fixed Income, ETFs, Equities, FX, Commodities and Energy across all major global markets. We have also leveraged our expertise and technology to expand into three non-traditional strategies: real estate, venture capital and cryptoassets.

We operate with respect, curiosity and open minds. The people who thrive here share our belief that it's not just what we do that matters; it's how we do it. DRW is a place of high expectations, integrity, innovation and a willingness to challenge consensus.

Our Observability team provides mission-critical support for many of the centralized logging, metrics and tracing tools used throughout the firm. The team manages the deployment and administration of these applications, ensuring multi-tenant and highly available operation. It also works with other teams to help them use these tools effectively and get the most out of the data produced. It's a fast-paced, dynamic environment that constantly provides new technical challenges and demands that you learn new things daily.

What you will do in this role:

Provide best-in-class support for our suite of applications
Troubleshoot production system incidents and create artifacts for postmortems to prevent similar failures in the future
Develop automation to facilitate administrative tasks supporting the onboarding and maintenance of various users and groups
Test and automate upgrades of our applications to remain on our vendor's latest releases
Continuously improve our own logging, monitoring and alerting practices
Interact with vendor support to debug and drive third-party issues to resolution
Interface with other teams to be an ambassador of good observability practices
Help teams identify data to ingest and how to make use of this data through dashboards and alerting

Required Experience:

5+ years of industry experience using various logging and monitoring tools
Coding experience to automate repetitive tasks
Familiarity with CI/CD systems and workflows
Familiarity with Git or other version control systems
Persistent drive to improve workflows and make things better
Ability to troubleshoot complex problems
Solid written and verbal communication skills
Ability to work well on a team as well as independently

What will make you stand out:

Experience using Splunk, Grafana, Prometheus and other observability tools
Experience using Kubernetes to deploy and maintain systems
Experience using Jsonnet or other templating tools to render complex YAML/JSON
Familiarity with GitOps workflows
Solid configuration management concepts and skills


