Site Reliability Engineer - Video Infrastructure

TikTok Pte. Ltd.

Singapore

On-site

USD 60,000 - 100,000

Full time

Yesterday

Job summary

An innovative firm is seeking a skilled Site Reliability Engineer to join their Video Cloud Infra team. This role involves building and optimizing a global multimedia transmission network to enhance efficiency and reduce costs. You will engage in system management, monitoring, and capacity planning while building tools and automations to streamline operations. The ideal candidate will have a strong technical background in software engineering, extensive knowledge of SRE responsibilities, and hands-on experience with cloud services. If you are passionate about technology and thrive in a collaborative environment, this opportunity is perfect for you.

Qualifications

  • Extensive knowledge of SRE responsibilities and large-scale distributed systems.
  • Good programming experience in C, C++, Java, Python, or Go.

Responsibilities

  • Build global infrastructure for multimedia transport and storage.
  • Engage in global production system management and optimization.

Skills

Monitoring
Incident Handling
Capacity Management
Disaster Recovery
Microservice Architecture
Troubleshooting
Teamwork

Education

Bachelor's degree in Computer Science
Equivalent working experience

Tools

Linux
MySQL
MongoDB
Redis
ELK
AWS
Google Cloud
Azure

Job description

Responsibilities

Team Introduction: Focused on business experience and cost, the Video Cloud Infra team builds a competitive video transmission network and multimedia processing platform, along with data foundation and analysis capabilities, to drive refined product operations, reduce costs, and increase efficiency.

Responsibilities:

  • Build global infrastructure for multimedia transport, storage and processing, serving billions of users worldwide.
  • Engage in global production system management such as monitoring, emergency response, capacity planning and optimization.
  • Build tools, automations, visualizations and monitors to facilitate the operation and optimization of the global infrastructure.
  • Engage in and improve the whole service lifecycle, from inception and design, through deployment, operation and refinement.
  • Scale up systems sustainably through mechanisms like automation, and initiate changes that improve system reliability and processing speed.

Qualifications

Minimum Qualifications:

  • Bachelor's degree in Computer Science or a related technical background involving software/system engineering, or equivalent working experience.
  • Extensive knowledge of SRE responsibilities, such as monitoring, incident handling, capacity management and disaster recovery.
  • Extensive knowledge of networking, operating systems, database systems and container technology.
  • Good understanding of every aspect of microservice architecture, and hands-on experience troubleshooting large-scale distributed systems.

Preferred Qualifications:

  • Good programming experience with at least one of the following languages: C, C++, Java, Python, or Go.

  • Hands-on experience with common open-source systems such as Linux, MySQL, MongoDB, Redis and ELK; experience building solutions with AWS, Google Cloud, Azure and other cloud services is a plus.
  • Passionate and self-motivated, with strong ownership and good teamwork skills.