Site Reliability Engineer - AI Application

BYTEPLUS PTE. LTD.

Singapore

On-site

SGD 80,000 - 120,000

Full time

2 days ago

Job summary

A leading tech company in Singapore is looking for a seasoned professional to enhance the reliability and performance of its big data services. Candidates should have a Bachelor's degree in a computer-related field and more than five years of relevant experience. The role emphasizes operations and problem-solving, and requires familiarity with programming languages and cloud infrastructure. Strong communication skills in both English and Chinese are a must. Competitive compensation and career growth opportunities are offered.

Benefits

Positive team atmosphere
Career growth opportunity
Paid leave
Meals provided
Competitive compensation

Qualifications

  • More than five years of relevant work experience.
  • Solid foundation in computer software and Linux operating systems.
  • Moderate development capabilities in at least one programming language.

Responsibilities

  • Ensure the reliability and operation of core systems for big data services.
  • Enhance system visibility and monitor performance metrics.
  • Improve service reliability and optimize performance.

Skills

Problem-solving abilities
Good communication skills
Bilingual communication (Chinese + English)

Education

Bachelor's degree in computer-related fields

Tools

Python
Go
Java
Ansible
Kubernetes
Docker
AWS

Job description
About Us

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join ByteDance

Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover, and connect, and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity, and enrich life: a mission we work towards every day.

As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and grow together, the possibilities are limitless. Join us.

Diversity & Inclusion

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

Job highlights

Positive team atmosphere, Career growth opportunity, Paid leave, Flat organization, Industry experts, Meals provided, Competitive compensation

Responsibilities

  1. Ensure the reliability and normal operation of multiple core systems supporting the Viking team's big data and online services, with a focus on capacity planning and stability assurance.
  2. Enhance system visibility by monitoring the availability and performance metrics of system components, help development teams locate faults quickly, and in particular ensure the stable operation of critical paths such as AI search and vector databases.
  3. Improve the reliability, scalability, and performance of services to ensure that core systems meet their SLAs.
  4. Participate in the design and implementation of the automation platform, enabling rapid iteration and efficient operations and maintenance of large-scale online Viking clusters and AI search clusters.
  5. Drawing on AI Search/Viking business use cases, optimize service governance practices in depth, including but not limited to analyzing performance bottlenecks in key AI Search/Viking paths, locating and troubleshooting business issues, and driving the upgrade of the system's high-availability architecture.

Qualifications
  1. Bachelor's degree or above in a computer-related field, with more than five years of relevant work experience.
  2. Solid foundation in computer software, with an understanding of the underlying principles of Linux operating systems, storage, network I/O, and related areas.
  3. Familiarity with at least one programming or scripting language (such as Python, Go, Java, or Shell) or an automation tool such as Ansible, with moderate development capabilities; the role places greater emphasis on operations and maintenance practice and problem-solving ability.
  4. Ability to solve problems systematically, good communication skills, a strong sense of ownership, and bilingual communication skills (Chinese and English) for cross-team collaboration in both languages.
  5. Knowledge of at least one cloud infrastructure platform, such as AWS, Volcano Engine, Alibaba Cloud, or GCP; experience with computing or distributed systems (e.g., Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, or Flink) is preferred.
  6. Priority will be given to candidates with algorithmic thinking, strong data structure and system design skills, and some understanding of AI cloud-native technologies, large-model-based search suggestion, and recommender systems.