Site Reliability Engineer - Data Management Suite

TikTok Pte. Ltd.

Singapore

On-site

SGD 60,000 - 90,000

Full time

Today

Job summary

A leading tech company in Singapore seeks a Software Engineer for the Data Management Suite. The role involves building and optimizing one of the largest data platforms. You'll work on enhancing service stability and designing robust systems. Ideal candidates hold a degree in Computer Science and have experience in Java, Go, or Python. This position offers the chance to impact core products used by millions.

Qualifications

  • Bachelor's degree in Computer Science or related field required.
  • Experience in monitoring and alerting for big data systems.
  • Proficiency in programming languages like Java, Go, or Python.

Responsibilities

  • Ensure production stability for big data development systems.
  • Improve the lifecycle of services from design to deployment.
  • Maintain live services by monitoring system health.

Skills

Java
Go
Python
Site Reliability Engineering
Big Data Systems Monitoring

Education

Bachelor's degree in Computer Science or related field

Tools

Hadoop
Kafka
Spark
ClickHouse

Job description

Overview

About the Team

The Data Management Suite team is building products that cover the whole lifecycle of the data pipeline, including data ingestion and integration, data development, data catalog, data security and data governance. These products support various businesses, so that data engineers and data scientists can greatly boost their productivity.

As a software engineer on the Data Management Suite team, you will have the opportunity to build, optimize and grow one of the largest data platforms in the world. You'll gain hands-on experience with core systems in the data platform ecosystem. Your work will have a direct, significant impact on the company's core products as well as hundreds of millions of users.

Responsibilities
  • Be responsible for the production stability of big data development and governance systems.
  • Engage in and improve the whole lifecycle of services, from inception and design through to deployment, operation and refinement.
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  • Practice sustainable incident response and blameless postmortems.
  • Establish best engineering practices for engineers as well as non-technical people.
  • Design and implement reliable, scalable, robust and extensible big data systems that support core products and business.

Qualifications

Minimum Qualifications

  • Bachelor's degree in Computer Science, a related technical field involving software or systems engineering, or equivalent practical experience.
  • Experience with site reliability engineering, monitoring, and alerting for big data systems.
  • Experience writing code in Java, Go, Python or a similar language.

Preferred Qualifications

  • Knowledge of a variety of strategies for ingesting, modeling, processing, and persisting data, as well as ETL design, job scheduling and dimensional modeling.
  • Familiarity with running production-grade services at scale and an understanding of cloud-native technologies and networking.
  • Experience developing tools and APIs to reduce human interaction with systems and applications using a variety of coding and scripting standards.
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems is a plus (Hadoop, M/R, Hive, Spark, Presto, Flume, Kafka, ClickHouse, Flink or comparable solutions).
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.