Software Consultant (Reliability engineering, Batch operations) - Contract

QUESSCORP HOLDINGS PTE. LTD.

Singapore

On-site

SGD 60,000 - 80,000

Full time

6 days ago

Job summary

A leading company in Singapore is seeking a Site Reliability Engineer to apply SRE principles to operational support. The role involves monitoring application health, managing incidents, and keeping support processes efficient. Candidates should hold a diploma in Computer Engineering/Science or a related field and have at least 3 years of experience in application operations.

Qualifications

  • At least 3 years of experience supporting application operations.
  • Familiarity with platforms such as Windows, Linux, Unix, and Cloud.

Responsibilities

  • Provide Level 1 operational support for in-scope systems.
  • Monitor application health using runbooks and respond to system alerts.
  • Manage incident tickets and ensure high-quality support activities.

Skills

Application Monitoring
Incident Response
Communication

Education

Diploma in Computer Engineering/Science

Tools

Windows
Linux
Unix
AWS

Job description

Apply Site Reliability Engineering (SRE) principles to implement continuous operational support for business processes.

Adopting SRE means focusing on rapid incident detection and resolution, improving operational availability and sustainability, and continuously improving operational efficiency.

You will join the application support team to provide application operational support, aiming to minimize business disruptions.

Effective communication with internal and external partners is essential.

The role involves shift work; shift arrangements will be reviewed and adjusted based on business needs.

Primary Responsibilities

  1. Provide Level 1 operational support for in-scope systems (70% to 90%)
  2. Work in shifts covering support hours of 5am to 9pm on weekdays, 7am to 1pm on Saturdays, and 7am to 9am on Sundays and public holidays (subject to review)
  3. Monitor batch operation jobs for scheduled completion
  4. Monitor application health using runbooks
  5. Respond to system alerts following SOPs, resolving or escalating as necessary
  6. Recover jobs or application services according to runbooks
  7. Ensure high-quality activities within support and incident processes, meeting service levels and KPIs
  8. Manage incident tickets, escalate to Level 2 support, and follow up for timely resolution
  9. Report on application health status and trends
  10. Follow onboarding procedures for new systems, including creating runbooks for system health checks and recovery SOPs
  11. Maintain and update runbooks

Continuous Improvement

  1. Update application SOPs for efficient support
  2. Revise existing L1 SOPs
  3. Automate SOP activities and monitoring processes
  4. Create or enhance monitoring dashboards as needed

Production Support

Production support runs 24x7 and is reviewed on an ongoing basis to meet business needs. After-hours support may include:

  • Immediate incident response
  • Support during project cutovers, including health checks
  • Support for critical service requests upon manager approval

Requirements

  • Diploma in Computer Engineering/Science, IT, Electronics, Electrical, or related fields
  • At least 3 years of experience supporting application operations
  • Experience with application monitoring
  • Batch operation support experience
  • Familiarity with platforms such as Windows, Linux, Unix, Cloud (AWS), databases, and middleware
  • Knowledge of programming languages or frameworks such as Java or .NET