Site Reliability Engineer (Mandarin Speaker!) - Kajang
Hiredly X, Kajang, Full-time
The Site Reliability Engineer (SRE) ensures the reliability and performance of critical services, bridging development and operations. The role focuses on scalable infrastructure, SRE practices such as SLOs and SLIs, and reducing operational toil.
Collaborating with other teams to improve reliability and foster a culture of continuous learning is central to the role.
Responsibilities:
- Design and implement resilient system architectures for high availability and scalability.
- Develop automation tools and scripts to improve operational efficiency.
- Define, track, and analyze SLOs and SLIs for performance and reliability.
- Conduct post-mortem analyses and implement improvements based on findings.
- Collaborate on best practices for system reliability and incident management.
- Troubleshoot and resolve database, network, and deployment issues.
- Ensure issue resolution meets Service Level Agreements (SLAs).
- Identify and address system performance bottlenecks with actionable recommendations.
- Maintain documentation for processes and incident responses.
Requirements:
- Proficiency in programming languages like Python, Golang, or Java.
- Experience in system architecture with a focus on reliability and scalability.
- Strong understanding of SRE principles (SLOs, SLIs, toil reduction).
- Experience with cloud environments (AWS, Azure, Google Cloud).
- Expertise in Linux system administration.
- Problem-solving skills with a proactive approach to operational challenges.
- Ability to work independently and collaborate in a team environment.
- Able to speak, read, and write in Mandarin to support communication and collaboration across teams.
Preferred skills:
- Familiarity with monitoring tools and performance optimisation.
- Experience with system administration automation and scripting.
- Knowledge of networking concepts and troubleshooting.
- Hands-on experience with cloud platforms and services.
- Familiarity with DevOps practices (CI/CD, infrastructure as code, containerisation).