Senior Site Reliability Engineer

AirAsia X | Kuala Lumpur | Full-time

Job Description

Location: Kuala Lumpur

About AirAsia MOVE

AirAsia MOVE is a leading ASEAN-focused budget online travel agency (OTA), part of the Capital A Group. We deliver customer-centric travel solutions by combining innovation with operational excellence. Our goal is to create seamless, reliable, and delightful journeys for travelers across the region.

About the Role

We’re looking for a Senior Site Reliability Engineer to help scale and stabilize our cloud infrastructure and reliability practices as we grow across multiple lines of business.

You’ll lead key initiatives around:

  • Cloud architecture modernization.
  • Multi-region reliability.
  • Observability and incident response.
  • Reducing toil through automation and self-service.

This is a hands-on technical role in which you’ll work across platform, SRE, and application teams to build scalable systems that are resilient, cost-aware, and developer-friendly.

What You’ll Do
  • Design and implement secure, scalable infrastructure on Google Cloud Platform (GCP).
  • Lead efforts to build and evolve MOVE’s GCP Landing Zone, including Shared VPC, org structure, IAM, and policy guardrails.
  • Build and improve multi-region architectures for high availability and disaster recovery.
  • Drive infrastructure automation using Terraform, CI/CD, and GitOps practices.
  • Improve observability across teams by standardizing monitoring, tracing, and alerting.
  • Collaborate on incident response and postmortems to reduce MTTR and build resilience.
  • Enforce tagging, FinOps controls, and security policies across GCP projects.
  • Contribute to platform engineering initiatives and developer self-service tools.

What We’re Looking For
  • 5+ years in SRE, DevOps, or cloud infrastructure roles.
  • Solid experience with GCP (or a similar cloud provider), Terraform, and Kubernetes (GKE).
  • Strong hands-on experience in automation and multi-region architecture design.
  • Experience in networking (VPCs, NAT, PSC), IAM, and cloud-native security.
  • Proven ability to debug and support production systems under pressure.
  • Familiarity with monitoring and tracing tools such as Cloud Monitoring, OpenTelemetry, and SigNoz.
  • Exposure to AI or anomaly detection for alert tuning or reliability insights.
  • Clear communicator who works well with developers, product, and other infra teams.

We are all different, one talent from another, and we rely on those differences. At AirAsia, you will be treated fairly and given every chance to be your best. We are committed to creating a diverse work environment and are proud to be an equal opportunity employer.

Search Firm Representatives - AirAsia does not accept unsolicited assistance from search firms for employment opportunities. All CVs / resumes submitted by search firms to any employee at our company without a valid written search agreement in place will be deemed the sole property of our company.

No fee will be paid in the event a candidate is hired by our company as a result of an agency referral where no pre-existing agreement is in place.
