Cloud Engineer - AI/ML

Accenture Malaysia | Kuala Lumpur | Full-time

You will serve as a subject‑matter expert (SME) providing Level‑3 technical support across Google Cloud’s AI/ML portfolio, with emphasis on Vertex AI, GenAI, Conversational AI, and other AI services. The role centers on rapid, high‑quality incident response, root‑cause diagnosis, and resolution of complex customer cases, while maintaining SLOs, CSAT targets, and rigorous documentation standards across phone, email, and chat channels.

Key Responsibilities
  • Own complex incidents end‑to‑end: triage, reproduce, diagnose, and resolve issues for AI/ML products; maintain transparent customer communication and accurate case records.
  • Respond to, diagnose, resolve, and track customer support queries by phone, email, and chat.
  • Meet the response and resolution times defined by SLOs.
  • Maintain high customer satisfaction scores and meet quality standards in 90% of cases.
  • Assist other technical support representatives and respond to their consults through existing systems and tools.
  • Use existing troubleshooting tools and techniques to establish the root cause of queries and provide a customer‑facing root‑cause assessment.
  • Understand the business impact of customer issue reports, follow internal issue‑prioritization guidelines, and justify the priority assigned to an individual customer report.
  • Perform internal classification of queries, documenting classes of problems and preventive actions for later retrospective analysis.
  • Reactively (e.g., in response to a query) file issue reports to Google engineers and collaborate with them to diagnose customer issues: build documentation and procedures, document desired behavior and/or steps to reproduce, suggest code‑level resolutions for complex product bugs, and help engineers drive bugs to resolution.
  • Perform community management tasks as needed by the business.
  • Promptly and independently resolve technical incidents and escalations, communicating effectively with all internal and external stakeholders, so that Google engineers do not need to monitor them.
  • Handle cases involving customer‑specific architectural design requirements, providing solutions scoped to a particular product (or a subset of its features).
  • Contribute to the community through solution posts, FAQs, and best‑practice guidance for AI/ML deployments and responsible AI usage.

Product Scope & Typical Case Patterns

Vertex AI
  • Introduction/AutoML: dataset ingestion, labeling, AutoML training failures, metric drift, imbalance handling.
  • Notebooks: environment provisioning, dependency/runtime conflicts, GPU/TPU access, kernel issues.
  • AI Vector Search: index build latency, recall/precision tuning, ANN configuration, embedding mismatches.
  • Pipelines: DAG orchestration failures, component contract issues, artifact lineage, caching.
  • Prediction (Online/Batch): endpoint scaling, model versioning, cold‑start latency, batch job retries.
  • Training: hyperparameter tuning, distributed training, accelerator utilization, checkpointing.
  • Model Registry: version promotion policies, metadata integrity, rollback flows.
  • Managed Datasets: schema evolution, governance, access control.
  • Explainable AI: feature attributions, baselines, compliance requests.
  • Feature Store: ingestion latency, online/offline store consistency, backfills.
GenAI
  • LLMs & GenAI Introduction: prompt engineering pitfalls, safety filters, quota/latency.
  • Vertex AI Gemini: model selection, context window sizing, tool‑use function calling, grounding.
  • Vertex AI Search & Conversation: data connectors, retrieval quality, schema/FAQ ingestion.
  • Discovery AI Retail Search: relevance tuning, synonym/attribute mapping, cold‑start catalogue issues.
  • Vertex Gen AI Studio: prototype‑to‑production handoff, evaluation harnesses.
  • Vertex Model Garden: model availability, versioning, licenses, tuning envelopes.
Conversational AI
  • Dialogflow ES/CX: intent/flow design, session state, webhook reliability, NLU regression.
  • CCAI Platform / CCaaS: telephony integration, routing, agent desktop, compliance.
  • CCAI Insights: transcript accuracy, sentiment, redaction, analytics pipelines.
  • Contact Center AI (General): deployment patterns, multichannel orchestration.
  • Speech‑to‑Text / Text‑to‑Speech: language/acoustic models, latency, accuracy, voice settings.
  • Agent Assist: suggestion quality, knowledge base integration, real‑time performance.
Other AI
  • Healthcare Data Engine (HDE): FHIR mapping, interoperability, privacy controls.
  • Document AI: processor selection, field extraction accuracy, batch throughput.
  • Vision API: model outputs, rate limits, edge cases, dataset curation.