Lead Machine Learning Engineer, AI Platform
HCMC, Vietnam
Grab
Grab is Southeast Asia’s leading superapp. It provides everyday services like Deliveries, Mobility, Financial Services, and More.
Company Description
About Grab and Our Workplace
Grab is Southeast Asia's leading superapp. From getting your favourite meals delivered to helping you manage your finances and getting around town hassle-free, we've got your back with everything. At Grab, purpose gives us joy and habits build excellence, and we harness the power of technology and AI to deliver our mission of driving Southeast Asia forward by economically empowering everyone, with heart, hunger, honour, and humility.
Job Description
Get to Know the Team
The AI Platform team empowers Grab teams to harness advanced AI efficiently and effectively. By developing cutting-edge tools and infrastructure, the team democratises AI capabilities, fosters innovation, and scales AI-powered solutions to elevate Grab's products and services.
Get to Know the Role
As a Lead Machine Learning Engineer focusing on Testing & Data Serving, you will report to the Head of Engineering, Machine Learning and Experimentation Platforms and work onsite at Grab's CMC Creative Space office, Vietnam. You will lead initiatives that strengthen our evaluation rigor and build reliable, low-latency data-serving layers that power AI systems across Grab.
The Critical Tasks You Will Perform
- Advance end-to-end evaluation: Design and own automated testing frameworks (offline unit tests, simulation, canary releases, shadow deployments) to ensure every model and pipeline meets strict accuracy, robustness, and latency targets before and after launch.
- Build scalable data-serving infrastructure: Architect and operate feature/embedding stores, real-time retrieval APIs, and batch pipelines that deliver high-quality data to models with sub-second latency and rock-solid SLAs.
- Enforce data quality & validation: Implement continuous data profiling, drift detection, and schema enforcement so that upstream data issues are surfaced early and resolved quickly.
- Integrate with experimentation platforms: Tighten the feedback loop between experiments and production by connecting online metrics, offline benchmarks, and A/B analysis into a unified reporting dashboard.
- Apply cutting-edge research pragmatically: Translate advances in test-time evaluation, data versioning, and vector retrieval into production-ready systems that generate measurable business value.
- Align AI initiatives with strategy: Partner with leadership and product teams to identify high-impact use-cases for robust testing and fast data access, ensuring our platform accelerates Grab's business goals.
- Mentor and elevate the team: Guide engineers and researchers in best practices for evaluation, data engineering, and distributed systems; foster a culture of craftsmanship and continuous improvement.
Qualifications
What Essential Skills You Will Need
- 8+ years building and operating large-scale ML systems focused on testing, evaluation, and data pipelines
- Proficiency with modern ML & data frameworks: PyTorch/TensorFlow, Spark/Ray/Flink, Kafka/Kinesis
- Experience designing automated test harnesses: offline unit/integration tests, simulations, canary and shadow deployments, and online A/B analysis
- Expertise in feature stores, embedding/vector stores, and low-latency retrieval services that meet strict SLAs
- Strong data-quality management: profiling, drift & anomaly detection, schema enforcement, versioning
- Ability to optimise inference stacks (TorchServe, TensorFlow Serving, Ray Serve, Triton, vLLM) for throughput, latency, cost, and observability
- Track record translating research in evaluation, retrieval, and data serving into production value
- Fluency in Python and cloud-native engineering (Kubernetes, IaC, CI/CD)
- Experience delivering AI solutions in user-facing products with high operational standards
- Proven mentorship and leadership fostering best practices in testing, data engineering, and reliability
Additional Information
Life at Grab
We care about your well-being at Grab. Here are some of the global benefits we offer:
- We have your back with Term Life Insurance and comprehensive Medical Insurance.
- With GrabFlex, create a benefits package that suits your needs and aspirations.
- Celebrate moments that matter in life with loved ones through Parental and Birthday leave, and give back to your communities through Love-all-Serve-all (LASA) volunteering leave.
- We have a confidential Grabber Assistance Programme to guide and uplift you and your loved ones through life's challenges.
- Balancing personal commitments and life's demands is made easier with our FlexWork arrangements, such as differentiated hours.
What We Stand For at Grab
We are committed to building an inclusive and equitable workplace that enables diverse Grabbers to grow and perform at their best. As an equal opportunity employer, we consider all candidates fairly and equally regardless of nationality, ethnicity, religion, age, gender identity, sexual orientation, family commitments, physical and mental impairments or disabilities, and other attributes that make them unique.