AI Support Specialist (MLE Focus)
Ho Chi Minh, Dong Nam Bo, Viet Nam
Our story might surprise you. We’re the world’s largest restaurant company, encompassing KFC, Pizza Hut, Taco Bell, and Habit Burger & Grill, but there’s a lot more going on behind the scenes than just frying chicken, baking pizzas, and serving up tacos. We put this delicious food in the hands of customers through apps, websites, kiosks, POS systems, and other digital dining experiences.
JOB SUMMARY
We are looking for a skilled AI Support Specialist (MLE Focus) to join our 24/7 operations team, focusing on maintaining and optimizing machine learning pipelines, infrastructure, and deployments. This role involves troubleshooting model deployment issues, ensuring system scalability, and working closely with MLEs to resolve infrastructure-related challenges.
KEY RESPONSIBILITIES:
Operational Support:
• Monitor machine learning pipelines, APIs, and deployment environments for errors and performance degradation.
• Troubleshoot issues related to model inference, deployment failures, or infrastructure bottlenecks.
• Perform root cause analysis for system incidents, documenting findings and implementing preventive measures.
• Support CI/CD workflows for model updates and pipeline changes.
Incident Management:
• Act as the first responder for MLE-related incidents detected via monitoring tools or reported by users.
• Escalate unresolved issues to MLEs or engineering teams and follow through to resolution.
• Track incident metrics (e.g., mean time to resolution) and provide insights for operational improvement.
Collaboration:
• Partner with MLEs to support the deployment of new models and infrastructure changes.
• Collaborate with Data Scientists and AI Engineers to ensure seamless handoffs and alignment on system requirements.
Continuous Improvement:
• Contribute to the development of operational playbooks for model deployments and infrastructure support.
• Identify opportunities to automate repetitive tasks, such as scaling model endpoints or managing resource utilization.
QUALIFICATIONS:
• Bachelor’s degree in Computer Science, Engineering, or a related field.
• 1-2 years of experience in MLE, DevOps, or AI system operations roles.
• Strong knowledge of cloud platforms (AWS, GCP, or Azure) and of containerization and orchestration tools (e.g., Docker, Kubernetes).
• Familiarity with MLOps tools and frameworks (e.g., MLflow, Kubeflow, SageMaker).
• Experience with CI/CD pipelines and infrastructure as code (e.g., Terraform, CloudFormation).
• Excellent problem-solving skills and ability to work in a fast-paced, 24/7 support environment.
Preferred Qualifications:
• Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog).
• Knowledge of scripting and automation with Python or Bash.
• Understanding of ML model lifecycle management and production best practices.
Location: HCM, Tan Binh district (Hybrid)
Department: Digital & Technology
Employment Type: Employee (Full-Time)
Minimum Experience: Experienced
Dragontail Systems is the leading B2B provider of revolutionary optimization software for the food and delivery industry, with a global presence.
We are proud to be part of Yum! Brands, a company with over 55,000 restaurants in more than 150 countries and territories primarily operating the company’s restaurant brands – KFC, Pizza Hut, Taco Bell and The Habit Burger Grill.