AI Operations Engineer (Vertex AI)
Cluj, RO
NTT DATA Romania
Who we are
We are building a state-of-the-art AI team to drive business innovation. Our team is structured for success: AI Solution Engineers partner with the business to design the "what" and the "why," and as our AI Ops Engineer, you will be the expert who masters the "how."
What you'll be doing
- Take the solutions and model architectures designed by our AI Solution Engineers and lead their hands-on implementation
- Partner directly with non-technical business stakeholders to understand in depth the core processes that generate our data
- Act as a BI expert, collaborating on the definition and creation of robust data models (facts, dimensions) that serve as the foundation for our AI systems
- Write clean, production-grade Python code to build, train, and fine-tune models, translating business logic and domain knowledge into powerful, predictive features
- Work in a tight feedback loop with business users to tune and refine model outputs
- Ensure that model results are not only statistically accurate but also intuitive and actionable for the people who use them
- Design, build, and maintain our entire MLOps infrastructure on GCP and Vertex AI; this is your primary domain of ownership
- Establish and manage the CI/CD pipelines for our AI models, automating the end-to-end process from code commit to production deployment
- Use tools like Airflow to orchestrate complex model retraining jobs, data refresh cycles, and other essential workflows
- Own the deployment of models as scalable, low-latency API endpoints, including monitoring their health, performance, and cost and optimizing them continuously
- As a foundational member of the team, you will establish and enforce best practices for code quality, model versioning, monitoring, and alerting
What you'll bring along
- University degree in Computer Science or a related field
- 3-5 years of experience in a similar role
- You have a genuine curiosity for how the business works and an ability to understand the "why" behind the data
- You have proven experience acting as a bridge between technical teams and business departments, skilled at active listening, asking the right questions, and building trust with non-technical colleagues
- Demonstrable experience in a hands-on role such as ML Engineer or AI Engineer, where you were responsible for building and shipping production AI systems
- High proficiency in Python, writing clean, maintainable, and testable code for both data processing (e.g., pandas) and ML development (e.g., scikit-learn, TensorFlow/PyTorch)
- The ability to write complex, efficient SQL queries to work with large datasets. Experience with Google BigQuery is a major asset
- Hands-on experience deploying and managing applications or services on a major cloud platform (GCP is strongly preferred; significant AWS/Azure experience is also relevant)
- You are passionate about writing code and building systems. You have a pragmatic approach to engineering and a high bar for quality
- Proven experience using the GCP Vertex AI suite (especially Pipelines, Endpoints, Training, and Model Registry) to build and operate ML systems
- Direct experience building CI/CD pipelines for ML (e.g., using GitHub Actions, Jenkins) and orchestrating workflows with Apache Airflow
- Practical experience with tools like Terraform for managing cloud infrastructure
- A strong working knowledge of Docker and experience with Kubernetes (GKE)
- Proficiency in spoken and written English