MLOps Engineer
Pune, Maharashtra, India
InfraCloud
InfraCloud helps companies build GPU clouds and modernize applications and infrastructure with our expertise in cloud native technologies.
Job Overview:
We are seeking an MLOps Engineer to support the development, deployment, and operationalization of AI-based models. The ideal candidate will have strong experience in both machine learning and DevOps, with a deep understanding of model deployment, monitoring, and scaling. In this role, you will be responsible for migrating our AI-based application from a cloud-hosted solution to a self-hosted model, ensuring robust model integration, and managing the lifecycle of machine learning models, from development and training to deployment and monitoring.
Key Responsibilities:
Model Development & Training:
- Develop and implement efficient pipelines for training, validating, and fine-tuning machine learning models.
- Ensure that models are trained on relevant datasets and are capable of handling a wide variety of use cases.
Deployment and Infrastructure:
- Design and implement the architecture required for self-hosted AI models, both on-premise and in the cloud, depending on project requirements.
- Work closely with cloud infrastructure teams (AWS, GCP, Azure) to deploy and scale models in a production environment.
- Set up automated deployment pipelines for the continuous integration and delivery (CI/CD) of machine learning models.
Model Monitoring and Performance Tuning:
- Monitor model performance and identify areas for improvement, including latency, resource consumption, and prediction accuracy.
- Develop strategies for model retraining, continuous learning, and tuning to maintain the best performance over time.
- Implement logging and monitoring systems to track model health and trigger alerts for potential issues.
Integration & Automation:
- Integrate AI models with the existing platform, ensuring smooth interaction between models and front-end systems.
- Automate the end-to-end ML pipeline, from data collection and preprocessing to model deployment and feedback loops.
- Collaborate with DevOps and infrastructure teams to ensure smooth operation of models in production environments.
Collaboration & Documentation:
- Document model architectures, training processes, deployment pipelines, and system configurations for internal use.
- Provide technical support and troubleshooting for model-related issues during integration or in production.
Security and Compliance:
- Ensure compliance with data privacy and security regulations in the AI model deployment process.
- Implement robust access controls, encryption, and audit logging for model and data access.
Qualifications:
- Education: Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
Experience:
- 3+ years of experience in MLOps, with a focus on machine learning model deployment and management.
- Hands-on experience with both on-premise and cloud-based AI model deployment (AWS, GCP, Azure).
- Experience with containerization technologies (Docker, Kubernetes) for model deployment and scaling.
- Familiarity with continuous integration/continuous delivery (CI/CD) tools (e.g., Jenkins, GitLab CI, CircleCI).
- Experience with monitoring tools (e.g., Prometheus, Grafana) and with managing model performance at scale.
Perks/benefits: Career development