Senior MLOps Engineer
Bangalore
Razorpay
Online Payments India: Start accepting payments instantly with Razorpay's payment suite, which supports netbanking, credit and debit cards, UPI, and more. Razorpay was founded by Shashank Kumar and Harshil Mathur in 2014. Razorpay is building a new-age digital banking hub (Neobank) for businesses in India, with the mission of enabling frictionless banking and payments experiences for businesses of all shapes and sizes. What started as a B2B payments company now processes billions of dollars in payments for lakhs of businesses across India.
We are a full-stack financial services organisation, committed to helping Indian businesses with comprehensive and innovative payment and business banking solutions, built on robust technology, that address the entire payment and banking journey for any business. Over the past year, we've disbursed loans worth millions of dollars to thousands of businesses. In parallel, Razorpay is reimagining how businesses manage money by simplifying business banking (via Razorpay X) and enabling capital availability for businesses (via Razorpay Capital).
The Role:
We are seeking a skilled Senior MLOps Engineer to join our team and drive the scalability and reliability of Razorpay’s machine learning infrastructure. In this role, you will work closely with Data Scientists, Machine Learning Engineers, and other stakeholders to streamline the deployment and maintenance of machine learning models, ensuring robust production-grade solutions.
Key Responsibilities:
- Collaborate Effectively: Partner with Data Scientists and ML Engineers to understand model requirements and transform them into efficient, scalable production solutions.
- Enhance ML Infrastructure: Contribute to key projects including our feature store, model-serving platform, and model registry to elevate Razorpay’s machine learning capabilities.
- Optimize ML Pipelines: Oversee and refine machine learning pipelines, focusing on training, evaluation, and deployment. Enhance processes to boost cost efficiency and reduce runtime, leveraging platforms like DataRobot.
- Improve Real-time Reliability: Design and implement strategies to reduce latency and increase reliability for real-time feature and model serving.
- Implement Best Practices: Establish and promote version control standards, CI/CD processes, and automated testing for robust and reliable ML model deployment.
- Manage Cloud Infrastructure: Oversee cloud resources, manage ML-related data storage and compute instances, and drive improvements across the infrastructure stack.
- Stay Current with Emerging Technologies: Continuously explore and integrate new tools and advancements in the MLOps ecosystem to refine and improve our systems.
- Document and Share Knowledge: Maintain comprehensive documentation of processes, architectures, and workflows to foster knowledge sharing and team cohesion.
Skills and Qualifications:
- Build & Manage Cloud Environments: Experience managing cloud environments, with a strong preference for AWS. Proficiency in infrastructure automation tools like Terraform and configuration management tools like Helm, Puppet, Chef, or Ansible.
- Proficient in CI/CD & Version Control: Skilled with CI/CD tools and version control systems (e.g., Git) to maintain reliable deployment processes.
- Containerization & Orchestration: Hands-on experience with Docker, Kubernetes, and other containerization technologies.
- ML Frameworks & Tools: Knowledgeable in machine learning concepts and frameworks (TensorFlow, PyTorch, Scikit-learn) and MLOps platforms like MLflow, Kubeflow, or DataRobot.
- Distributed Systems Experience: Ability to build and maintain low-latency, distributed backend APIs.
- Problem-solving & Collaboration: Strong analytical skills and the ability to work collaboratively in a team-oriented environment.
Mandatory Qualifications:
- 3-5 years of experience in DevOps, with an emphasis on MLOps practices.
- Strong cloud infrastructure experience, particularly with AWS.
- Expertise in scripting languages (e.g., Python, Shell, Ruby, Go) and troubleshooting Linux production environments.
- Knowledge of network concepts in AWS and infrastructure operations, especially in regulated environments such as banking.
- Proficient in monitoring, logging, and database technologies (e.g., MariaDB/MySQL).
Preferred Qualities:
- Background in a product-focused organization.
- Demonstrated ability to manage infrastructure at scale.
- Side projects or contributions to open-source projects (e.g., on GitHub).
- Experience with backend programming languages.