ML Ops Architect
Dallas, Texas, United States - Remote
Tiger Analytics
An advanced analytics and AI consulting services company. Trusted data science and data engineering partner for Fortune 1000 firms.
Tiger Analytics is an advanced analytics consulting firm. We are the trusted analytics partner for several Fortune 100 companies, enabling them to generate business value from data. Our consultants bring deep expertise in Data Science, Machine Learning, and AI. Our business value and leadership have been recognized by various market research firms, including Forrester and Gartner.
We are looking for motivated and passionate Machine Learning Engineers to join our team.
Job Description:
As a Senior MLOps Engineer, you will join a team of experienced Machine Learning Engineers who support, build, and enable Machine Learning capabilities across the organization. You will work closely with internal customers and infrastructure teams to build our next-generation data science workbench, ML platform, and products. You will expand your knowledge and develop your expertise in modern Machine Learning frameworks, libraries, and technologies while working closely with internal stakeholders to understand evolving business needs. If you have a penchant for creative solutions and enjoy working in a hands-on, collaborative environment, then this role is for you.
Requirements
What you'll do in the role:
- Implement scalable and reliable systems leveraging cloud-based architectures, technologies and platforms to handle model inference at scale.
- Deploy and manage machine learning & data pipelines in production environments.
- Work on containerization and orchestration solutions for model deployment.
- Participate in fast iteration cycles, adapting to evolving project requirements.
- Collaborate as part of a cross-functional Agile team to create and enhance software that enables state-of-the-art big data and ML applications.
- Leverage CI/CD best practices, including test automation and monitoring, to ensure successful deployment of ML models and application code.
- Ensure all code is well-managed to reduce vulnerabilities, models are well-governed from a risk perspective, and the ML follows best practices in Responsible and Explainable AI.
- Collaborate with data scientists, software engineers, data engineers, and other stakeholders to develop and implement best practices for MLOps, including CI/CD pipelines, version control, model versioning, monitoring, alerting, and automated model deployment.
- Manage and monitor machine learning infrastructure, ensuring high availability and performance.
- Implement robust monitoring and logging solutions for tracking model performance and system health.
- Monitor deployed models in real time, analyze performance data, and proactively identify and address issues to keep models performing well.
- Troubleshoot and resolve production issues related to ML model deployment, performance, and scalability in a timely and efficient manner.
- Implement security best practices for machine learning systems and ensure compliance with data protection and privacy regulations.
- Collaborate with platform engineers to effectively manage cloud compute resources for ML model deployment, monitoring, and performance optimization.
- Develop and maintain documentation, standard operating procedures, and guidelines related to MLOps processes, tools, and best practices.
Basic Qualifications:
- Master's or doctoral degree in computer science, electrical engineering, mathematics, or a similar field.
- Typically requires 7+ years of hands-on work experience developing and applying advanced analytics solutions in a corporate environment with at least 4 years of experience programming with Python.
- At least 3 years of experience designing and building data-intensive solutions using distributed computing.
- At least 3 years of experience productionizing, monitoring, and maintaining models.
Must have skills:
- Understanding of the Azure stack, including Azure Machine Learning, Azure Data Factory, Azure Databricks, Azure Kubernetes Service, and Azure Monitor.
- Demonstrated expertise in building and deploying AI/Machine Learning solutions at scale on cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Experience in developing and maintaining APIs (e.g., REST).
- Experience specifying infrastructure using infrastructure as code (e.g., Ansible, Terraform).
- Experience in designing, developing & scaling complex data & feature pipelines feeding ML models and evaluating their performance.
- Ability to work across the full stack and move fluidly between programming languages and MLOps technologies (e.g., Python, Spark, Databricks, GitHub, MLflow, Airflow).
- Expertise in Unix Shell scripting and dependency-driven job schedulers.
- Understanding of security and compliance requirements in ML infrastructure.
- Experience with visualization technologies (e.g., R Shiny, Streamlit, Python Dash, Tableau, Power BI).
- Familiarity with data privacy standards, methodologies, and best practices.
Benefits
Significant career development opportunities exist as the company grows. The position offers a unique opportunity to be part of a small, fast-growing, challenging and entrepreneurial environment, with a high degree of individual responsibility.
Perks/benefits: Career development Health care