ML Ops Engineer Specialist

World Wide - Remote

Invisible Technologies

We've trained 80% of the world's top AI models. Now, we'll make them work for you.

What You’ll Do

You’ll design and implement robust infrastructure to enable scalable, reliable, and reproducible machine learning workflows. You’ll streamline the lifecycle of ML models, from experimentation to deployment, ensuring our systems are production-grade and future-proof.

  • Build Scalable ML Infrastructure: Architect, deploy, and maintain pipelines and tooling that support versioning, training, testing, and deployment of machine learning models across a variety of environments.
  • Bridge Research and Production: Work closely with ML researchers, data scientists, and backend engineers to translate prototypes into efficient, production-ready services and APIs.
  • Focus on Automation and Reliability: Implement systems for continuous integration, model monitoring, auto-scaling, and failover, with a strong emphasis on observability and operational excellence.
  • Optimize Cloud Resources: Manage and optimize compute resources across cloud and hybrid environments (e.g., GCP, AWS, on-prem), reducing latency and cost while maintaining high reliability.
  • Document Best Practices: Document and share MLOps best practices such as model versioning, reproducibility, metadata tracking, and experiment lineage.

What We Need

Professional Experience:

  • 2+ years of experience building and maintaining ML infrastructure or platforms in production environments.
  • Demonstrated ability to take ML models from experimentation to deployment using MLOps best practices.
  • Experience collaborating with data scientists, ML engineers, and backend teams on cross-functional projects.

Technical Expertise:

  • Proficiency in Python and core ML tooling (e.g., MLflow, Kubeflow, Airflow, Docker, Git).
  • Familiarity with model training frameworks such as PyTorch, ONNX, or scikit-learn.
  • Experience with CI/CD pipelines tailored to ML systems (e.g., model validation checks, artifact versioning).
  • Comfort managing infrastructure via cloud services (GCP, AWS) and container orchestration platforms (e.g., Kubernetes).
  • Strong debugging and performance tuning skills across data, model, and infrastructure layers.

Bonus (Nice-to-Haves):

  • Hands-on experience with Databricks or similar distributed compute environments.
  • Familiarity with data engineering tools and workflow orchestration (Spark, dbt, Prefect).
  • Knowledge of monitoring and observability stacks (Prometheus, Grafana, OpenTelemetry) for ML systems.
  • Exposure to regulatory/compliance-aware ML deployment (audit logs, reproducibility, rollback strategies).

 

We offer a pay range of $35 to $50 per hour, with the exact rate determined after evaluating your experience, expertise, and geographic location. Final offer amounts may vary from the pay range listed above. As a contractor, you’ll supply a secure computer and high-speed internet; company-sponsored benefits such as health insurance and PTO do not apply.

Important:

All candidates must pass an interview as part of the contracting process.

Tags: Airflow APIs AWS CI/CD Core ML Databricks dbt Docker Engineering GCP Git Grafana Kubeflow Kubernetes Machine Learning MLFlow ML infrastructure ML models MLOps Model training ONNX Pipelines Python PyTorch Research Scikit-learn Spark Testing

Region: Remote/Anywhere
