DevOps and MLOps Lead

United States - Remote

EverService

About EverService Holdings, LLC:

EverService is a global provider of tech-enabled business solutions for companies of all sizes, helping them to grow and scale with digital marketing, website design & development, scheduling & booking services, 24/7 answering services, inbound & outbound sales, live virtual receptionists, client & patient intake, and IT services. The company focuses on end-to-end solutions specialized for the legal, medical, home services, retail and technology industries integrated with clients’ CRM, EHR and operational systems. EverService goes to market with vertically integrated, industry-leading brands including Alert Communications, Blue Corona, Nexa Receptionists, Mid-State Communications, Client Chat Live, Mainline Telecommunications, Nexa Healthcare, RYNO Strategic Solutions, iLawyer Marketing and Strike Healthcare. For more information, visit EverService at www.everservice.com.

Summary of Position:

We are seeking an experienced and passionate Development Operations and Machine Learning Operations Lead to join our growing team. This role will be crucial in leading the design, implementation, and management of our CI/CD pipelines and MLOps infrastructure, ensuring efficient and reliable deployment of both software and AI/ML models. The ideal candidate will possess strong technical expertise in both DevOps and MLOps practices, with a proven ability to build and manage scalable and robust systems. This is a leadership role requiring excellent communication, collaboration, and mentoring skills to guide and empower a team of engineers.

Key Responsibilities:

DevOps Lead Responsibilities:

  • Lead the design, implementation, and evolution of our CI/CD pipelines, encompassing all stages of the software development lifecycle
  • Collaborate with cross-functional teams to streamline and automate the end-to-end deployment processes
  • Develop and execute the overall DevOps strategy, aligning it with the organization's goals and objectives
  • Prioritize objectives and take a holistic approach to solution recommendations, weighing ROI, time to market, scalability, and reliability, and offering alternatives where appropriate
  • Lead and manage a team of DevOps engineers, providing guidance and support to ensure the successful delivery of projects
  • Architect and manage cloud infrastructure for scalability, security, and cost optimization
  • Champion and implement Infrastructure-as-Code (IaC) principles using tools like Terraform and Ansible
  • Oversee containerization and orchestration using Docker and Kubernetes to ensure rapid and reliable software deployments
  • Collaborate with software development, system administration, and quality assurance teams to streamline processes and improve efficiency
  • Communicate effectively with stakeholders, providing regular updates on DevOps projects, initiatives, and performance metrics
  • Establish and enforce best practices and standards across development and operations teams
  • Drive automation initiatives to minimize manual efforts and improve productivity
  • Identify and mitigate risks and issues related to software delivery and infrastructure maintenance
  • Stay up to date with the latest industry trends and technologies related to DevOps and suggest relevant improvements to the organization
  • Foster collaboration and communication between development and operations teams, promoting a culture of continuous improvement and knowledge sharing
  • Maintain department standards for hiring, code, development practices, and professionalism
  • Work with IT Ops to maintain the health of systems across the development, test, staging, and production environments
  • Provide support during outages and emergencies, and facilitate communication with other technical departments and the ministry to resolve outages

MLOps Responsibilities:

  • Build and optimize ML infrastructure to support the deployment and operation of machine learning models in production
  • Create pipelines for model training, testing, validation, and deployment, ensuring seamless integration with existing platforms
  • Collaborate with data scientists and AI engineers to integrate MLOps practices into the model development lifecycle
  • Maintain and optimize cloud platforms (AWS, Azure, or Google Cloud) for ML workloads, ensuring efficiency and scalability
  • Build and manage infrastructure for model training, including GPU clusters and distributed training frameworks
  • Develop and implement automated pipelines for model retraining, validation, and deployment
  • Implement monitoring and logging solutions to track model performance and identify potential issues
  • Stay abreast of the latest advancements in MLOps tools and technologies

Team Leadership:

  • Provide technical guidance, mentorship, and coaching to a team of engineers
  • Foster a collaborative and supportive team environment, encouraging knowledge sharing and innovation
  • Conduct performance reviews, identify training needs, and support the professional development of team members
  • Delegate tasks effectively and ensure accountability for deliverables

Innovation and Continuous Improvement:

  • Experiment with new technologies and methodologies to improve deployment processes
  • Contribute to the continuous improvement of DevOps practices and AI model deployment
  • Lead initiatives to modernize legacy systems and integrate them into the cloud infrastructure

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 10+ years of software development experience
  • 5+ years of experience in DevOps, including experience leading and managing DevOps teams
  • 3+ years of experience in MLOps, with a proven track record of building and managing MLOps infrastructure and pipelines
  • Strong expertise in cloud platforms (AWS, Azure, or Google Cloud) and Infrastructure-as-Code tools (Terraform, Ansible)
  • Experience with CI/CD tools (e.g., Concourse CI, Jenkins, GitLab CI, CircleCI)
  • Familiarity with containerization technologies (Docker, Kubernetes)
  • Experience with machine learning frameworks and libraries (TensorFlow, PyTorch, Scikit-learn)
  • Knowledge of model deployment strategies and techniques (e.g., REST APIs, serverless functions)
  • Experience with MLOps tools and platforms (e.g., MLflow, Kubeflow)
  • Excellent communication, interpersonal, and leadership skills
  • Strong analytical and problem-solving abilities
  • Experience with large language models (LLMs) and transformer architectures highly preferred
  • Familiarity with LangChain, Hugging Face and other modern AI/ML frameworks and tools highly preferred

Benefits

We’ve got you covered:

EverService is proud to offer a variety of benefits to support employees and their families, including:

  • Medical, Vision, Dental
  • Retirement
  • Life Insurance
  • Sick Time and Paid Time Away (PTO)

This job description is intended to describe the general nature and level of work being performed by people assigned to this position. It is not to be construed as an exhaustive list of all responsibilities, duties, and skills required of personnel. All personnel may be required to perform duties outside of their normal responsibilities from time to time, as needed.

We are an equal employment opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, national origin, disability status, protected veteran status or any other characteristic protected by law.
