Staff Machine Learning Operations Engineer (Secret) (4172)
Boulder, CO
Full Time | Senior-level / Expert | Clearance required | USD 103K - 172K
SMX
SMX harnesses the transformative power of technology to help realize your digital future. Outside Analytics has recently become a proud subsidiary of SMX, marking an exciting collaboration that enhances our collective capabilities to deliver cutting-edge digital transformation solutions.
Are you interested in the next generation of Space Force Remote Sensing capabilities? At Outside Analytics, we're on the ground floor of the future remote sensing ecosystem across all orbital regimes (LEO, MEO, HEO, and GEO)! We build, integrate, and operationally support our customers' emerging space-ground systems, including real-time data processing frameworks, sensor data processing, and data visualization. We are teamed with the most passionate companies in industry, dedicated to bringing best-of-breed capabilities to address our customers' most pressing needs.
We are seeking an experienced Machine Learning Operations (MLOps) Engineer to join and help shape our new MLOps team. This role focuses on deploying and optimizing machine learning models for always-on, high-availability systems in real-world, real-time unclassified and classified environments. As part of a new and growing team, you will have the unique opportunity to evangelize MLOps practices, contribute to the development of an on-premises development platform, and drive innovation in mission-critical applications.
This position is on-site in Boulder, CO, 5 days per week.
Essential Duties & Responsibilities
- Deploy and maintain high-performing ML models (e.g., ensembles of LSTMs and Random Forests) in real-time environments
- Monitor deployed models for drift or performance degradation and implement automated retraining pipelines
- Implement advanced deployment strategies (e.g., Blue-Green, Canary, Champion-Challenger)
- Develop modular and flexible ML pipelines that ensure uptime and reliability
- Build and manage scalable infrastructure using Kubernetes, Docker, Terraform, and related tools
- Design and implement an on-premises development platform using Kubeflow to replicate cloud capabilities in classified environments
- Set up robust monitoring, logging, and alerting systems using Prometheus, Grafana, and Loki
- Optimize performance metrics like inference latency and system throughput while ensuring fault tolerance
- Work with cross-functional teams, including Data Engineering, Machine Learning, and DevOps, to integrate and enhance ML systems
- Define touchpoints and handoffs with DevOps and Data Engineering to ensure seamless integration of ML workflows with existing infrastructure and data pipelines
- Mentor junior team members and contribute to building a collaborative and innovative team culture
- Other duties as assigned
Required Skills & Experience
- Secret clearance
- 4+ years of experience, including deploying and/or maintaining at least one ML model or pipeline in a production environment
- Proficiency in writing clean, maintainable Python code for automation and basic scripting tasks
- Basic experience building and maintaining CI/CD pipelines for small-scale projects or systems
- Basic familiarity with distributed environments and frameworks like Protobufs or ZeroMQ
- Basic familiarity with MLflow, Kubeflow, or similar platforms for managing ML experiments and pipelines
- Basic familiarity with Kubernetes and Terraform for managing containerized environments and infrastructure
- Strong problem-solving and analytical skills
- Excellent communication and collaboration capabilities
- Ability to thrive in a dynamic, fast-paced environment
- Good written and verbal communication skills
- Detail oriented
Desired Skills & Experience
- Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related technical field
- Relevant certifications (e.g., Certified Kubernetes Administrator, Certified Kubernetes Application Developer, Terraform Associate) are a plus
- Familiarity with C++ and/or Rust
- Experience with workflow orchestration tools such as Airflow or Prefect
- Experience with distributed data processing frameworks such as PySpark
- Familiarity with SQL and modern database technologies (e.g., MinIO, Yugabyte)
- Experience with DVC, Ansible, Kustomize, Helm, Prometheus, and Grafana
- Understanding of secure software development practices and/or experience working in classified environments
Application Deadline: April 14, 2025
The SMX salary determination process takes into account a number of factors, including, but not limited to, geographic location, Federal Government contract labor categories, relevant prior work experience, specific skills, education, and certifications. At SMX, one of our Core Values is to Invest in Our People, so we offer a competitive mix of compensation, learning & development opportunities, and benefits. Some key components of our robust benefits include health insurance, paid leave, and retirement.
The proposed salary for this position is: $103,200 - $172,000 USD
At SMX®, we are a team of technical and domain experts dedicated to enabling your mission. From priority national security initiatives for the DoD to highly assured and compliant solutions for healthcare, we understand that digital transformation is key to your future success.
We share your vision for the future and strive to accelerate your impact on the world. We bring both cutting edge technology and an expansive view of what’s possible to every engagement. Our delivery model and unique approaches harness our deep technical and domain knowledge, providing forward-looking insights and practical solutions to power secure mission acceleration.
All qualified candidates will receive consideration for employment without regard to disability status, protected veteran status, race, color, age, religion, national origin, citizenship, marital status, sex, sexual orientation, gender identity or expression, pregnancy or genetic information.
Selected applicant may be subject to a background investigation and/or education verification.
Tags: Airflow Ansible CI/CD Computer Science Data pipelines Data visualization DevOps Docker Engineering Grafana Helm Kubeflow Kubernetes Machine Learning MLFlow ML models MLOps PhD Pipelines PySpark Python Rust Security SQL Terraform
Perks/benefits: Career development Competitive pay Flex hours Health care Insurance