Sr DevOps Engineer - AI Services
Remote (United States)
- Remote-first
Anaconda
Democratize AI innovation with the world’s most trusted open ecosystem for data science and AI development.
About Anaconda
Be at the center of AI
With more than 45 million users, Anaconda is the most popular operating system for AI, providing access to the foundational open-source Python packages used in modern AI, data science, and machine learning through a seamless platform. We pioneered the use of Python for data science, championed its vibrant community, and continue to steward open-source projects that make tomorrow’s innovations possible. Our enterprise-grade solutions enable corporate, research, and academic institutions around the world to harness the power of open source for competitive advantage, groundbreaking research, and a better world. To learn more, visit https://www.anaconda.com.
Here is what people love most about working here: We’re not just a company, we’re part of a movement. Our dedicated employees and user community are democratizing data science and creating and promoting open-source technologies for a better world, and our commercial offerings make it possible for enterprise users to leverage the most innovative output from open source in a secure, governed way.
Summary:
Anaconda is seeking a talented Senior DevOps Engineer to join our rapidly growing company. This is an excellent opportunity for you to leverage your experience and skills and apply them to the world of data science, artificial intelligence, and machine learning.
What You'll Do:
- Design and implement scalable AWS infrastructure, with particular focus on Lambda functions, RDS, and message bus architectures
- Build and maintain robust MLOps pipelines for deploying and monitoring LLMs in production environments
- Develop and optimize real-time communication systems using WebSockets and WebRTC for ML inference services
- Create and maintain Python packages with C extensions, focusing on performance optimization and reliability
- Design and implement comprehensive monitoring and telemetry systems across our infrastructure
- Manage and optimize Kubernetes clusters for ML workloads, ensuring efficient resource utilization and high availability
- Architect and maintain efficient CI/CD pipelines for both infrastructure and application deployments
- Collaborate with AI and research teams to understand and implement infrastructure requirements for new ML models and features
- Optimize system performance and cost efficiency across our AWS infrastructure
- Lead technical discussions and provide expertise in infrastructure and deployment strategies
- Implement and maintain security best practices across our infrastructure
- Participate in on-call rotations and lead incident response efforts when necessary
What You Need:
- 7+ years of software engineering experience, with at least 4 years focused on infrastructure and DevOps
- Deep expertise with AWS services, particularly Lambda, RDS, and message bus architectures
- Strong experience with Kubernetes in production environments
- Extensive experience building and maintaining production ML deployment pipelines
- Expert-level Python programming skills and experience building Python packages
- Proven experience with C/C++ programming, particularly in building Python extensions
- Strong background in WebSocket and WebRTC implementations
- Demonstrated experience with monitoring and telemetry systems
- Experience with high-performance, distributed systems
- Strong understanding of security best practices in cloud environments
- Bachelor's degree in Computer Science, Engineering, or related field
- Experience with CI/CD pipelines and infrastructure automation
- Proven track record of optimizing system performance and reliability
- Team attitude: “I am not done until WE are done”
- Embody our core values: Great People, Great Product, Great Performance
- Care deeply about fostering an environment where people of all backgrounds and experiences can flourish
What Will Make You Stand Out:
- Experience with Rust programming language
- Knowledge of WASM deployments and optimization
- Experience with Llama.cpp or similar ML optimization frameworks
- Contributions to open-source infrastructure or MLOps tools
- Experience with large-scale LLM deployments
- Advanced degree in Computer Science or related field
- Experience with multi-region AWS deployments
- Background in network optimization and protocols
- Track record of building developer tools and platforms
- Experience working in a fast-paced startup environment
- Experience working in an open-source, AI, or data science-oriented company
Why You'll Like Working Here:
- Unique opportunity to translate strong open-source adoption and user enthusiasm into commercial product growth
- Dynamic company that rewards high-performers
- On the cutting edge of enterprise application of data science, machine learning, and AI
- Collaborative team environment that values multiple perspectives and clear thinking
- Employees-first culture
- Flexible working hours
- Medical*, Dental*, Vision*, HSA*, Life* and 401K*
- Paid parental leave for both parents
- Monthly productivity stipend
- Open vacation policy*
- Quarterly Snake days (company-wide bonus day off)
- 100% remote
*FTE employees based on your region
An Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.
Anaconda, Inc. (“We”, “Us”) is committed to protecting and respecting your privacy. This Privacy Notice sets out the basis on which the personal data collected from you, or that you provide to Us, will be processed by Us in connection with Our recruitment processes. By clicking “Submit Application”, you acknowledge you have read our Privacy Policy and that Anaconda can retain your application data for up to 1 year, unless otherwise stated. For the purpose of the General Data Protection Regulation (“GDPR”) and the version of the GDPR retained in UK law (the “UK GDPR”), the Data Controller is Sydney Artt.
This job post expires 30 days from its original post date
Anaconda is an EEO/AA employer M/F/V/D.