Senior Site Reliability Engineer (SRE)
Remote
Full Time | Senior-level / Expert | USD 110K - 170K
Invisible AI
Invisible AI is a visual intelligence platform that works with operators to ensure manufacturing processes are done correctly every time.
At Invisible AI, we are building the future of computer vision. Today, our core focus is on developing an end-to-end platform that can digitize manufacturing operations. We deploy edge AI cameras to digitize every step of manual assembly work, which helps people-driven manufacturing be accurate, reliable, and safe. Coming from the world of self-driving cars, the founders of Invisible AI have years of experience building and deploying large-scale AI and machine learning pipelines. Join us and help build a company that will deliver the endless possibilities of computer vision to real-world customers!
As a Site Reliability Engineer, you will build the technology to enable our platform to deploy, run, and monitor Invisible AI’s software at scale across tens of independent deployments and thousands of devices. The SRE works closely with all other engineering teams and owns internal tools to enable faster development and deployment, like secure ephemeral debug environments, streamlined access controls, CI/CD systems, and a custom in-house device management platform for device configuration and software releases.
Responsibilities:
- Design, build, and maintain scalable and resilient infrastructure on the edge.
- Develop automation and infrastructure-as-code solutions using Terraform, Ansible, and scripting languages (Python, Bash).
- Deploy and manage containerized applications using Docker and related technologies.
- Ensure system observability by building and optimizing monitoring systems, particularly using Prometheus.
- Troubleshoot and optimize Linux-based systems (e.g., Red Hat, CentOS, Ubuntu).
- Collaborate with security teams to implement robust security practices and ensure compliance with industry standards.
- Work closely with software engineers to improve system performance, reliability, and deployment pipelines.
- Support and maintain networking infrastructure, including troubleshooting network protocols and configurations.
- Manage cloud and on-premise infrastructure, with a focus on automation and scalability.
- Contribute to incident response, postmortems, and process improvements.
Requirements:
- 8+ years of experience in Site Reliability Engineering and building/managing infrastructure at scale, particularly on edge devices.
- Strong experience with Python scripting (able to read and write code fluently).
- Comfortable working with Linux systems, Docker, and infrastructure-as-code tools like Terraform and Ansible.
- Hands-on experience with observability stacks (e.g., Prometheus, Grafana).
- Deep understanding of SLAs/SLOs/SLIs and how to operationalize them.
- Strong systems thinking: an understanding of how distributed systems work and how to make them resilient.
- Experience with CI/CD pipelines, incident management, and system hardening.
- Deep understanding of networking concepts and protocols.
- Familiarity with cloud platforms (AWS, Azure, Google Cloud) is a plus.
- Experience with Windows Services/VMs is a plus.
- Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.
Tags: Ansible AWS Azure CI/CD Computer Science Computer Vision Distributed Systems Docker Engineering GCP Google Cloud Grafana Linux Machine Learning Pipelines Python Security Terraform
Perks/benefits: Career development Equity / stock options
Region:
Remote/Anywhere