Sr. Cloud Engineer (Software-Focused)
San Francisco, CA
Location: Remote or SF, CA
Department: Engineering
Reports to: Engineering Lead
About the Role
We’re looking for a Senior Cloud Engineer with a software engineering background to help build, scale, and support the infrastructure powering our applications. This is a hands-on role ideal for someone who enjoys working with Kubernetes and related tools across multiple cloud providers, and is excited to grow in a dynamic, fast-paced environment.
As a member of the Cloud team, you will work closely with software engineers and product teams to build platforms, support deployments, improve reliability, reduce costs, and help us scale our systems.
What You’ll Do
- Work with the cloud team to maintain and improve the existing Kubernetes and AWS infrastructure.
- Work with the software development and research teams to help them architect their applications and deploy them to the cloud.
- Help build a platform to run our products inside private customer cloud environments in AWS, Azure, and GCP.
- Support and improve CI/CD pipelines and automated deployments.
- Support and improve the existing LGTM observability stack.
- Write clean, maintainable scripts and tooling in Python, Go, or similar languages.
- Contribute to the design and automation of scalable, resilient, and secure systems.
- Help triage and resolve infrastructure-related issues in staging and production environments.
- Participate in the on-call rotation (as needed) and contribute to system reliability initiatives.
- Assist with SOC 2 audits.
Tools you should know well
- AWS (Control Tower, Identity Center, VPC, and more)
- Kubernetes (EKS)
- Terraform and Terragrunt
- Helm
- Docker
- ArgoCD
- LGTM (Loki, Grafana, Tempo, Mimir)
- Python
Nice-to-have skills (not required)
- Experience working with research and software teams.
- Experience using or designing agentic systems.
- Experience with using LLMs or building systems that use LLMs.
- Experience running GPU workloads on Kubernetes would be a huge plus.
Nice-to-have tools (not required)
- Pulumi
- Atlantis
- Zitadel
- GCP
- Azure
- Google Workspace
- Fivetran
- Cloudflare
- Snowflake
- PostgreSQL
- Tailscale
- JavaScript/TypeScript
- Docker Compose
- Redis / Valkey
Why Join Us?
At Arcee, we’re building the infrastructure powering the next generation of intelligent systems, and we’re doing it with a team that values curiosity, ownership, and thoughtful collaboration.
- Work on high-impact problems: You’ll tackle real infrastructure challenges that support AI research, agentic systems, and production ML workflows across AWS, Azure, and GCP.
- Join a sharp, mission-driven team: Our engineers are deeply technical and collaborative, and we care about doing things the right way, not just the fast way.
- Grow with autonomy and impact: We’re still small, which means your voice matters. You’ll shape strategy, ship real things, and see your work in action.
- Remote-first, with roots in SF: We support remote work and async collaboration, and we’re opening an office in San Francisco for those who prefer a hybrid setup.
- Take the time you need: We offer unlimited PTO plus US bank holidays, and we genuinely want you to take the time off. Rested teams do better work.
- Be part of something future-facing: Our work directly supports large language models and intelligent agents. You'll be at the intersection of infrastructure and innovation.
Tags: AWS Azure CI/CD Docker Engineering FiveTran GCP GPU Grafana Helm JavaScript Kubernetes LLMs Machine Learning Pipelines PostgreSQL Python Research Snowflake TypeScript
Perks/benefits: Career development Unlimited paid time off