AI Software Engineer, Inference
San Francisco
About Nexus
Nexus is building a world supercomputer by leveraging the latest advancements in cryptography, engineering, and science. Our team of experts is developing and deploying the Nexus Layer 1, the Nexus Network, and Nexus zkVM in support of our mission to enable the Verifiable Internet.
Nexus raised $25M in Series A funding, co-led by Lightspeed and Pantera, with participation from Dragonfly, SV Angel, and others.
We are headquartered in San Francisco, and this role will be in-person with the rest of the Nexus team.
We’re looking for an AI Software Engineer focused on Inference to help us bring powerful AI models to life quickly, efficiently, and at scale. This role is all about building the systems that deliver real-time predictions, keeping latency low and performance high. If you love optimizing ML workloads and making complex systems run smoothly in production, this one’s for you.
At our startup, speed matters — not just in how our models perform, but in how quickly we learn, ship, and grow. You’ll be a core part of the engineering team, collaborating closely with researchers and product engineers to build scalable inference systems that support everything we do.
Responsibilities
Design and optimize lightning-fast inference pipelines for both real-time and batch predictions
Deploy and scale machine learning models in production across cloud and containerized environments
Leverage frameworks like TensorFlow Serving, TorchServe, or Triton to serve models at scale
Monitor performance in the wild — build tools to track model behavior, latency, and reliability
Work with researchers to productionize models, implement model compression, and make inference as efficient as possible
Solve problems fast — whether it’s a scaling bottleneck, a failed deployment, or a rogue latency spike
Build internal tools that streamline how we deploy and monitor inference workloads
Requirements
3+ years of experience in software engineering, preferably with exposure to ML systems in production
Strong skills in Python, Go, or Java, and a solid understanding of system performance fundamentals
Experience with containerization (Docker, Kubernetes) and deploying services in the cloud (AWS, GCP, or Azure)
Solid understanding of model serving architectures and techniques for optimizing latency and throughput
Comfort with performance tuning and profiling of ML model execution
A practical mindset and eagerness to own production systems from build to run
Bonus Points
Experience with hardware acceleration for inference (GPUs, TPUs, etc.)
Familiarity with real-time data processing and streaming tools
Hands-on with edge deployment (mobile, embedded, etc.)
Contributions to open-source projects in model serving or ML infrastructure
Benefits
Competitive salary and generous equity compensation
Health insurance for employees and their dependents
Daily lunch and dinner provided at SF headquarters
Company-paid travel to events and conferences
Nexus is committed to diversity in our workforce and is proud to be an Equal Opportunity Employer (EEO).