AI Software Engineer, Inference

San Francisco


About Nexus

Nexus is building a world supercomputer by leveraging the latest advancements in cryptography, engineering, and science. Our team of experts is developing and deploying the Nexus Layer 1, the Nexus Network, and Nexus zkVM in support of our mission to enable the Verifiable Internet.

Nexus raised $25M in Series A funding, co-led by Lightspeed and Pantera, with participation from Dragonfly, SV Angel, and others.

We are headquartered in San Francisco, and this role will be in-person with the rest of the Nexus team.

We’re looking for an AI Software Engineer focused on Inference to help us serve powerful AI models quickly, efficiently, and at scale. This role is all about building the systems that deliver real-time predictions while keeping latency low and performance high. If you love optimizing ML workloads and making complex systems run smoothly in production, this one's for you.

At our startup, speed matters — not just in how our models perform, but in how quickly we learn, ship, and grow. You’ll be a core part of the engineering team, collaborating closely with researchers and product engineers to build scalable inference systems that support everything we do.

Responsibilities

  • Design and optimize lightning-fast inference pipelines for both real-time and batch predictions

  • Deploy and scale machine learning models in production across cloud and containerized environments

  • Leverage frameworks like TensorFlow Serving, TorchServe, or Triton to serve models at scale

  • Monitor performance in the wild — build tools to track model behavior, latency, and reliability

  • Work with researchers to productionize models, implement model compression, and make inference as efficient as possible

  • Solve problems fast — whether it’s a scaling bottleneck, a failed deployment, or a rogue latency spike

  • Build internal tools that streamline how we deploy and monitor inference workloads

Requirements

  • 3+ years of experience in software engineering, preferably with exposure to ML systems in production

  • Strong skills in Python, Go, or Java, and a solid understanding of system performance fundamentals

  • Experience with containerization (Docker, Kubernetes) and deploying services in the cloud (AWS, GCP, or Azure)

  • Solid understanding of model serving architectures and techniques for optimizing latency and throughput

  • Comfort with performance tuning and profiling of ML model execution

  • A practical mindset and eagerness to own production systems from build to run

Bonus Points

  • Experience with hardware acceleration for inference (GPUs, TPUs, etc.)

  • Familiarity with real-time data processing and streaming tools

  • Hands-on with edge deployment (mobile, embedded, etc.)

  • Contributions to open-source projects in model serving or ML infrastructure

Benefits

  • Competitive salary and generous equity compensation

  • Health insurance for employees and their dependents

  • Daily lunch and dinner provided at SF headquarters

  • Company-paid travel to events and conferences

Nexus is committed to diversity in our workforce and is proud to be an Equal Opportunity Employer.

