AI Software Engineer, Inference
San Francisco
About Nexus
Nexus is building a world supercomputer by leveraging the latest advancements in cryptography, engineering, and science. Our team of experts is developing and deploying the Nexus Layer 1, the Nexus Network, and Nexus zkVM in support of our mission to enable the Verifiable Internet.
Nexus raised $25M in Series A funding, co-led by Lightspeed and Pantera, with participation from Dragonfly, SV Angel, and others.
We are headquartered in San Francisco, and this role will be in-person with the rest of the Nexus team.
We’re looking for an AI Software Engineer focused on Inference to help us bring powerful AI models to life quickly, efficiently, and at scale. This role is all about building the systems that deliver real-time predictions, keeping latency low and performance high. If you love optimizing ML workloads and making complex systems run smoothly in production, this one’s for you.
At our startup, speed matters — not just in how our models perform, but in how quickly we learn, ship, and grow. You’ll be a core part of the engineering team, collaborating closely with researchers and product engineers to build scalable inference systems that support everything we do.
Responsibilities
Design and optimize lightning-fast inference pipelines for both real-time and batch predictions
Deploy and scale machine learning models in production across cloud and containerized environments
Leverage frameworks like TensorFlow Serving, TorchServe, or Triton to serve models at scale
Monitor performance in the wild — build tools to track model behavior, latency, and reliability
Work with researchers to productionize models, implement model compression, and make inference as efficient as possible
Solve problems fast — whether it’s a scaling bottleneck, a failed deployment, or a rogue latency spike
Build internal tools that streamline how we deploy and monitor inference workloads
Requirements
3+ years of experience in software engineering, preferably with exposure to ML systems in production
Strong skills in Python, Go, or Java, and a solid understanding of system performance fundamentals
Experience with containerization (Docker, Kubernetes) and deploying services in the cloud (AWS, GCP, or Azure)
Solid understanding of model serving architectures and techniques for optimizing latency and throughput
Comfort with performance tuning and profiling of ML model execution
A practical mindset and eagerness to own production systems from build to run
Bonus Points
Experience with hardware acceleration for inference (GPUs, TPUs, etc.)
Familiarity with real-time data processing and streaming tools
Hands-on with edge deployment (mobile, embedded, etc.)
Contributions to open-source projects in model serving or ML infrastructure
Benefits
Competitive salary and generous equity compensation
Health insurance for employees and their dependents
Daily lunch and dinner provided at SF headquarters
Company-paid travel to events and conferences
Nexus is committed to diversity in our workforce and is proud to be an Equal Opportunity Employer (EEO).