Senior Research Engineer - Performance Optimization
Palo Alto, California
Full Time Senior-level / Expert USD 180K - 250K
We are looking for engineers with significant problem solving experience in PyTorch, CUDA and distributed systems. You will work with Research Scientists to build & train cutting edge foundation models on thousands of GPUs.
Responsibilities
- Ensure efficient implementation of models & systems for data processing, training, inference and deployment
- Identify and implement optimization techniques for massively parallel and distributed systems
- Identify and remedy efficiency bottlenecks (memory, speed, utilization) by profiling and implementing high-performance CUDA, Triton, C++ and PyTorch code
- Work closely with the research team to ensure systems are designed to be as efficient as possible from start to finish
- Build tools to visualize, evaluate and filter datasets
- Implement cutting-edge product prototypes based on multimodal generative AI
Experience
- Experience training large models using Python & PyTorch, including practical experience working with the entire development pipeline, from data processing, preparation & data loading to training and inference.
- Experience optimizing and deploying inference workloads for throughput and latency across the stack (inputs, model inference, outputs, parallel processing etc.)
- Experience profiling CPU & GPU code in PyTorch, including with NVIDIA Nsight or similar tools.
- Experience writing & improving highly parallel & distributed PyTorch code, with familiarity in DDP, FSDP, Tensor Parallel, etc.
- Experience writing high-performance parallel C++. Bonus if done within an ML context with PyTorch, such as for data loading, data processing, or inference code.
- Experience with high-performance Triton / CUDA and writing custom PyTorch kernels. Top candidates will be able to utilize tensor cores, optimize performance through CUDA memory management, and apply other similar skills.
- Good to have: experience working with deep learning concepts such as Transformers and multimodal generative models such as diffusion models and GANs.
- Good to have: experience building inference / demo prototype code (incl. Gradio, Docker, etc.)
Compensation
- The pay range for this position in California is $180,000 - $250,000/yr; however, base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.
Categories:
Engineering Jobs
Research Jobs
Tags: CUDA DDP Deep Learning Diffusion models Distributed Systems Docker FSDP GANs Generative AI Generative modeling GPU Gradio Machine Learning Model inference Python PyTorch Research Transformers
Perks/benefits: Career development Competitive pay Equity / stock options Salary bonus
Region:
North America
Country:
United States