Senior Software Engineer, RL Post-Training Frameworks
USD 184K-356K Senior-level Full Time
Tasks
- Architect RL post-training infrastructure
- Build distributed RL training, inference, and rollout loops
- Coordinate actor, critic, and reward models across heterogeneous hardware
- Design failure recovery approaches for distributed systems
- Implement fault tolerance, elastic scaling, and fast restarts
- Improve open-source RL frameworks
- Integrate CPU-driven rollout workloads for tool use, code execution, and agent environments
- Partner with framework owners and research teams
- Tune performance for GPUs, CPUs, and LPUs
Skills/Tech-stack
Actor-Based Programming | C# | C++ | Consistency models | DPO | DeepSpeed | Distributed Systems | FSDP | FSDP2 | Failure recovery | GRPO | High-Performance Computing | High-performance inference | InfiniBand | Kubernetes | LLM post-training | MoE | Megatron-LM | Mixed Precision | NCCL | NVLink | PPO | Pipeline parallelism | Post-training | PyTorch | Python | Quantization-aware training | RLHF | Ray | Reinforcement Learning | Reinforcement Learning for LLM Post-Training | Reward Modeling | Service boundaries | Task-based programming | Tensor Parallelism | TensorRT-LLM | vLLM