Senior Software Engineer, RL Post-Training Frameworks
USD 184K-356K | Senior-level | Full Time
Tasks
- Architect RL post-training infrastructure
- Build distributed RL training, inference, and rollout loops
- Coordinate actor, critic, and reward models across heterogeneous hardware
- Design failure-recovery approaches for distributed systems
- Implement fault tolerance, elastic scaling, and fast restarts
- Improve open source RL frameworks
- Integrate CPU-driven rollout workloads for tool use, code execution, and agent environments
- Partner with framework owners and research teams
- Tune performance for GPUs, CPUs, and LPUs
Skills/Tech-stack
Actor-Based Programming | C# | C++ | Consistency models | DPO | DeepSpeed | Distributed Systems | FSDP | FSDP2 | Failure recovery | GRPO | High Performance | High-Performance Computing | High-performance inference | InfiniBand | Kubernetes | LLM post-training | MoE | Megatron-LM | Mixed Precision | NCCL | NVLink | PPO | Performance Computing | Pipeline parallelism | Post-training | PyTorch | Python | Quantization-aware training | RLHF | Ray | Reinforcement Learning | Reinforcement Learning for LLM Post-Training | Reward Modeling | Service boundaries | Task-based programming | Tensor Parallelism | TensorRT-LLM | vLLM