Software Dev Engineer II, Stores Foundational AI - SFAI
Tasks
- Build observability systems for training dynamics and experiment tracking
- Build scalable data infrastructure for ingestion, processing, and delivery
- Collaborate to improve training efficiency and reliability
- Design training systems for model training and reinforcement learning
- Develop system design and technical roadmap
- Implement RL post-training pipelines covering rollout, reward, and optimization
- Monitor and tune RL training stability using PPO, GRPO, and RLOO
- Optimize RL post-training efficiency for GPU utilization and batching
- Profile and eliminate compute, networking, and storage bottlenecks
- Run experiments and iterate to unblock research progress
- Translate RL algorithms into scalable production systems
Perks/Benefits
Skills/Tech-stack
Async Rollouts | Batching | C++ | CUDA | Data Delivery | Data Ingestion | Data Processing | Distributed Training | Experiment tracking | Fault Tolerance | GPU Optimization | GRPO | Generative AI | KL divergence | Kernel development | Language Models | Large Language Models | Learning algorithms | Machine Learning | Memory Management | Monitoring | Observability | PPO | Parallel Computing | RLOO | Reinforcement Learning | Reinforcement Learning Stability | Reinforcement learning algorithms | Reward Optimization | Sequence Packing | Training pipelines | Transformers
Education
N/A