Find jobs in AI/ML, Data Science and Big Data
26 results for Checkpointing (Skill/Tech stack)
- Machine Learning Engineer 4
  INR 2475K-4500K
  Attention Mechanism | Automated retraining | CI/CD | Checkpointing | DDP
  Senior-level Full Time
  Noida, India R
  1d ago
- Senior Machine Learning Engineer, Model Training & Evaluation
  INR 2500K-4500K
  Benchmarking | Checkpointing | DeepSpeed | Distributed Training | Experiment tracking
  Accidental insurance | Flexible hours | Hybrid work | Life insurance | Medical insurance
  Senior-level Full Time
  Bangalore, India (Hybrid) R
  1d ago
- Automation and AI Solutions Lead
  INR 2500K-3500K
  AWS AgentCore | AWS Bedrock | Agent systems | Automation | CI/CD
  Flex My Way | Flexible vacation | Headspace app access | Hybrid work model | Mental health days
  Senior-level Full Time
  India, Hyderabad, Telangana R
  2d ago
- Checkpointing | Data-Driven Optimization | Data-driven | Distributed Training | Fault Tolerance
  Mid-level Full Time
  San Jose, California, United States
  2d ago
- Automation and AI Solutions Lead
  INR 2500K-3500K
  AWS Bedrock | Agent systems | Amazon AgentCore | Async Programming | CI/CD
  Flexible vacation | Headspace access | Hybrid work | Mental health days | Retirement savings
  Senior-level Full Time
  India, Bengaluru, Karnataka R
  3d ago
- Principal Data Engineer
  USD 160K-165K
  AWS | AWS Glue | AWS Lambda | AWS MSK | Amazon Aurora
  401k match | Medical, dental & vision coverage | Paid Holidays | Paid time off | Parental leave
  Senior-level Full Time
  Waltham, MA, United States
  7d ago
- Research Engineer
  CHF 103K-145K
  Checkpointing | Data Preprocessing | Deep learning | Distributed Systems | Experiment tracking
  English-speaking environment | Flexible working hours | Relocation package
  Mid-level Full Time
  Zurich
  7d ago
- Architect - Data
  INR 1500K-2040K
  Apache Flink | Apache Iceberg | Apache Kafka | Audit Logging | BigLake Metastore
  Senior-level Full Time
  IN MH Mumbai Eureka, India
  10d ago
- Digital Factory - Data Engineer - Assistant Director
  EUR 75K-107K
  Access Control | Airflow | Automated testing | Azure | Azure Data
  Career development and training | Diverse and inclusive culture | Flexible work environment | Global team collaboration
  Executive-level Full Time
  Luxembourg, LU, L-1855
  10d ago
- Research Engineer, Frontier Speculative Decoding
  USD 190K-270K
  Checkpointing | DeepSpeed | Distributed Training | FSDP | GPU clusters
  Equity | Health insurance
  Mid-level Full Time
  San Francisco, New York City
  13d ago
- Senior Machine Engineer, ML Systems and Infrastructure
  USD 146K-235K
  AWS | Apache Airflow | Apache Spark | Azure | CI/CD
  Fully remote friendly | Mentorship and knowledge-sharing
  Senior-level Full Time
  AMER - United States - Massachusetts … R
  15d ago
- Automation and AI Solutions Lead
  INR 2500K-3500K
  AWS AgentCore | AWS Bedrock | Agent systems | Async Programming | CI/CD
  Employee incentive programs | Flexible vacation | Flexible work arrangements | Headspace app access | Hybrid work model
  Senior-level Full Time
  India, Bengaluru, Karnataka R
  16d ago
- Senior Product Manager, Replication & Storage Engines
  CAD 112K-155K
  Cache Management | Checkpointing | Consensus Algorithms | Data durability | Database replication
  Adoption Assistance | Backup child and elder care | Employee stock purchase program | Equity | Fertility assistance
  Senior-level Full Time
  Alberta; British Columbia; Manitoba; Nova Scotia; …
  23d ago
- Senior Product Manager, Replication & Storage Engines
  USD 118K-231K
  B-Tree | Cache Management | Cause analysis | Checkpointing | Database replication
  401k plan | Employee stock purchase program | Equity | Fertility and adoption assistance | Flexible paid time off
  Senior-level Full Time
  New York City; Seattle
  23d ago
- Senior Director, AI Model LifeCycle
  USD 301K-355K
  Checkpointing | Dataset versioning | Experiment tracking | Failure recovery | Fine Tuning
  401k match | Cell phone stipend | Commuter benefits | Dental insurance | HSA contributions
  Senior-level Full Time
  San Francisco, CA - US
  26d ago
- Research Engineer - LLM Infra training - Seed Infra
  USD 232K-427K
  Checkpointing | Data-Driven Optimization | Data-driven | Deep learning | Distributed Training
  Mid-level Full Time
  Seattle, Washington, United States
  27d ago
- Lead / Staff Engineer, AI Agent Platform
  CNY 240K-480K
  Agent Orchestration | Asynchronous Concurrency | Budget Governance | Checkpointing | Context Assembly
  Senior-level Full Time
  Suzhou
  28d ago
- Applied AI ML Director - AGENT BUILDER PLATFORM
  USD 140K-195K
  A/B | A/B Testing | API Design | AWS Bedrock | AWS SageMaker
  Backup childcare | Financial coaching | Health care coverage | Mental health support | On-site health and wellness centers
  Executive-level Full Time
  Palo Alto, CA, United States
  1mo ago
- AI Systems Engineer
  USD 350K-600K
  Architecture Design | CUDA | Checkpointing | Distributed Systems | Distributed Training
  In-person collaboration | Visa sponsorship
  Mid-level Full Time
  San Francisco
  1mo ago
- Research Engineer, Infrastructure
  USD 255K-400K
  C++ | Checkpointing | Compute efficiency | Data Pipelines | Data parallelism
  Senior-level Full Time
  San Francisco Bay Area
  1mo ago
- Senior Data Engineer
  USD 172K-215K
  Apache Airflow | Apache Flink | Apache Kafka | Apache Spark | Backpressure
  Commute subsidy | Competitive retirement pension plans | Comprehensive health life and disability insurance | Employee resource groups | Employee stock ownership
  Senior-level Full Time
  San Francisco, CA, USA
  1mo ago
- Senior Engineering Manager, AI Runtime
  USD 228K-297K
  Checkpointing | Cluster Lifecycle Management | Cluster lifecycle | DeepSpeed | Distributed Training
  Senior-level Full Time
  Mountain View, California; San Francisco, California
  1mo ago
- Data Engineering Lead
  GBP 70K-95K
  API | AWS | Airbyte | BigQuery | CI/CD
  Equity options | Flexible working arrangements | Group life assurance | Hybrid working | Income protection
  Senior-level Full Time
  London
  1mo ago
- Senior Software Engineer, Data Platform & AI Enablement
  USD 158K-260K
  AI infrastructure | Checkpointing | Distributed Systems | Exactly once | Exactly-once semantics
  Senior-level Full Time
  US - San Francisco
  1mo ago
- Senior Software Engineer, Data Platform & AI Enablement
  SGD 147K-180K
  Apache Flink | Apache Spark | Checkpointing | Data Streaming | Distributed Systems
  Senior-level Full Time
  SG - Singapore
  1mo ago
- ML Engineer, Open Source
  EUR 150K-200K
  Benchmarking | CI/CD | Checkpointing | Data Preprocessing | HuggingFace Hub
  Senior-level Full Time
  Freiburg or Berlin
  1mo ago