Sr. ML Platform Engineer (Hybrid)
Tasks
- Build observability solutions
- Conduct post-mortems
- Configure alerting workflows
- Debug memory leaks
- Debug resource contention
- Debug scheduling conflicts
- Develop runbooks
- Diagnose distributed systems issues
- Implement automated health checks
- Improve HPC cluster utilization
- Maintain platform reliability metrics
- Mentor engineers on debugging techniques
- Optimize GPU allocation
- Optimize Ray clusters
- Optimize SLURM job scheduling
- Optimize Spark jobs
- Optimize resource allocation
- Perform root cause analysis
- Profile performance bottlenecks
- Resolve production incidents for inference pipelines
- Resolve production incidents for training pipelines
- Troubleshoot JupyterHub spawner issues
- Troubleshoot kernel crashes
Perks/Benefits
- Employee networks
- On-call support
- Paid adoption leave
- Paid parental leave
- Professional development
- Vacation and holidays
- Volunteer opportunities
- Wellness programs
Skills/Tech-stack
AWS | Airflow | Apache Spark | CUDA | Capacity Planning | Chaos Engineering | Debugging | Distributed tracing | Docker | Google Cloud | Grafana | JupyterHub | Kubeflow | Kubernetes | Linux | Log Aggregation | MLflow | Microsoft Azure | OCI | Observability | Performance Tuning | Profiling | Prometheus | Python | Ray | Slurm | Unix
Education
N/A