Senior Software Engineer, ML Network Stack - Annapurna Labs
Tasks
- Automate software delivery using CI/CD tools
- Build and maintain the network stack for distributed AI/ML systems
- Create alerting mechanisms for functional and performance regressions
- Design scalable and reliable system architectures
- Develop dashboards for performance data analysis
- Monitor and report infrastructure functionality and performance
- Support multiple instance types and software stacks
- Write Python for cluster benchmarking and application runs
Skills/Tech-stack
AWS | Amazon Athena | Amazon Managed Grafana | CI/CD | HPC | High-Speed Networking | Linux | NCCL | NCCL GIN | NIXL | NVSHMEM | Networking | Python | RDMA