Principal ML Engineer (Infra/hardware)
Poland
Neurons Lab
Welcome to Neurons Lab. We support fast-growing companies seeking AI solutions through collaboration.
About the project
We're looking for an experienced ML Infrastructure Engineer with a track record of delivering large-scale ML infrastructure optimization projects. The primary focus is migrating and optimizing computer vision models from Nvidia GPU-based infrastructure to AWS Inferentia/Trainium while improving performance and reducing cost.
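For context, compiling a PyTorch vision model for Inferentia2 with the AWS Neuron SDK typically follows the pattern below. This is a minimal sketch, assuming torch-neuronx is installed on an inf2 instance; the ResNet-50 stand-in, batch size, and file name are illustrative, not the project's actual models or settings.

```python
import torch
import torch_neuronx  # AWS Neuron SDK compiler frontend for PyTorch
from torchvision.models import resnet50, ResNet50_Weights

# Illustrative stand-in for the project's CV models (RetinaFace, OpenPose, CLIP).
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()

# Neuron compiles to a fixed input shape, so the serving batch size is chosen up front.
example_input = torch.rand(8, 3, 224, 224)

# Trace/compile the model into a Neuron-executable TorchScript artifact.
neuron_model = torch_neuronx.trace(model, example_input)
neuron_model.save("resnet50_neuron.pt")

# The saved artifact loads with torch.jit.load() and can then be served,
# e.g. behind Triton's Python backend on an inf2 instance.
loaded = torch.jit.load("resnet50_neuron.pt")
output = loaded(example_input)
```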
Current Infrastructure:
ML Models: RetinaFace, OpenPose, CLIP, and other CV models
Hardware: A10/T4 GPUs on EKS
Serving: Triton Inference Server
Orchestration: Mix of Kubernetes and Ray
Stage: Presale and Delivery
Duration: 2 months (preliminary)
Capacity: part-time (20h/week)
Areas of Responsibility
Technical Leadership:
Lead the architecture design for ML infrastructure modernization
Define compilation and optimization strategies for model migration
Establish a performance benchmarking framework (see the sketch after this list)
Set up monitoring and alerting for the new infrastructure
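As a rough illustration of the benchmarking framework mentioned above, a minimal harness might measure steady-state single-request latency percentiles so GPU and Inferentia builds of the same model can be compared on equal terms. The function and its arguments are hypothetical placeholders, not project code.

```python
import time
import statistics

def benchmark(model, example_input, warmup=10, iters=200):
    """Measure steady-state latency percentiles for a compiled model."""
    # Warm-up runs so one-time compilation/caching costs don't skew results.
    for _ in range(warmup):
        model(example_input)
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        model(example_input)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    latencies_ms.sort()
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[int(0.99 * len(latencies_ms)) - 1],
        "throughput_rps": 1000 / statistics.mean(latencies_ms),
    }

# Usage (hypothetical): compare the GPU baseline against the Neuron build.
# print(benchmark(torch.jit.load("resnet50_neuron.pt"), torch.rand(8, 3, 224, 224)))
```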
Performance Optimization:
Implement efficient model compilation pipelines for Inferentia2
Optimize batch processing and memory layouts (see the padding sketch after this list)
Fine-tune model serving configurations
Ensure latency requirements are met across all services
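One batching detail worth calling out: Neuron-compiled graphs are fixed-shape, so partial batches are typically zero-padded up to the compiled batch size and the padded rows discarded afterwards. A minimal sketch, assuming a model traced at batch size 8:

```python
import torch

COMPILED_BATCH = 8  # batch size the Neuron graph was traced with

def run_padded(neuron_model, batch: torch.Tensor) -> torch.Tensor:
    """Run a possibly-partial batch through a fixed-shape Neuron model."""
    n = batch.shape[0]
    if n < COMPILED_BATCH:
        # Zero-pad to the compiled shape; the graph rejects any other batch size.
        pad = torch.zeros(COMPILED_BATCH - n, *batch.shape[1:], dtype=batch.dtype)
        batch = torch.cat([batch, pad], dim=0)
    out = neuron_model(batch)
    return out[:n]  # drop the outputs produced for padding rows
```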
Cost Optimization:
Analyze and optimize infrastructure costs
Implement efficient resource allocation strategies
Set up cost monitoring and reporting (see the sketch after this list)
Achieve target cost reduction while maintaining performance
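For the monitoring side, daily per-instance-type spend can be pulled programmatically from AWS Cost Explorer, which makes the GPU-vs-Inferentia comparison concrete. A hedged sketch using boto3; the dates and filters are placeholders.

```python
import boto3

ce = boto3.client("ce")  # AWS Cost Explorer

# Daily EC2 compute cost grouped by instance type, e.g. to compare
# g5/g4dn (A10G/T4 GPU) spend against inf2 (Inferentia2) spend.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-01-31"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE",
                           "Values": ["Amazon Elastic Compute Cloud - Compute"]}},
    GroupBy=[{"Type": "DIMENSION", "Key": "INSTANCE_TYPE"}],
)

for day in response["ResultsByTime"]:
    for group in day["Groups"]:
        print(day["TimePeriod"]["Start"],
              group["Keys"][0],                             # instance type
              group["Metrics"]["UnblendedCost"]["Amount"])  # cost in USD
```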
Skills
Proven track record of ML infrastructure optimization projects
Hands-on experience with AWS Neuron SDK and Inferentia/Trainium deployment
Deep expertise in PyTorch model optimization and compilation
Experience with high-throughput computer vision model serving
Production experience with both Kubernetes and Ray for ML workloads
Knowledge
Model Optimization Expertise:
Deep understanding of ML model architecture optimization
Experience with model compilation techniques for specialized hardware (Inferentia/Trainium)
Proficiency in optimizing computer vision models (CNN architectures)
Knowledge of model serving optimization patterns
Performance Optimization:
Advanced understanding of ML model inference optimization
Expertise in batch processing strategies
Memory layout optimization for vision models
Experience with pipeline parallelism implementation
Proficiency in latency/throughput optimization techniques
Hardware Acceleration:
Deep knowledge of ML accelerator architectures
Understanding of hardware-specific optimizations
Experience with model compilation for specialized chips
Proficiency in memory access pattern optimization
Performance Analysis:
Proficiency in ML model profiling tools
Experience with performance bottleneck identification
Knowledge of performance monitoring techniques
Ability to analyze and optimize inference patterns
Nice to Have:
Experience with Ray architecture for ML serving
Knowledge of distributed ML systems
Understanding of ML pipeline optimization
Experience with model quantization techniques
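On the quantization point, a common PyTorch starting place is dynamic int8 quantization of Linear layers, which suits transformer-style towers such as CLIP's. This is a generic sketch, not the project's prescribed approach; on Neuron hardware the more usual knob is compile-time BF16 auto-casting.

```python
import torch
from torch.ao.quantization import quantize_dynamic

# Toy stand-in for a transformer MLP block (e.g. inside a CLIP text tower).
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
).eval()

# Quantize Linear weights to int8; activations are quantized dynamically at runtime.
quantized = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# Sanity check: quantized outputs should stay close to the FP32 baseline.
x = torch.rand(1, 512)
print((model(x) - quantized(x)).abs().max().item())
```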
Experience
Model Optimization (4+ years):
Proven track record of optimizing large-scale ML inference systems
Successfully implemented hardware-specific model optimizations
Demonstrated experience with computer vision model optimization
Led projects achieving significant performance improvements
Proven Results (Examples):
Successfully optimized computer vision models similar to RetinaFace/CLIP
Achieved significant cost reduction while maintaining performance
Implemented efficient batch processing strategies
Developed performance monitoring and optimization frameworks