Principal ML Engineer (Infra/hardware)
Poland
Neurons Lab
Welcome to Neurons Lab. We support fast-growing companies seeking AI solutions through collaboration.
About the project
We're looking for an experienced ML Infrastructure Engineer with a track record of delivering large-scale ML infrastructure optimization projects. The primary focus is migrating and optimizing computer vision models from Nvidia GPU-based infrastructure to AWS Inferentia/Trainium while improving performance and reducing cost.
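For context, compiling a PyTorch vision model for Inferentia2 with the AWS Neuron SDK typically follows the pattern below. This is a minimal sketch, assuming torch-neuronx is installed on an inf2 instance; the ResNet-50 stand-in, batch size, and file name are illustrative, not the project's actual models or settings.

```python
import torch
import torch_neuronx  # AWS Neuron SDK compiler frontend for PyTorch
from torchvision.models import resnet50, ResNet50_Weights

# Illustrative stand-in for the project's CV models (RetinaFace, OpenPose, CLIP).
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()

# Neuron compiles to a fixed input shape, so the serving batch size is chosen up front.
example_input = torch.rand(8, 3, 224, 224)

# Trace/compile the model into a Neuron-executable TorchScript artifact.
neuron_model = torch_neuronx.trace(model, example_input)
neuron_model.save("resnet50_neuron.pt")

# The saved artifact loads with torch.jit.load() and can then be served,
# e.g. behind Triton's Python backend on an inf2 instance.
loaded = torch.jit.load("resnet50_neuron.pt")
output = loaded(example_input)
```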
Current Infrastructure:
ML Models: RetinaFace, OpenPose, CLIP, and other CV models
Hardware: A10/T4 GPUs on EKS
Serving: Triton Inference Server
Orchestration: Mix of Kubernetes and Ray
Stage: Presale and Delivery
Duration: 2 months (preliminary)
Capacity: part-time (20h/week)
Areas of Responsibility
Technical Leadership:
Lead the architecture design for ML infrastructure modernization
Define compilation and optimization strategies for model migration
Establish a performance benchmarking framework (see the sketch after this list)
Set up monitoring and alerting for the new infrastructure
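As a rough illustration of the benchmarking framework mentioned above, a minimal harness might measure steady-state single-request latency percentiles so GPU and Inferentia builds of the same model can be compared on equal terms. The function and its arguments are hypothetical placeholders, not project code.

```python
import time
import statistics

def benchmark(model, example_input, warmup=10, iters=200):
    """Measure steady-state latency percentiles for a compiled model."""
    # Warm-up runs so one-time compilation/caching costs don't skew results.
    for _ in range(warmup):
        model(example_input)
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        model(example_input)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    latencies_ms.sort()
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[int(0.99 * len(latencies_ms)) - 1],
        "throughput_rps": 1000 / statistics.mean(latencies_ms),
    }

# Usage (hypothetical): compare the GPU baseline against the Neuron build.
# print(benchmark(torch.jit.load("resnet50_neuron.pt"), torch.rand(8, 3, 224, 224)))
```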
Performance Optimization:
Implement efficient model compilation pipelines for Inferentia2
Optimize batch processing and memory layouts (see the padding sketch after this list)
Fine-tune model serving configurations
Ensure latency requirements are met across all services
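One batching detail worth calling out: Neuron-compiled graphs are fixed-shape, so partial batches are typically zero-padded up to the compiled batch size and the padded rows discarded afterwards. A minimal sketch, assuming a model traced at batch size 8:

```python
import torch

COMPILED_BATCH = 8  # batch size the Neuron graph was traced with

def run_padded(neuron_model, batch: torch.Tensor) -> torch.Tensor:
    """Run a possibly-partial batch through a fixed-shape Neuron model."""
    n = batch.shape[0]
    if n < COMPILED_BATCH:
        # Zero-pad to the compiled shape; the graph rejects any other batch size.
        pad = torch.zeros(COMPILED_BATCH - n, *batch.shape[1:], dtype=batch.dtype)
        batch = torch.cat([batch, pad], dim=0)
    out = neuron_model(batch)
    return out[:n]  # drop the outputs produced for padding rows
```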
Cost Optimization:
Analyze and optimize infrastructure costs
Implement efficient resource allocation strategies
Set up cost monitoring and reporting (see the sketch after this list)
Achieve target cost reduction while maintaining performance
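For the monitoring side, daily per-instance-type spend can be pulled programmatically from AWS Cost Explorer, which makes the GPU-vs-Inferentia comparison concrete. A hedged sketch using boto3; the dates and filters are placeholders.

```python
import boto3

ce = boto3.client("ce")  # AWS Cost Explorer

# Daily EC2 compute cost grouped by instance type, e.g. to compare
# g5/g4dn (A10G/T4 GPU) spend against inf2 (Inferentia2) spend.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-01-31"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE",
                           "Values": ["Amazon Elastic Compute Cloud - Compute"]}},
    GroupBy=[{"Type": "DIMENSION", "Key": "INSTANCE_TYPE"}],
)

for day in response["ResultsByTime"]:
    for group in day["Groups"]:
        print(day["TimePeriod"]["Start"],
              group["Keys"][0],                             # instance type
              group["Metrics"]["UnblendedCost"]["Amount"])  # cost in USD
```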
Skills
Proven track record of ML infrastructure optimization projects
Hands-on experience with AWS Neuron SDK and Inferentia/Trainium deployment
Deep expertise in PyTorch model optimization and compilation
Experience with high-throughput computer vision model serving
Production experience with both Kubernetes and Ray for ML workloads
Knowledge
Model Optimization Expertise:
Deep understanding of ML model architecture optimization
Experience with model compilation techniques for specialized hardware (Inferentia/Trainium)
Proficiency in optimizing computer vision models (CNN architectures)
Knowledge of model serving optimization patterns
Performance Optimization:
Advanced understanding of ML model inference optimization
Expertise in batch processing strategies
Memory layout optimization for vision models
Experience with pipeline parallelism implementation
Proficiency in latency/throughput optimization techniques
Hardware Acceleration:
Deep knowledge of ML accelerator architectures
Understanding of hardware-specific optimizations
Experience with model compilation for specialized chips
Proficiency in memory access pattern optimization
Performance Analysis:
Proficiency in ML model profiling tools
Experience with performance bottleneck identification
Knowledge of performance monitoring techniques
Ability to analyze and optimize inference patterns
Nice to Have:
Experience with Ray architecture for ML serving
Knowledge of distributed ML systems
Understanding of ML pipeline optimization
Experience with model quantization techniques
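On the quantization point, a common PyTorch starting place is dynamic int8 quantization of Linear layers, which suits transformer-style towers such as CLIP's. This is a generic sketch, not the project's prescribed approach; on Neuron hardware the more usual knob is compile-time BF16 auto-casting.

```python
import torch
from torch.ao.quantization import quantize_dynamic

# Toy stand-in for a transformer MLP block (e.g. inside a CLIP text tower).
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
).eval()

# Quantize Linear weights to int8; activations are quantized dynamically at runtime.
quantized = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# Sanity check: quantized outputs should stay close to the FP32 baseline.
x = torch.rand(1, 512)
print((model(x) - quantized(x)).abs().max().item())
```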
Experience
Model Optimization (4+ years):
Proven track record of optimizing large-scale ML inference systems
Successfully implemented hardware-specific model optimizations
Demonstrated experience with computer vision model optimization
Led projects achieving significant performance improvements
Proven Results (Examples):
Successfully optimized computer vision models similar to RetinaFace/CLIP
Achieved significant cost reduction while maintaining performance
Implemented efficient batch processing strategies
Developed performance monitoring and optimization frameworks