ML Platform Engineer
Tasks
- Build autoscaling and capacity management
- Deliver end-to-end observability and error tracking
- Design model serving platforms
- Develop deployment workflows with canary releases
- Document operational procedures and performance tuning
- Implement caching, prompt deduplication, and response reuse
- Implement routing, rate limiting, and quality of service
- Implement security controls at the serving layer
- Integrate serving with API gateways and observability
- Operate inference services in production
- Optimize inference performance with batching and caching
- Perform incident response and reliability improvements
- Support model releases and capability rollouts
- Tune GPU utilization, memory management, and KV cache
Skills/Tech-stack
API Gateway | Abuse detection | Automated rollback | Autoscaling | C++ | Caching | Canary Releases | Capacity Planning | Content Filtering | Continuous batching | Distributed Systems | Error Tracking | FinOps | GPU Architecture | Go | Identity Systems | Incident Response | Inference Optimization | KV cache | Kubernetes | Language Models | Large Language Models | Memory Management | Metrics | Observability | Performance Engineering | Python | Quality of Service | Rate Limiting | Request Multiplexing | Request Signing | Rust | Security controls | Shadow testing | Structured Logging | TensorRT-LLM | Tracing | vLLM