Senior AI Engineer
Tasks
- Autoscale inference services and manage dynamic model loading
- Build logging and monitoring for training and inference
- Build model development toolchains and service pipelines
- Cache frequently used models
- Coordinate with DevOps and IT teams
- Create automated data preprocessing, feature engineering, and dataset versioning
- Create runbooks and perform root cause analysis
- Design user interfaces and APIs
- Develop GPU infrastructure and clusters for LLM training
- Develop dashboards for latency, accuracy, and drift detection
- Develop a machine learning platform management system
- Dynamically allocate inference resources based on demand
- Enable distributed model training and hyperparameter optimization
- Evaluate platform performance, scalability, and reliability
- Experiment with serverless architectures
- Implement A/B testing for model deployments
- Implement CI/CD pipelines for model deployment
- Implement access control and security
- Implement alerting for anomalies and performance degradation
- Implement fault-tolerant distributed LLM training
- Implement model memory loading and unloading
- Implement self-healing systems
- Integrate TPUs into training infrastructure
- Manage scheduling for multi-tenant GPU clusters
- Optimize GPU utilization for large-scale training
- Provide technical support to data scientists and engineers
- Refine metrics based on stakeholder feedback
- Support edge inference for lightweight models
- Troubleshoot training and inference issues
Skills/Tech-stack
A/B Testing | Autoscaling | Bash | C++ | CUDA | CUDA kernels | Caching | DALI | DeepSpeed | Distributed Training | Docker | Edge Computing | GPU Cluster | gRPC | Go | Grafana | Horovod | Hyperparameter Optimization | Kubernetes | Logging | Machine Learning | Model Monitoring | Multi-tenant systems | NCCL | Pipeline parallelism | Prometheus | PyTorch distributed | Python | Ray | Resource scheduling | Serverless architecture | Slurm | TPU | Tensor Parallelism | tf.data | Triton | Weights and Biases