AI Data Infrastructure Engineer
Tasks
- Build high-throughput data loading systems to maximize GPU utilization
- Build ingestion systems for text, image, audio, video, and structured data
- Construct evaluation datasets with integrity and contamination controls
- Design and operate large-scale AI data pipelines supporting training and evaluation
- Design storage architectures balancing cost, throughput, and latency
- Develop dataset versioning, lineage, and provenance tracking for reproducible training
- Document data systems, schemas, and operational procedures
- Drive observability of data quality, drift, and pipeline health
- Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale
- Implement data privacy, redaction, and consent enforcement
- Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement
- Optimize cost and performance using compression, format selection, and caching
Perks/Benefits
Skills/Tech-stack
Active Learning | Apache Beam | CI/CD | Code review | Data Lineage | Data Modeling | Data Privacy | Data Quality | Data Storage | Data provenance | Dataset versioning | Distributed Systems | ETL | GPU Utilization | Human-in-the-loop | Java | Python | Ray | Redaction | Scala | Spark | Testing
Education