Member of Technical Staff, Pre-training Data
Tasks
- Build large-scale web crawling pipelines
- Collaborate on corpus strategy
- Design filtering and deduplication systems
- Improve data pipeline observability and reliability
- Manage data quality and versioning
- Optimize distributed data processing
- Run data ablation experiments
Perks/Benefits
- 401k match
- Health, dental, vision insurance
- Relocation stipend
- Unlimited paid time off
- Visa sponsorship
Skills/Tech-stack
Data Deduplication | Data Filtering | Data Processing | Data Systems | Data pipeline | Data pipeline optimization | Distributed data | Distributed data systems | Experiment design | Pipeline Optimization | Scalability | Software Engineering | System Reliability