Senior AI Engineer – Pre-training Data (f/m/d)
Tasks
- Build data quality tooling
- Co-own data pipelines end to end
- Convert curated corpora to training-ready streaming formats
- Curate and compose data mixtures
- Design and run ablation studies
- Identify and close data coverage gaps
- Maintain data lineage and provenance
- Monitor pipeline health and data quality metrics
- Source, process, deduplicate, and filter pre-training corpora
- Translate research into data experiments
Perks/Benefits
- Bike lease
- Company pension plan
- Fitness and wellness offerings
- Flexible working hours
- Hybrid work model
- Mental health support
- Paid vacation
- Stock option plan
- Technical equipment budget
- Transportation ticket subsidy
Skills/Tech-stack
Classification | Cloud Native | Common Crawl | Container Orchestration | Data Engineering | Data Filtering | Data Lineage | Data Processing | Data Quality | Decontamination | Deduplication | Deep learning | Foundation Model | Foundation Model training | Heuristics | Kubernetes | Language Processing | ML Infrastructure | Machine Learning | Model Training | Natural Language | Natural Language Processing | Perplexity | Python | Rust | WARC | Web data | Web data processing