Senior AI Engineer – Pre-training Data (f/m/d)
Tasks
- Build data quality tooling
- Co-own data pipelines end to end
- Convert curated corpora to training-ready streaming formats
- Curate and compose data mixtures
- Design and run ablation studies
- Identify and close data coverage gaps
- Maintain data lineage and provenance
- Monitor pipeline health and data quality metrics
- Source, process, deduplicate, and filter pre-training corpora
- Translate research into data experiments
Perks/Benefits
- Bike lease
- Company pension plan
- Fitness and wellness offerings
- Flexible working hours
- Hybrid work model
- Mental health support
- Paid vacation
- Stock option plan
- Technical equipment budget
- Transportation ticket subsidy
Skills/Tech-stack
Classification | Cloud Native | Common Crawl | Container Orchestration | Data Engineering | Data Filtering | Data Lineage | Data Processing | Data Quality | Decontamination | Deduplication | Deep Learning | Foundation Model | Foundation Model Training | Heuristics | Kubernetes | Language Processing | ML Infrastructure | Machine Learning | Model Training | Natural Language | Natural Language Processing | Perplexity | Python | Rust | WARC | Web Data | Web Data Processing