Member of Engineering (Pre-training / Data Research)
Remote (EMEA/East Coast)
USD 160K-300K (estimate) | Mid-level | Full Time
Tasks
- Build distributed data pipelines
- Collaborate with pretraining, posttraining, evals, and product teams
- Deduplicate datasets
- Design data curation pipelines
- Generate synthetic data
- Improve pretraining dataset quality
- Optimize data mixing
- Run training experiments and ablations
- Track model evaluation results
Perks/Benefits
- Company-provided equipment
- Flexible hours
- Frequent team get-togethers
- Fully remote work
- Health insurance allowance
- Home-office allowance
- Learning allowance
- Vacation and holidays
- Well-being allowance
Skills/Tech-stack
Curriculum learning | Data Ablation | Data Curation | Data Pipelines | Data mixing | Deduplication | Distributed Systems | Distributed data | Distributed data pipelines | GPU clusters | Language Models | Large Language Models | Machine Learning | Prompt engineering | Python | Scaling Laws | Tokenization | Transformers
Education
N/A