Senior Machine Learning Engineer - Data
Budapest, Hungary
Colossyan
About Us
At Colossyan, we’re building the future of workplace learning with AI video.
Top companies like P&G, Porsche, BASF, BDO, and Paramount already use Colossyan to create engaging and interactive video content faster and more cost-effectively than traditional video production. Nearly 1 million videos have been created using Colossyan, and we’ve been recognised as a G2 Leader in multiple product categories.
Here’s an overview of our standout features:
- Create text-to-speech videos hosted by one of our 150+ AI avatars
- Translate your video content to 70+ languages in just four clicks
- Bring documents to life with our document-to-video feature
- Personalize your videos by creating a custom avatar of yourself, complete with a cloned voice
- Make learning content interactive with features like branching, multiple choice quizzes, and more
To learn more about our product features, visit colossyan.com.
The role
We’re looking for a Senior ML Engineer - Data to play a key role in shaping the foundation of our AI models by curating, processing, and optimizing large-scale datasets.
In this role, you’ll work closely with research and product teams to ensure our models are trained on the highest quality data. You’ll design robust data pipelines, develop automated evaluation frameworks, and explore innovative techniques like semi-supervised learning and human-in-the-loop ML to continuously improve model performance.
This is an opportunity to make a real impact: your work will directly influence the effectiveness and accuracy of our AI-driven products.
Key Responsibilities:
- Design and develop scalable data pipelines, including sourcing, scraping, filtering, post-processing, de-duplicating, and versioning of data for AI model training.
- Build frameworks for data evaluation and quality assessment, ensuring that our models are trained on high-quality, reliable data.
- Develop automated evaluation pipelines to benchmark new models before deployment in our production API.
- Collaborate with research and product teams to incorporate their data needs and optimize pipelines for various tasks.
- Conduct open-ended research on data quality improvements, including semi-supervised learning, human-in-the-loop ML, and fine-tuning with human feedback.
What we’re looking for:
- 5+ years of experience as a Data Engineer, ML Engineer, or Data Scientist handling large-scale data.
- Strong belief in high-quality data and the impact of data curation on model performance.
- Experience with end-to-end ML training pipelines.
- Expertise in large-scale distributed systems.
- Strong programming skills in Python and experience with PyTorch.
- (Preferred) Experience working with visual media and computer vision algorithms.
At Colossyan, we believe that diversity drives innovation and inclusion fosters a sense of belonging. We are committed to creating a workplace where everyone feels valued, respected, and empowered to bring their authentic selves to work.
We actively seek to build a diverse team and encourage candidates of all backgrounds and beliefs to apply to our open positions.
We strongly encourage individuals from underrepresented and/or marginalised identities to apply. If you need any accommodations for your interview, please email recruitment@colossyan.com