Senior Machine Learning Engineer - Data

Budapest, Hungary


About Us

At Colossyan, we’re building the future of workplace learning with AI video.

Top companies like P&G, Porsche, BASF, BDO, and Paramount already use Colossyan to create engaging and interactive video content faster and more cost-effectively than traditional video production. Nearly 1 million videos have been created using Colossyan, and we’ve been recognised as a G2 Leader in multiple product categories.

Here’s an overview of our standout features:
  • Create text-to-speech videos hosted by one of our 150+ AI avatars 
  • Translate your video content to 70+ languages in just four clicks 
  • Bring documents to life with our document-to-video feature
  • Personalize your videos by creating a custom avatar of yourself, complete with a cloned voice
  • Make learning content interactive with features like branching, multiple choice quizzes, and more

To learn more about our product features, visit colossyan.com.


The role

We’re looking for a Senior ML Engineer - Data to play a key role in shaping the foundation of our AI models by curating, processing, and optimizing large-scale datasets.

In this role, you’ll work closely with research and product teams to ensure our models are trained on the highest quality data. You’ll design robust data pipelines, develop automated evaluation frameworks, and explore innovative techniques like semi-supervised learning and human-in-the-loop ML to continuously improve model performance.

This is an opportunity to make a real impact: your work will directly influence the effectiveness and accuracy of our AI-driven products.

Key Responsibilities:

  • Design and develop scalable data pipelines, including sourcing, scraping, filtering, post-processing, de-duplicating, and versioning of data for AI model training.
  • Build frameworks for data evaluation and quality assessment, ensuring that our models are trained on high-quality, reliable data.
  • Develop automated evaluation pipelines to benchmark new models before deployment in our production API.
  • Collaborate with research and product teams to incorporate their data needs and optimize pipelines for various tasks.
  • Conduct open-ended research on data quality improvements, including semi-supervised learning, human-in-the-loop ML, and fine-tuning with human feedback.

What we’re looking for:

  • 5+ years of experience as a Data Engineer, ML Engineer, or Data Scientist handling large-scale data.
  • Strong belief in high-quality data and the impact of data curation on model performance.
  • Experience with end-to-end ML training pipelines.
  • Expertise in large-scale distributed systems.
  • Strong programming skills in Python and experience with PyTorch.
  • (Preferred) Experience working with visual media and computer vision algorithms.




At Colossyan, we believe that diversity drives innovation and inclusion fosters a sense of belonging. We are committed to creating a workplace where everyone feels valued, respected, and empowered to bring their authentic selves to work. 

We actively seek to build a diverse team and encourage candidates of all backgrounds and beliefs to apply to our open positions.

We strongly encourage individuals from underrepresented and/or marginalised groups to apply. If you need any accommodations for your interview, please email recruitment@colossyan.com.

