Senior Big Data Engineer

Warsaw, Masovian Voivodeship, Poland


Globaldev Group

Globaldev drives businesses to maximize their potential with our custom software development and staff augmentation services.


We’re looking for a highly technical, independent, and visionary Big Data Engineer to take ownership of our next-generation distributed training pipelines and infrastructure. This is a hands-on, high-impact role at the core of our algorithmic decision-making systems, shaping how models are trained and deployed at scale across billions of data points in real-time AdTech environments.

You’ll be responsible for designing and building scalable ML systems from the ground up, from data ingestion through model training to evaluation. You'll work closely with algorithm researchers, data engineers, and production teams to drive innovation and performance improvements throughout the lifecycle.

Responsibilities

  • Design and implement large-scale, distributed ML training pipelines.
  • Build scalable infrastructure for data preprocessing, feature engineering, and model evaluation.
  • Lead the technical design and development of new ML systems: from architecture to production.
  • Collaborate cross-functionally with DS, Infra, Product, BA, and Engineering teams to define and deliver impactful solutions.
  • Own the full lifecycle of ML infrastructure: tooling, versioning, monitoring, automation, measuring results, and responding quickly to critical issues.
  • Continuously research and adopt best-in-class practices in MLOps, performance tuning, and distributed systems.

Requirements

  • B.Sc. or M.Sc. in Computer Science, Software Engineering, or an equivalent field.
  • 5+ years of hands-on experience in backend or ML engineering.
  • Strong Python skills and experience with distributed systems and parallel data processing frameworks such as Spark (via PySpark or Scala), Dask, or similar technologies. Familiarity with Scala is a strong advantage, especially for performance-critical workloads.
  • Proven track record in designing and scaling ML infrastructure.
  • Deep understanding of ML workflows and lifecycle management.
  • Experience in cloud environments (AWS, GCP, OCI) and containerized deployment (Kubernetes).
  • Understanding of databases and SQL for data retrieval.
  • Strong communication skills and ability to drive initiatives independently.
  • A passion for clean code, elegant architecture, and measurable impact.
  • Experience with monitoring and alerting tools (e.g., Grafana, Kibana).
  • Experience working with in-memory and NoSQL databases (e.g. Aerospike, Redis, Bigtable) to support ultra-fast data access in production-grade ML services.

What we offer:

  • Help and support from our caring HR team.
  • 20 days of vacation.
  • Purchase of any necessary software.
  • Exchange of experience and work alongside talented colleagues.
  • Last but not least, valuable compensation for your efforts.

