Data Engineer
Wrocław, Poland
Tooploox
Hi there!
We are Tooploox, an AI software development company offering custom AI solutions and services. We help innovative companies and startups design and build digital products with generative AI, mobile, and web technologies.
Our team of nearly 200 experts, including an R&D team of over 40 engineers, many with PhDs, has pioneered AI solutions across industries like healthcare, fashion, and e-commerce. We’ve published over 15 research papers at top conferences like NeurIPS and ICML.
We're on the lookout for a Data Engineer to take on a pivotal role in our team. You'll be at the heart of our data work, focusing on scalable batch and streaming data pipelines. If you love merging traditional software development with innovative AI technologies, this role is tailor-made for you.
We'd love to hear from you!
What you will do:
- Design, develop, and maintain scalable batch and streaming data pipelines.
- Work with Python to transform, process, and integrate data.
- Handle a mix of structured and unstructured data, including work with NoSQL and vector databases.
- Optimize performance across big data workflows, including tuning Hive and Spark jobs.
Experience and skills you need to join us:
- 5+ years of experience in data engineering or a related field.
- Deep experience with Apache Spark (especially PySpark), Hadoop, and Apache Hive.
- Strong programming skills in Python.
- Solid understanding of database concepts, including experience with NoSQL databases (e.g., MongoDB, Redis) and ideally vector databases.
- Hands-on experience with stream processing, preferably using Apache Flink.
- Familiarity with distributed computing, data warehousing, and performance optimization techniques.
- Strong problem-solving and communication skills.
- Fluency in Polish and English.
It would be great if you also have:
- Experience with LLMs, prompt engineering, or machine learning workflows (we use this in conjunction with vector DBs).
- Proficiency in Java or Scala, which is useful for deeper Spark optimization or contributing to broader engineering projects.
- Familiarity with Spring Boot for building and deploying data applications.
How we work:
At Tooploox, you have the flexibility to choose your working hours and location. While we value remote work, we also believe in building relationships and invite you to join us in our Warsaw and Wrocław offices. Enjoy a relaxed atmosphere and try some “home-made” pizza from our office pizza oven. We love having pets in the office, so feel free to bring yours along.
Join us and shape the future of AI while working the way you like!
Perks/benefits: Conferences