Senior/Expert Machine Learning Data Engineer

Gdańsk

PredictX

The AI-powered Data Intelligence Platform for Travel & Expense Analytics. The future of travel and expense is about to change, and PredictX is here to lead the way.

About PredictX
Make a real difference at one of London’s foremost SaaS scale-ups: Be ready to pioneer the future of AI, data analytics, and technology. Step into PredictX, where we don't just see AI as a fashionable bandwagon but have lived and breathed AI & ML in every aspect of our product for the past decade.
As an enterprise SaaS provider, we're revolutionising critical decision-making for many of the world’s largest businesses, including three FAANG companies, with our integrative AI technology and predictive analytics.
We pride ourselves on our commitment to staying at the forefront of technological advancements. You'll be joining a team that actively explores and integrates the latest innovations to maintain our competitive edge.
The Role
As a Senior/Expert ML/Data Engineer, you will be at the forefront of our data science and machine learning initiatives, actively contributing to the evolution of our AI-powered solutions. You will be instrumental in designing, building, and maintaining our cutting-edge data infrastructure and machine learning pipelines, with a growing focus on leveraging the power of Large Language Models (LLMs) and other emerging AI technologies.
This role demands a strong blend of data engineering prowess, machine learning understanding (including LLMs), and the ability to translate complex business needs into robust technical solutions. You will be expected to lead projects, mentor junior team members, and drive innovation within our rapidly evolving data and AI landscape.

Key Responsibilities

  • Design, develop, and maintain scalable and efficient data pipelines using technologies such as Spark, Python, and relevant ETL tools to support our analytical needs and our machine learning models, including those leveraging LLMs.
  • Architect and implement robust data warehousing solutions and data models that ensure data quality, integrity, and performance, catering to the specific data requirements of advanced AI models.
  • Lead the development, testing, and deployment of machine learning models, including exploration and integration of Large Language Models (LLMs) and other novel AI architectures, collaborating closely with Data Scientists to productionize innovative solutions.
  • Engineer approaches for storing, transforming, transporting, synchronising, archiving, and securing large and complex datasets, including unstructured and semi-structured data crucial for training and deploying advanced AI models.
  • Participate in the evaluation and testing of new machine learning models and frameworks, including LLMs, to assess their potential and applicability to our products.
  • Identify and resolve performance bottlenecks, data quality issues, and other pain points within our data and ML infrastructure. Proactively recommend and implement solutions for optimization and improvement, especially in the context of deploying large-scale AI models.
  • Define and govern data modelling and design standards, best practices, and development methodologies within the team, considering the unique challenges and opportunities presented by LLMs and other advanced AI.
  • Create and maintain comprehensive technical documentation for data pipelines, data models, and machine learning workflows, including details specific to LLM integration and testing.
  • Collaborate effectively with Business Analysts, Data Scientists, and other engineering teams to understand data requirements and deliver impactful data and AI solutions.
  • Stay abreast of the latest advancements in data engineering, machine learning (including LLMs and generative AI), and big data technologies, and actively participate in the evaluation and integration of promising new technologies.
  • Mentor and guide junior Data Engineers and Data Scientists within the team, sharing knowledge about new AI developments and best practices.

Experience/Skills

  • Extensive (5+ years) proven experience in building and maintaining complex data pipelines and data warehousing solutions in a production environment.
  • Expert proficiency in data engineering tools and technologies, including Spark (PySpark and/or Scala), Python, SQL, and various ETL/ELT tools.
  • Deep understanding of data modelling techniques (e.g., star schema, dimensional modelling) and data warehousing concepts.
  • Strong knowledge of data governance, data quality principles, and data security best practices, with an awareness of the specific security and ethical considerations related to AI models.
  • Significant experience with data integration, data cleansing, and data transformation processes on large datasets, including data preparation for machine learning and LLMs.
  • Familiarity with data profiling and data lineage tools.
  • Excellent ability to identify, diagnose, and resolve data issues, performance bottlenecks, and data quality problems effectively, including those encountered when working with large AI models.
  • Strong analytical and problem-solving skills to analyse complex data sets and translate them into actionable technical solutions, with an aptitude for understanding the nuances of AI model performance.
  • Excellent written and verbal communication skills to effectively convey technical concepts to both technical and non-technical audiences, including discussions around AI model capabilities and limitations.
  • Strong teamwork and collaboration skills with the ability to build effective working relationships across teams, especially when integrating new AI technologies.
  • A proactive and solution-oriented approach with a strong drive to learn and implement new technologies, particularly within the rapidly evolving fields of AI and LLMs.
  • Meticulous attention to detail to ensure data accuracy and integrity, which is critical for the reliability of AI models.
  • Proven ability to write clear and concise technical documentation, including documentation for AI model development and deployment.

Desired Skills

  • Solid understanding of machine learning fundamentals, algorithms, and libraries (e.g., scikit-learn, TensorFlow, PyTorch), with specific interest or experience in Natural Language Processing (NLP) and Large Language Models (LLMs).
  • Experience in deploying and monitoring machine learning models, including LLMs, in a production environment.
  • Experience with building and maintaining data pipelines using orchestration tools like Apache Airflow.
  • Familiarity with big data technologies beyond Spark, such as Hadoop, Kafka, NoSQL databases (e.g., MongoDB, Cassandra), and data streaming platforms, and their application in AI workflows.
  • Experience with cloud platforms like AWS, Azure, or GCP, and their data engineering and machine learning services, including those specific to LLMs (e.g., SageMaker, Azure AI).
  • Understanding of CI/CD pipelines for data and machine learning deployments, including considerations for deploying and updating AI models.
  • Exposure to statistical analysis and data visualization tools (e.g., R, Tableau, Power BI), and their use in understanding AI model performance and data insights.
  • Experience with prompt engineering and fine-tuning of Large Language Models.

Technical Skills

  • Programming: Expert proficiency in Python and strong skills in at least one other relevant language like Scala or Java, with experience in libraries relevant to NLP and LLMs.
  • SQL: Mastery of SQL for complex data querying and manipulation across various database systems, including those used in conjunction with AI model training and inference.
  • Data Modelling: Advanced knowledge of data modelling principles and experience in designing efficient and scalable data models that can support the demands of large AI datasets.
  • ETL/ELT Processes: Deep understanding and practical experience with a variety of ETL/ELT tools and frameworks, including those optimized for handling the data requirements of AI models.
  • Big Data Technologies: In-depth understanding and hands-on experience with Spark and ideally other big data ecosystem components relevant to AI data processing.
  • Cloud Platforms: Strong experience with at least one major cloud platform and its data and ML services, including those tailored for LLMs and generative AI.

Soft Skills

  • Communication: Exceptional written and verbal communication skills with the ability to articulate complex technical concepts clearly and concisely to diverse audiences, including discussions about the capabilities and implications of new AI technologies.
  • Collaboration: Proven ability to lead and work effectively within cross-functional teams, fostering strong relationships with stakeholders, especially when exploring and integrating novel AI solutions.
  • Problem-Solving: A highly proactive and analytical approach to problem-solving with the ability to think strategically and implement innovative solutions, particularly in the rapidly evolving landscape of AI.
  • Attention to Detail: A strong commitment to data accuracy and quality, which is paramount when developing and deploying AI models.
  • Documentation: Ability to create and maintain comprehensive and easily understandable technical documentation, including details specific to AI model architecture, training, and deployment.
  • Adaptability & Learning Agility: A passion for continuous learning and the ability to quickly adapt to new technologies and paradigms, especially within the dynamic fields of AI and data engineering.
  • Leadership (Implicit): While this is not explicitly a management role, the ability to provide technical guidance and mentorship within the team, particularly around new AI technologies, is highly valued.

What we offer

  • Innovative Projects: Work on cutting-edge projects that push the boundaries of AI, machine learning, and data engineering, including the exciting application of Large Language Models.
  • Dynamic Technology Environment: Be part of a team that actively explores, tests, and integrates the latest technological advancements, including the newest developments in AI and LLMs.
  • Innovation Hub: Collaborate with a team of experts and leverage the latest technologies to drive innovation in the realm of AI and data.
  • Collaborative Culture: A supportive and team-focused environment where your expertise and contributions, especially in the area of new AI technologies, are highly valued.
  • Growth Opportunities: Opportunities for professional development and growth within a rapidly expanding company at the forefront of AI innovation.
  • Impactful Work: Play a key role in shaping the future of our product and influencing critical business decisions for world-leading companies through the application of advanced AI technologies.

Tags: Airflow Architecture AWS Azure Big Data Cassandra CI/CD Data Analytics Data governance Data pipelines Data quality Data visualization Data Warehousing ELT Engineering ETL GCP Generative AI Hadoop Java Kafka LLMs Machine Learning ML infrastructure ML models Model training MongoDB NLP NoSQL Pipelines Power BI Prompt engineering PySpark Python PyTorch R SageMaker Scala Scikit-learn Security Spark SQL Statistics Streaming Tableau TensorFlow Testing

Perks/benefits: Career development

Region: Europe
Country: Poland
