Consulting Data Engineer

South Africa - Cape Town

RELX

Make better decisions, get better results and be more productive with RELX's analytics and decision tools

About The Team

The Content Engineering teams at LexisNexis Intellectual Property (IP) are driving innovation in how patent data is processed, enriched, and leveraged. These teams comprise Data Engineers, Data Scientists, and Data Analysts, supported by subject matter experts in patent data. Our teams work closely with Databricks to migrate legacy ETL systems to a modern Strategic Data Platform built on Python, PySpark, and Databricks. This platform uses a medallion architecture to ingest, transform, and enrich global patent data. It will serve not only our flagship product PatentSight+ but also many other strategic products within our IP portfolio.

This is an opportunity to join a high-impact initiative still in its formative stages, where you can influence and evolve architecture, tooling, and best practices from the ground up.

About the Role

As the Principal Data Engineer for one of the Data Platform teams, you will act as a technical lead in bringing new content and enrichments to the strategic Data Platform at LexisNexis IP. You will be instrumental in executing our data strategy for the platform, developing and implementing advanced solutions for data integration, quality control, and continuous delivery.

This is a senior technical individual contributor role and does not have line management responsibilities. You will technically lead a high-performing agile team, guiding them through complex delivery projects, identifying technical needs, and devising innovative solutions. Your expertise will be crucial in embedding best practices and state-of-the-art data engineering tools, ensuring that our workflows are both efficient and scalable.

You will work closely with a range of technical leaders, data scientists, and data analysts across the Data Platform and the wider technology department. With colleagues based in the UK and EU, you will also engage with a diverse range of stakeholders across the UK, Germany, the Netherlands, and the USA.

What does success look like? In the first 3 months you will lead and deliver content expansion ETLs in our key pipelines, setting architectural best practices and standards as you go. You will build networks with other teams in Content, especially on the Data Platform.

Responsibilities

  • Architect and lead the development of our patent data ingestion pipeline using Databricks, Python, and PySpark.
  • Mentor and guide a team of data engineers, fostering a collaborative environment that encourages growth and innovation. You will enable and lead technical discussions within the team and with stakeholders.
  • Ensure the pipeline is efficient, scalable, and robust, capable of handling terabytes of data with low latency. Eliminate inefficiencies and teach the techniques to the team.
  • Work closely with the wider cross-functional engineering department, including data scientists, analysts, and product managers, to ensure the pipeline meets business needs.
  • Contribute to the overall data engineering strategy and drive the adoption of best practices in coding, architecture, and deployment.
  • Identify and resolve technical challenges, ensuring the smooth operation of the data ingestion pipeline.
  • Translate strategic business objectives into technical architecture and delivery plans.
  • Contribute to platform-wide standards, tooling, and architecture decisions.

Requirements

  • Expertise in Python and PySpark is essential for you to lead and develop the skills of the team.
  • Expertise in Databricks is highly desirable.
  • Demonstrated ability to design and implement scalable data architectures for both batch and streaming data processing.
  • Proficiency in using cloud platforms such as AWS, Azure, or Google Cloud for data infrastructure management would be beneficial.
  • Prior experience with patent data, or other complex data sources, is extremely beneficial.
  • Knowledge of data governance practices, including data quality management, metadata management, and data lineage, is also beneficial.
  • Exposure to CI/CD for data pipelines using tools like GitHub Actions, Azure DevOps, or Airflow.
  • Proven experience in technically leading and mentoring data engineering teams.

Work in a way that works for you

We promote a healthy work/life balance across the organization and offer appealing working arrangements for our people. With numerous wellbeing initiatives, shared parental leave, study assistance and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.

  • Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are the most productive


Working for you

We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:

  • Medical Aid

  • Retirement Plan inclusive of Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)

  • Modern family benefits, including adoption and surrogacy

  • Study Leave


About the Business

LexisNexis Legal & Professional® provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law around the world. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis® and Nexis® services.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or by calling 1-855-833-5120.

Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.

Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.

USA Job Seekers:

EEO Know Your Rights.

Tags: Agile Airflow Architecture AWS Azure Banking CI/CD Consulting Databricks Data governance DataOps Data pipelines Data quality Data strategy DevOps Engineering ETL GCP GitHub Google Cloud Pipelines Privacy PySpark Python Streaming

Perks/benefits: Career development Flex hours Medical leave Parental leave Startup environment

Region: Africa
Country: South Africa
