NLP Data Engineer

Remote, Czechia

IDC

IDC examines consumer markets by devices, applications, networks, and services to provide complete solutions for succeeding in these expanding markets.


Overview

IDC is seeking a dynamic and experienced NLP Data Engineer to join our AI teams, which focus on delivering production-grade Python applications deployed in Kubernetes and AWS Lambda.

We design scalable, high-performance AI systems that blend generative AI with proven ML approaches like classification and predictive modeling.

IDC produces massive volumes of multilingual natural language data (English, Japanese, Chinese, and more) - a goldmine for any NLP enthusiast.

We collaborate with other teams to integrate AI and ML solutions effectively, ensuring practical impact and reliability.

Responsibilities

  • Extract information from Word, Excel, and PowerPoint files and from databases, and design APIs and data structures to access the extracted data.
  • Improve data quality using LLMs and heuristics and design quality control mechanisms.
  • Set up databases in AWS and write Python connectors to them.
  • Architect, implement and monitor data pipelines.
  • Collaborate with AI researchers and data scientists, providing technical support and performing code reviews.
  • Learn new AI technologies on the fly.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Proficiency in Python and at least one database technology (PostgreSQL, MongoDB, or Elasticsearch).
  • Experience with data pipeline orchestration tools (e.g., Apache Airflow).
  • Background in Natural Language Processing or hands-on experience with NLP projects.
  • Strong debugging skills for data issues, including root cause analysis and resolution.
  • Experience with cloud platforms, preferably AWS.

IDC is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, sex, sexual orientation, gender identity, religion, national origin, disability, veteran status, age, marital status, pregnancy, genetic information, or other legally protected status.

Why IDC?

IDC is the most respected global technology market research firm. We are changing the way the world thinks about the impact of technology on business and society. Our people, data, and analytics create global technology insights that accelerate customer success. IDC has been recognized for five consecutive years (2020, 2021, 2022, 2023, 2024) by the IIAR as the Analyst Firm of the Year, which is one of the highest accolades in the technology market research industry.

Recruitment Fraud Notice: IDG/IDC/Foundry would like to inform you that we conduct our formal communications via corporate email, our Applicant Tracking System iCIMS, LinkedIn messaging, or directly by phone. We do not use any other platform (including Telegram, WhatsApp, Signal, text, instant message, etc.) to communicate with prospective candidates. If you receive any communication outside of our formal communication channels, please ignore it and block the sender or caller. In addition, we do not ask candidates to provide sensitive personally identifiable information such as bank account or social security numbers. If you have been contacted by someone claiming to represent a job offer, please report it as potential job fraud to law enforcement.



Regions: Remote/Anywhere Europe
Country: Czechia
