Data Analytics Web Scraping Engineer (part-time)
Prague, Czechia
IDC
IDC examines consumer markets by devices, applications, networks, and services to provide complete solutions for succeeding in these expanding markets.
Overview
IDC is seeking a part-time Data Analytics Engineer for our Webscraping and Data Harvesting Team based in Ostrava, Czech Republic. This role involves supporting our established team that focuses on web crawling and gathering data from the Internet. The primary responsibilities include deploying web crawling technology to collect structured and unstructured data from various sites on a specific schedule, as well as data cleaning, classifying, validating, and unifying based on business rules and taxonomy. Additionally, the role involves enriching the data with other information and integrating it into existing products and internal business processes.
Responsibilities
- Assisting in web crawling and data gathering for our largest data product line.
- Supporting the evaluation, creation, and deployment of web crawling technology.
- Helping develop machine learning algorithms with a focus on Natural Language Processing to clean, classify, and match gathered data to existing taxonomy.
- Collaborating with internal business stakeholders to integrate scraped data into existing research processes and proprietary systems.
- Working cross-departmentally to define metrics, guidelines, and strategies to measure data coverage and its quality.
- Contributing to a global team in designing and building new products that aggregate and visualize scraped data from various sources.
Qualifications
- Bachelor's Degree or equivalent in Mathematics, Computer Science, Statistics or Information Management.
- Experience in data engineering or roles related to data engineering.
- Demonstrated strong technical knowledge of object-oriented programming in Python.
- Strong analytic skills related to working with unstructured datasets.
- SQL knowledge and experience working with relational databases.
- Proven ability to work independently and ensure completion of tasks accurately and on time.
- Strong English communication skills in both verbal and written form.
- Openness to learning new technologies and tools.
Preferred Qualifications:
- 1+ years of experience in machine learning or natural language processing.
- Experience using technologies including Browse.ai, Python-Scrapy, Octoparse, Beautiful Soup, Mozenda, NLTK, PostgreSQL/Snowflake.
Perks & Benefits
- 5 weeks of holidays + extra corporate day off
- Sick days
- Flexibility to work from home most of the week
- Some flexibility in scheduling your working hours
- Cafeteria system (use points on Flexipasses, pension/life insurance, or Multisport card)
- Meal allowance
IDC is an Equal Opportunity Employer. Applicants and employees are considered for positions and are evaluated without regard to mental or physical disability, handicap, race, color, religion, gender, gender identity and expression, ancestry, national origin, age, genetic information, military or veteran status, sexual orientation, marital status or other categories protected by law.
Why IDC?
IDC is the most respected global technology market research firm. We are changing the way the world thinks about the impact of technology on business and society. Our people, data, and analytics create global technology insights that accelerate customer success. IDC has been recognized for five consecutive years (2020, 2021, 2022, 2023, 2024) by the IIAR as the Analyst Firm of the Year, one of the highest accolades in the technology market research industry.
Our collaborative, innovative and entrepreneurial culture is the perfect place for you to discover your future.
This position is part-time and is based in our Prague or Ostrava offices, with a hybrid work schedule.
Recruitment Fraud Notice: IDG/IDC would like to inform you that we conduct our formal communications via corporate email, our Applicant Tracking System iCIMS, LinkedIn messaging, or directly by phone. We do not use any other platform (including Telegram, WhatsApp, Signal, text, instant message, etc.) to communicate with prospective candidates. If you receive any communication outside of our formal communications channels, please ignore it and block the sender or caller. In addition, we do not ask candidates to provide sensitive personally identifiable information such as bank account or social security numbers. If you have been contacted by someone claiming to represent a job offer, please report it as potential job fraud to law enforcement.