Senior Web Scraping Specialist
Buenos Aires
Ryz Labs
Unlock the power of LatAm's elite nearshore talent with Ryz Labs. Our top-tier staff augmentation services provide access to the brightest minds in software development, IT, sales, ops, and CS. Elevate your team and achieve unparalleled results...
Remote position (only candidates located in Argentina or Uruguay will be considered)
RYZ is seeking a highly skilled Senior Web Scraping Specialist to join the data engineering team of one of our partners. You will be responsible for designing, building, and maintaining scalable data extraction solutions from multiple online sources, including e-commerce marketplaces. This role requires a deep understanding of web scraping techniques, data integration, and ETL pipelines.
As a key team member, you will collaborate with data scientists, software engineers, and product teams to ensure seamless integration of scraped data into our ecosystem. You should have extensive experience with Python, data warehousing, and data integration tools, as well as expertise in handling large-scale data extraction with a focus on performance and compliance.
Key Responsibilities
- Web Scraping & Data Extraction
  - Design, develop, and optimize web scraping strategies for large-scale data extraction from dynamic websites.
  - Identify and assess relevant data sources, ensuring alignment with business objectives.
  - Implement automated web scraping solutions using Python and libraries like Scrapy, BeautifulSoup, and Selenium.
  - Build resilient and adaptable scrapers that can handle website structure changes, rate limits, and anti-scraping measures.
- Data Processing & Integration
  - Cleanse, validate, and transform extracted data to ensure accuracy, consistency, and usability.
  - Store and manage large volumes of scraped data using best-in-class storage solutions.
  - Develop ETL pipelines to integrate scraped data into data warehouses and analytics platforms.
  - Collaborate with cross-functional teams, including data scientists and engineers, to make scraped data actionable.
  - Optimize scraping procedures to improve efficiency, reliability, and scalability across multiple data sources.
  - Implement solutions for bypassing CAPTCHAs, rotating user agents, and managing proxy services.
  - Continuously monitor, troubleshoot, and maintain scraping scripts to minimize disruptions due to site changes.
  - Stay up to date with legal, ethical, and compliance considerations related to web scraping and data collection.
  - Maintain clear and detailed documentation of scraping methodologies, data pipelines, and best practices.
Required Qualifications
- 5+ years of hands-on experience in web scraping, data extraction, and integration.
- Strong proficiency in Python and web scraping frameworks (Scrapy, BeautifulSoup, Selenium).
- Expertise in handling dynamic content, browser fingerprinting, and bypassing anti-bot mechanisms (e.g., CAPTCHAs, rate limits, proxy rotation).
- Deep understanding of HTML, CSS, XPath, and JavaScript-rendered content.
- Experience working with large-scale data storage solutions and optimizing retrieval performance.
- Strong grasp of ETL processes, data pipelines, and data warehousing.
- Familiarity with APIs for data extraction and integration from public and restricted sources.
- Strong problem-solving skills with an ability to debug and adapt to changing web structures.
- Solid understanding of web scraping ethics, legal implications, and compliance guidelines.
Preferred Qualifications
- Bachelor’s degree in Computer Science, Data Science, Information Technology, or a related field.
- Experience with cloud-based distributed scraping systems (AWS, GCP, Azure).
- Knowledge of big data frameworks and experience handling high-volume datasets within Snowflake.
- Familiarity with machine learning techniques for data extraction and natural language processing (NLP).
- Experience working with JSON, XML, CSV, and other structured data formats.
- Proficiency with version control systems (Git).
Our values and what to expect:
- Customer First Mentality: every decision we make should be made through the lens of the customer.
- Bias for Action: urgency is critical; expect that the timeline to get something done is accelerated.
- Ownership: step up if you see an opportunity to help, even if it is not your core responsibility.
- Humility and Respect: be willing to learn, be vulnerable, and treat everyone who interacts with RYZ with respect.
- Frugality: being frugal and cost-conscious helps us do more with less.
- Deliver Impact: get things done in the most efficient way.
- Raise our Standards: always be looking to improve our processes, our team, and our expectations. The status quo is not good enough and never should be.
Tags: APIs AWS Azure Big Data Computer Science CSV Data pipelines Data Warehousing E-commerce Engineering ETL GCP Git JavaScript JSON Machine Learning NLP Pipelines Python Selenium Snowflake XML
Region: South America
Country: Argentina