Lead Product Software - Data Science Engineer
IND-Pune-IndiQube Orchid, India
- Remote-first
Wolters Kluwer
Wolters Kluwer is a global provider of professional information, software solutions, and services.
Job Description Summary
Designs, develops, tests, debugs and implements more complex operating systems components, software tools, and utilities with full competency. Coordinates with users to determine requirements. Reviews systems under development and related documentation. Makes more complex modifications to existing software to fit specialized needs and configurations, and maintains program libraries and technical documentation. May coordinate activities of the project team and assist in monitoring project schedules and costs.
Responsibilities
Design and implement scalable data pipelines for both ML and non-ML applications
Build and maintain data lakes and feature stores, preferably optimized for machine learning
Develop ETL processes for complex, high-volume datasets
Create and maintain infrastructure for ML model training and deployment
Collaborate with data scientists to productionize ML models
Implement CI/CD pipelines for ML models
Optimize data processing for model training and inference
Monitor data systems performance and troubleshoot issues
Ensure data quality, integrity, and governance
Design real-time data processing solutions for ML applications and other consumer applications
Requirements
Bachelor's or master's degree in Computer Science, Engineering, or a related technical field
Minimum of 5 years' experience in building data pipelines for both structured and unstructured data.
At least 2 years' experience in Azure data pipeline development.
Preferably 3 or more years' experience with Hadoop, Azure Databricks, Stream Analytics, Event Hubs, Kafka, and Flink.
Strong proficiency in Python and SQL
Experience with big data technologies (Spark, Hadoop, Kafka)
Familiarity with ML frameworks (TensorFlow, PyTorch, scikit-learn)
Knowledge of model serving technologies (TensorFlow Serving, MLflow, Kubeflow) is a plus
Experience with one of the cloud platforms (Azure preferred) and its data services. Familiarity with the platform's ML services is preferred.
Understanding of containerization and orchestration (Docker, Kubernetes)
Experience with data versioning and ML experiment tracking is a plus
Knowledge of distributed computing principles
Familiarity with DevOps practices and CI/CD pipelines
Preferred Qualifications
Bachelor’s degree in Computer Science or equivalent practical experience.
Experience with Agile/Scrum methodologies.
Background in tax and accounting domains is advantageous.
Azure Data Engineer certification is beneficial.
Tags: Agile Azure Big Data CI/CD Computer Science Databricks Data pipelines Data quality DevOps Docker Engineering ETL Flink Hadoop Kafka Kubeflow Kubernetes Machine Learning MLFlow ML models Model training Pipelines Python PyTorch Scikit-learn Scrum Spark SQL TensorFlow Unstructured data
Perks/benefits: Career development Team events