Lead Product Software - Data Science Engineer

IND-Pune-IndiQube Orchid, India

Wolters Kluwer

Wolters Kluwer is a global provider of professional information, software solutions, and services.

Job Description Summary

Designs, develops, tests, debugs and implements more complex operating systems components, software tools, and utilities with full competency. Coordinates with users to determine requirements. Reviews systems under development and related documentation. Makes more complex modifications to existing software to fit specialized needs and configurations, and maintains program libraries and technical documentation. May coordinate activities of the project team and assist in monitoring project schedules and costs.

Responsibilities 

  • Design and implement scalable data pipelines for both ML and non-ML applications 

  • Build and maintain data lakes and feature stores preferably optimized for machine learning 

  • Develop ETL processes for complex, high-volume datasets 

  • Create and maintain infrastructure for ML model training and deployment 

  • Collaborate with data scientists to productionize ML models 

  • Implement CI/CD pipelines for ML models  

  • Optimize data processing for model training and inference 

  • Monitor data systems performance and troubleshoot issues 

  • Ensure data quality, integrity, and governance  

  • Design real-time data processing solutions for ML applications and other consumer applications 

Requirements 

  • Bachelor's or master's degree in Computer Science, Engineering, or a related technical field 

  • Minimum of 5 years' experience in building data pipelines for both structured and unstructured data. 

  • At least 2 years' experience in Azure data pipeline development. 

  • Preferably 3 or more years' experience with Hadoop, Azure Databricks, Stream Analytics, Event Hubs, Kafka, and Flink. 

  • Strong proficiency in Python and SQL 

  • Experience with big data technologies (Spark, Hadoop, Kafka) 

  • Familiarity with ML frameworks (TensorFlow, PyTorch, scikit-learn) 

  • Knowledge of model serving technologies (TensorFlow Serving, MLflow, Kubeflow) is a plus 

  • Experience with one of the cloud platforms (Azure preferred) and its data services. Familiarity with its ML services is preferred. 

  • Understanding of containerization and orchestration (Docker, Kubernetes) 

  • Experience with data versioning and ML experiment tracking is a plus 

  • Knowledge of distributed computing principles 

  • Familiarity with DevOps practices and CI/CD pipelines 
     

Preferred Qualifications

  • Bachelor’s degree in Computer Science or equivalent practical experience.  

  • Experience with Agile/Scrum methodologies.  

  • Background in tax and accounting domains is advantageous.  

  • Azure Data Engineer certification is beneficial. 

Tags: Agile Azure Big Data CI/CD Computer Science Databricks Data pipelines Data quality DevOps Docker Engineering ETL Flink Hadoop Kafka Kubeflow Kubernetes Machine Learning MLflow ML models Model training Pipelines Python PyTorch Scikit-learn Scrum Spark SQL TensorFlow Unstructured data

Perks/benefits: Career development Team events

Region: Asia/Pacific
Country: India
