Data Engineer / Data Scientist
Noida Berger Tower, India
Thales
From Aerospace, Space, Defence to Security & Transportation, Thales helps its customers to create a safer world by giving them the tools they need to perform critical tasks.
Data Engineer cum Data Scientist
Key Responsibilities:
Data Engineering:
- Design, build, and maintain scalable and efficient data pipelines on Databricks for machine learning workflows.
- Optimize data processing workflows to handle large-scale datasets using Spark and Delta Lake.
- Implement best practices for data versioning, quality, and governance.
Model Development & Deployment:
- Develop, train, and fine-tune machine learning models, such as cross-sell, classification, and segmentation models, to meet business objectives.
- Use MLflow to track experiments, manage model lifecycle, and ensure reproducibility of results.
- Register and version models in the Databricks Model Registry and maintain model lineage.
Model Productionization:
- Deploy models to production environments and integrate them into real-time or batch systems.
- Implement robust CI/CD pipelines for machine learning workflows in Databricks.
- Monitor model performance in production using metrics and feedback loops.
Innovation, Collaboration & Best Practices:
- Stay updated with the latest trends and advancements in data engineering and machine learning technologies.
- Promote best practices in machine learning, including feature engineering, hyperparameter tuning, and model evaluation.
- Present results and insights from machine learning experiments to non-technical audiences.
Qualifications & Skills:
Must-Have Skills:
- Experience: 3+ years in data engineering and machine learning roles, with expertise in Databricks.
- Technical Stack: Strong experience with Databricks, Apache Spark, PySpark, SQL, MLflow and Model Registry, MLOps, and Terraform.
- Programming: Proficiency in Python for data processing and machine learning.
- Model Deployment: Hands-on experience with deploying and managing models in production.
- ML Lifecycle: Deep understanding of the end-to-end machine learning lifecycle, including model evaluation, monitoring, and maintenance.
- Cloud Platforms: Experience with cloud platforms like GCP for deploying Databricks solutions.
Good-to-Have Skills:
- Familiarity with generative AI and deep learning frameworks such as TensorFlow or PyTorch.
Perks/benefits: Career development