Lead Data Engineer - Databricks
Pune, Maharashtra, India
Citco
At Citco, we don't just provide bespoke solutions and better results. We’re a true partner dedicated to developing rich, long-term relationships through gold standard services.
About Citco:
Citco is a global leader in fund services, corporate governance and related asset services with staff across 50 office locations worldwide. With more than $1.8 trillion in assets under administration, we deliver end-to-end solutions and exceptional service to meet our clients’ needs.
For more information about Citco, please visit www.citco.com
The Data Engineering Manager will be a key member of the Operational Data Lake (ODL) team and will be responsible for overseeing the design, development, and optimization of data pipelines that ingest structured and unstructured data into the lake. The incumbent will play a pivotal role in supporting multiple domain teams to create organizational data assets while ensuring data integrity and security and implementing centralized governance.
The Operational Data Lake (ODL) team is a multi-skilled team responsible for implementing Citco’s operational data lake and scaling data-driven analytics across different business units by leveraging advanced techniques and technologies. The ODL team is also focused on breaking down data silos and building a single source of centrally governed operational data products.
Your Role:
- Lead and manage multiple data projects leveraging Databricks, from project initiation to completion, ensuring adherence to project timelines, budget, and quality standards.
- Collaborate closely with cross-functional teams including Subject Matter Experts, data scientists, engineers, analysts, and other stakeholders to define project requirements, scope, and deliverables.
- Define and implement scalable and robust data lake architectures leveraging Databricks Delta Lake technology.
- Design data ingestion, transformation, and storage strategies to ensure efficient and reliable data management.
- Oversee the development of data pipelines to ingest, process, and transform data from various sources into Databricks Delta Lake.
- Define data models and schemas to support analytical and reporting needs.
- Optimize data structures, partitioning strategies, and storage formats for efficient query performance.
- Implement ML pipelines and workflows for model training, validation, and deployment using Databricks MLflow and related tools to support real-time and batch inference.
- Work closely with BI Analysts and Data Visualization specialists to design and optimize data schemas and structures for BI reporting and analytics.
- Establish monitoring and alerting mechanisms to proactively detect issues and optimize data lake performance.
- Stay abreast of industry trends, best practices, and emerging technologies in data engineering and Databricks Delta Lake.
- Provide technical guidance and leadership on Databricks best practices, methodologies, and implementation strategies.
- Manage Databricks clusters and resources efficiently to optimize performance, scalability, and cost-effectiveness.
- Develop and maintain metadata management solutions to capture, organize, and govern data assets across the organization.
About You:
- Bachelor's degree in Computer Science, Information Systems, Data Science, or a related field. Advanced degree preferred.
- Excellent communication, interpersonal, and leadership skills, with the ability to effectively collaborate with diverse teams and stakeholders.
- Strong analytical and problem-solving abilities, with a focus on delivering innovative and impactful data-driven solutions.
- 8-12 years of total experience.
- Deep expertise in Apache Spark, the Databricks runtime environment, and Databricks Delta Lake.
- Professional/Associate level Databricks certification is required.
- Strong understanding of master data management principles, metadata management, and data cataloging concepts and best practices.
- Strong background in data modeling, ETL/ELT development, and data warehousing.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform and big data technologies.
Assets:
- Financial product knowledge and knowledge of Hedge Fund Administration.
- Experience in setting up and managing a Data Center of Excellence (CoE) is highly desirable.
- Experience creating interactive reports in Qlik, Tableau, Power BI, or Alteryx.
- Experience integrating machine learning models and algorithms into data pipelines (experience with Databricks MLflow is a plus).
- Experience working in an Agile environment with knowledge of Jira, Confluence, etc.
Our Benefits
Your well-being is of paramount importance to us, and central to our success. We provide a range of benefits, training and education support, and flexible working arrangements to help you achieve success in your career while balancing personal needs. Ask us about specific benefits in your location.
We embrace diversity, prioritizing the hiring of people from diverse backgrounds. Our inclusive culture is a source of pride and strength, fostering innovation and mutual respect.
Citco welcomes and encourages applications from people with disabilities. Accommodations are available upon request for candidates taking part in all aspects of the selection process.
Tags: Agile Architecture AWS Azure Big Data Computer Science Confluence Databricks Data management Data pipelines Data visualization Data Warehousing ELT Engineering ETL GCP Google Cloud Jira Machine Learning MLFlow ML models Model training Pipelines Power BI Qlik Security Spark Tableau Unstructured data
Perks/benefits: Career development Flex hours