Senior Data Engineer - Databricks

Gurugram, Haryana, India


Srijan Technologies

Your trusted Drupal partner Srijan (now part of Material) continues to help brands drive digital transformation through data, AI, cloud and platform engineering.




About Us

We turn customer challenges into growth opportunities. Material is a global strategy partner to the world’s most recognizable brands and innovative companies. Our people around the globe thrive by helping organizations design and deliver rewarding customer experiences. We use deep human insights, design innovation and data to create experiences powered by modern technology. Our approaches speed engagement and growth for the companies we work with and transform relationships between businesses and the people they serve. Srijan, a Material company, is a renowned global digital engineering firm with a reputation for solving complex technology problems using its deep technology expertise and strategic partnerships with top-tier technology partners.

 

Job Summary:

We are seeking a Senior Data Engineer – Databricks with a strong development background in Azure Databricks and Python, who will be instrumental in building and optimising scalable data pipelines and solutions across the Azure ecosystem. This role requires hands-on development experience with PySpark, data modelling, and Azure Data Factory. You will collaborate closely with data architects, analysts, and business stakeholders to ensure reliable and high-performance data solutions.

Experience Required: 4+ Years

Lead/Senior Data Engineer (Microsoft Azure, Databricks, Data Factory, Data Engineering, Data Modelling)

 

Key Responsibilities:

  • Develop and Maintain Data Pipelines:
    Design, implement, and optimise scalable data pipelines using Azure Databricks (PySpark) for both batch and streaming use cases.
  • Azure Platform Integration:
    Work extensively with Azure services including Data Factory, ADLS Gen2, Delta Lake, and Azure Synapse for end-to-end data pipeline orchestration and storage.
  • Data Transformation & Processing:
    Write efficient, maintainable, and reusable PySpark code for data ingestion, transformation, and validation processes within the Databricks environment.
  • Collaboration:
    Partner with data architects, analysts, and data scientists to understand requirements and deliver robust, high-quality data solutions.
  • Performance Tuning and Optimisation:
    Optimise Databricks cluster configurations, notebook performance, and resource consumption to ensure cost-effective and efficient data processing.
  • Testing and Documentation:
    Implement unit and integration tests for data pipelines. Document solutions, processes, and best practices to enable team growth and maintainability.
  • Security and Compliance:
    Ensure data governance, privacy, and compliance are upheld across all engineered solutions, following Azure security best practices.

 

Preferred Skills:

  • Strong hands-on experience with Delta Lake, including table management, schema evolution, and implementing ACID-compliant pipelines.
  • Skilled in developing and maintaining Databricks notebooks and jobs for large-scale batch and streaming data processing.
  • Experience writing modular, production-grade PySpark and Python code, including reusable functions and libraries for data transformation.
  • Experience in streaming data ingestion and Structured Streaming in Databricks for near real-time data solutions.
  • Knowledge of performance tuning techniques in Spark, including job optimisation, caching, and partitioning strategies.
  • Exposure to data quality frameworks and testing practices (e.g., pytest, data validation libraries, custom assertions).
  • Basic understanding of Unity Catalog for managing data governance, access controls, and lineage tracking from a developer’s perspective.
  • Familiarity with Power BI - able to structure data models and views in Databricks or Synapse to support BI consumption.
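As one concrete flavour of the testing practices listed above, here is a minimal pytest-style data-quality check in plain Python; the record shape and the two rules are hypothetical examples:

```python
# Minimal pytest-style data-quality check of the kind a pipeline might
# run before publishing a table. The record shape and rules here are
# hypothetical examples, not requirements from this role.
from typing import Iterable, List

def validate_orders(records: Iterable[dict]) -> List[str]:
    """Return human-readable rule violations; an empty list means the batch is clean."""
    errors: List[str] = []
    seen_ids = set()
    for i, rec in enumerate(records):
        order_id = rec.get("order_id")
        if order_id in seen_ids:
            errors.append(f"row {i}: duplicate order_id {order_id!r}")
        seen_ids.add(order_id)
        amount = rec.get("amount")
        if not isinstance(amount, (int, float)) or amount <= 0:
            errors.append(f"row {i}: missing or non-positive amount")
    return errors

def test_validate_orders():
    good = [{"order_id": 1, "amount": 9.5}, {"order_id": 2, "amount": 3.0}]
    bad = good + [{"order_id": 1, "amount": -1}]
    assert validate_orders(good) == []
    assert len(validate_orders(bad)) == 2  # duplicate id + bad amount
```

Run with `pytest` (it picks up the `test_` function automatically), or wire the same check into a pipeline step that fails the run when violations are found.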



Category: Engineering Jobs

Tags: Azure Databricks Data governance Data pipelines Data quality Engineering Pipelines Power BI Privacy PySpark Python Security Spark Streaming Testing

Perks/benefits: Startup environment

Region: Asia/Pacific
Country: India
