Data & AI/ML Architect
Kochi, IN
Milestone Technologies, Inc.
The world's leading companies partner with Milestone Technologies, an IT Services and Digital Solutions company, to deliver IT services and technologies at scale, accelerate digital operations, develop innovative applications, and drive...
Job Summary:
We are seeking a highly experienced and visionary Databricks Data Architect with over 14 years in data engineering and architecture, including extensive hands-on experience designing and scaling Lakehouse architectures on Databricks. The ideal candidate will have deep expertise in data modeling, data governance, real-time and batch processing, and cloud-native analytics on the Databricks platform. You will lead the strategy, design, and implementation of modern data architecture to drive enterprise-wide data initiatives and maximize the value of the Databricks platform.
Key Responsibilities:
- Lead the architecture, design, and implementation of scalable and secure Lakehouse solutions using Databricks and Delta Lake.
- Define and implement data modeling best practices, including medallion architecture (bronze/silver/gold layers).
- Champion data quality and governance frameworks leveraging Databricks Unity Catalog for metadata, lineage, access control, and auditing.
- Architect real-time and batch data ingestion pipelines using Apache Spark Structured Streaming, Auto Loader, and Delta Live Tables (DLT).
- Develop reusable templates, workflows, and libraries for data ingestion, transformation, and consumption across various domains.
- Collaborate with enterprise data governance and security teams to ensure compliance with regulatory and organizational data standards.
- Promote self-service analytics and data democratization by enabling business users through Databricks SQL and Power BI/Tableau integrations.
- Partner with Data Scientists and ML Engineers to enable ML workflows using MLflow, Feature Store, and Databricks Model Serving.
- Provide architectural leadership for enterprise data platforms, including performance optimization, cost governance, and CI/CD automation in Databricks.
- Define and drive the adoption of DevOps/MLOps best practices on Databricks using Databricks Repos, Git, Jobs, and Terraform.
- Mentor and lead engineering teams on modern data platform practices, Spark performance tuning, and efficient Delta Lake optimizations (Z-ordering, OPTIMIZE, VACUUM, etc.).
Technical Skills:
- 10+ years in Data Warehousing, Data Architecture, and Enterprise ETL design.
- 5+ years hands-on experience with Databricks on Azure/AWS/GCP, including advanced Apache Spark and Delta Lake.
- Strong command of SQL, PySpark, and Spark SQL for large-scale data transformation.
- Proficiency with Databricks Unity Catalog, Delta Live Tables, Auto Loader, DBFS, Jobs, and Workflows.
- Hands-on experience with Databricks SQL and integration with BI tools (Power BI, Tableau, etc.).
- Experience implementing CI/CD on Databricks, using tools like Git, Azure DevOps, Terraform, and Databricks Repos.
- Proficiency with streaming architectures using Spark Structured Streaming, Kafka, or Event Hubs/Kinesis.
- Understanding of ML lifecycle management with MLflow, and experience in deploying MLOps solutions on Databricks.
- Familiarity with cloud object stores (e.g., AWS S3, Azure Data Lake Storage Gen2) and data lake architectures.
- Exposure to data cataloging and metadata management using Unity Catalog or third-party tools.
- Knowledge of orchestration tools like Airflow, Databricks Workflows, or Azure Data Factory.
- Experience with Docker/Kubernetes for containerization (optional, for cross-platform knowledge).
Preferred Certifications (a plus):
- Databricks Certified Data Engineer Associate/Professional
- Databricks Certified Lakehouse Architect
- Microsoft Certified: Azure Data Engineer / Azure Solutions Architect
- AWS Certified Data Analytics – Specialty
- Google Professional Data Engineer