Data Engineer (Databricks and AWS)
Pampanga, Manila, Philippines
Citco
At Citco, we don't just provide bespoke solutions and better results. We're a true partner dedicated to developing rich, long-term relationships through gold standard services.
Position: Data Engineer (Databricks & AWS)
Company Overview
Citco is a global leader in financial services, delivering innovative solutions to some of the world's largest institutional clients. We harness the power of data to drive operational efficiency and informed decision-making. We are looking for a Data Engineer with strong Databricks expertise and AWS experience to contribute to mission-critical data initiatives.
Role Summary
As a Data Engineer, you will be responsible for developing and maintaining end-to-end data solutions on Databricks (Spark, Delta Lake, MLflow, etc.) while working with core AWS services (S3, Glue, Lambda, etc.). You will work within a technical team, implementing best practices in performance, security, and scalability. This role requires a solid understanding of Databricks and experience with cloud-based data platforms.
Key Responsibilities
1. Databricks Platform & Development
- Implement Databricks Lakehouse solutions using Delta Lake for ACID transactions and data versioning (see the sketch after this list)
- Utilize Databricks SQL Analytics for querying and report generation
- Support cluster management and Spark job optimization
- Develop structured streaming pipelines for data ingestion and processing
- Use Databricks Repos, notebooks, and job scheduling for development workflows
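For illustration, a minimal PySpark sketch of the Delta Lake work described above: writing a versioned table, then reading an earlier snapshot via time travel. The path, data, and column names are hypothetical, and it assumes a Databricks runtime (or a local Spark session configured with the delta-spark package) where Delta is available.

    from pyspark.sql import SparkSession

    # On Databricks, `spark` is provided; built here so the sketch is self-contained.
    spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

    # Hypothetical trade data.
    df = spark.createDataFrame(
        [(1, "AAPL", 191.50), (2, "MSFT", 417.20)],
        ["trade_id", "symbol", "price"],
    )

    # Delta Lake provides ACID writes and keeps a version history of the table.
    df.write.format("delta").mode("overwrite").save("/tmp/trades_delta")

    # Time travel: read the table as of an earlier version for audits or rollback.
    v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/trades_delta")
    v0.show()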
2. AWS Cloud Integration
- Work with Databricks and AWS S3 integration for data lake storage
- Build ETL/ELT pipelines using the AWS Glue catalog, AWS Lambda, and AWS Step Functions (see the sketch after this list)
- Configure networking settings for secure data access
- Support infrastructure deployment using AWS CloudFormation or Terraform
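As a sketch of the Lambda-driven orchestration above, the handler below reacts to a new object landing in S3 and starts a Glue job via boto3. The job name "trades-etl" and the argument key are hypothetical placeholders, not part of this posting.

    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        # Triggered by an S3 event notification; each record describes one new object.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Hand the new object's location to a Glue ETL job as a job argument.
            glue.start_job_run(
                JobName="trades-etl",
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )
        return {"status": "ok"}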
3. Data Pipeline & Workflow Development
- Create scalable ETL frameworks using Spark (Python/Scala)
- Participate in workflow orchestration and CI/CD implementation
- Develop Delta Live Tables pipelines for data ingestion and transformations (see the sketch after this list)
- Support MLflow integration for data lineage and reproducibility
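A minimal Delta Live Tables sketch of the ingestion-plus-transformation flow above; it runs only inside a Databricks DLT pipeline, where `spark` and the `dlt` module are provided. The S3 path, table names, and expectation are hypothetical.

    import dlt
    from pyspark.sql.functions import col

    @dlt.table(comment="Raw trade events ingested with Auto Loader")
    def trades_raw():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("s3://example-bucket/landing/trades/")
        )

    @dlt.table(comment="Cleaned trades with a basic quality expectation")
    @dlt.expect_or_drop("valid_price", "price > 0")
    def trades_clean():
        return dlt.read_stream("trades_raw").select(
            "trade_id", "symbol", col("price").cast("double").alias("price")
        )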
4. Performance & Optimization
- Implement Spark job optimizations (caching, partitioning, joins; see the sketch after this list)
- Support cluster configuration for optimal performance
- Optimize data processing for large-scale datasets
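To illustrate the optimization items above, a short PySpark sketch that broadcasts a small dimension table to avoid a shuffle-heavy join, caches a reused result, and partitions the output on a commonly filtered column. All paths and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("opt-sketch").getOrCreate()

    trades = spark.read.format("delta").load("/tmp/trades_delta")     # large fact table
    symbols = spark.read.format("delta").load("/tmp/symbols_delta")   # small dimension

    # Broadcasting the small side avoids a shuffle-heavy sort-merge join.
    enriched = trades.join(broadcast(symbols), "symbol")

    # Cache a result that several downstream steps reuse.
    enriched.cache()

    # Partition output on a column that queries commonly filter on.
    (
        enriched.write.format("delta")
        .mode("overwrite")
        .partitionBy("symbol")
        .save("/tmp/trades_enriched")
    )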
5. Security & Governance
- Apply Unity Catalog features for governance and access control (see the sketch after this list)
- Follow compliance requirements and security policies
- Implement IAM best practices
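A sketch of Unity Catalog access control as it might be issued from a notebook; the catalog, schema, table, and group names are hypothetical, and a UC-enabled workspace (with `spark` provided) is assumed.

    # Grant a group the right to see and query one table, nothing more.
    spark.sql("GRANT USE CATALOG ON CATALOG finance TO `analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA finance.trading TO `analysts`")
    spark.sql("GRANT SELECT ON TABLE finance.trading.trades_clean TO `analysts`")

Unity Catalog privileges are hierarchical, so USE CATALOG and USE SCHEMA are needed before the table-level SELECT takes effect.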
6. Team Collaboration
- Participate in code reviews and knowledge-sharing sessions
- Work within Agile/Scrum development framework
- Collaborate with team members and stakeholders
7. Monitoring & Maintenance
- Help implement monitoring solutions for pipeline performance
- Support alert system setup and maintenance
- Ensure data meets quality and reliability standards (see the sketch below)
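As a sketch of the data quality item above: a small check that could run as a scheduled Databricks job, failing the run (and thereby tripping a job-level failure alert) when a gate breaks. The table path and the specific gates are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("dq-sketch").getOrCreate()

    df = spark.read.format("delta").load("/tmp/trades_clean")  # hypothetical table

    total = df.count()
    null_prices = df.filter(col("price").isNull()).count()
    duplicates = total - df.dropDuplicates(["trade_id"]).count()

    # Raise to fail the job run; the job's failure alert becomes the notification.
    if null_prices > 0 or duplicates > 0:
        raise ValueError(
            f"Quality gate failed: {null_prices} null prices, "
            f"{duplicates} duplicate trade_ids out of {total} rows"
        )
    print(f"Quality gates passed for {total} rows")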
Qualifications
1. Educational Background
- Bachelor's degree in Computer Science, Data Science, Engineering, or equivalent experience
2. Technical Experience
- Databricks Experience: 2+ years of hands-on work with Databricks (Spark)
- AWS Knowledge: Experience with AWS S3, Glue, Lambda, and basic security practices
- Programming Skills: Strong proficiency in Python (PySpark) and SQL
- Data Warehousing: Understanding of RDBMS and data modeling concepts
- Infrastructure: Familiarity with infrastructure as code concepts