Senior Data Engineer - GCP
Kathmandu
Fusemachines
Unleash your AI Transformation with AI Products and AI Solutions.
About Fusemachines
Fusemachines is an AI company with more than 10 years in business, dedicated to delivering state-of-the-art AI products and solutions to a diverse range of industries. Founded by Sameer Maskey, PhD, an Adjunct Associate Professor at Columbia University, our company is on a steadfast mission to democratize AI and harness the power of global AI talent from underserved communities. With a robust presence in four countries and a dedicated team of over 400 full-time employees, we are committed to fostering AI transformation journeys for businesses worldwide. At Fusemachines, we not only bridge the gap between AI advancement and its global impact but also strive to deliver the most advanced technology solutions to the world.
About the role
This is a full-time position in the media and AdTech industry. The role is responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization, and advanced analytics).
We are seeking an experienced Google Cloud Data Engineer with deep expertise in Cloud Composer and BigQuery to design, build, and maintain data processing systems using Google Cloud Platform (GCP). The ideal candidate will combine strong technical expertise in orchestration and data warehousing with a thorough understanding of data governance principles.
Qualifications & Experience
- Bachelor's degree in Computer Science, Engineering, or a similar field from a top-tier school.
- 8+ years of experience in data engineering roles, including 3+ years on Google Cloud Platform building large datasets from multiple data sources in the media industry.
- Expertise in Python for efficient data integration, storage, and manipulation.
- Expert knowledge of SQL, with experience writing advanced queries for complex querying, data modeling, and database design.
- Familiarity with SDLC tools: Jira, GitHub, CI/CD pipelines, and Artifact Registry.
- Proficient in data integration from APIs, databases, flat files, and event streaming.
- Experience designing and maintaining ETL processes using Cloud Composer for workflow orchestration.
- Experience with distributed data technologies: Spark/PySpark, dbt/Dataform, and Kafka.
- Advanced expertise in:
  - Cloud computing in GCP, including deep knowledge of services such as Google Cloud Composer (including custom operator development and complex DAG design) and Google BigQuery (including advanced SQL, optimization techniques, and best practices).
  - Data governance frameworks and implementation.
- Strong proficiency and experience with Google Cloud Pub/Sub, Google Cloud Storage, Dataflow, Cloud Spanner, Vertex AI, and Google Cloud SQL.
- Experience with Terraform, Kubernetes, and data monitoring tools for pipeline optimization.
- Familiarity with regulatory requirements (GDPR, CCPA, HIPAA).
- Demonstrated experience implementing data governance policies and procedures
Certifications preferred:
Google Cloud Professional Data Engineer certification
Key Responsibilities:
- Design and implement complex data workflows using Google Cloud Composer (Apache Airflow), including custom operators and advanced orchestration patterns.
- Architect and optimize large-scale data solutions in BigQuery, including performance tuning and cost optimization.
- Develop and implement comprehensive data governance policies and procedures, including:
  - Data classification and cataloging
  - Access control and security policies
  - Data retention and archival strategies
  - Compliance monitoring and reporting
- Create and optimize BigQuery schemas, partitioning strategies, and query performance
- Design and maintain ETL processes using Cloud Composer for workflow orchestration.
- Develop data governance policies, including data lineage, classification, and access control.
- Establish data quality frameworks and compliance processes for data accuracy.
- Actively participate in Agile meetings, contributing to planning, resource allocation, and project tracking.
Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.
Perks/benefits: Career development