GCP Data Engineer

United States

Saama

Saama automates key clinical development and commercialization processes with artificial intelligence (AI), machine learning (ML), and advanced analytics, accelerating your time to market.

The GCP Data Engineer is responsible for the construction and development of large-scale cloud data processing systems on the Google Cloud Platform (GCP). The role requires considerable expertise in data warehousing and proven coding expertise with Python, Java, SQL, and Spark. The GCP Data Engineer must be able to implement enterprise cloud data architecture designs and will work closely with the rest of the scrum team and internal business partners to identify, evaluate, design, and implement large-scale data solutions spanning structured and unstructured, public and proprietary data. The GCP Data Engineer will work iteratively on the cloud platform to design, develop, and implement scalable, high-performance solutions that offer measurable business value to customers.

Qualifications and Education:
  • GCP Data Engineer certification preferred
  • Bachelor's degree in computer engineering or an equivalent field, or an equivalent foreign degree, required
 Required Work Experience:  
  • Minimum of 10 years of work experience
  • 5+ years of experience in an engineering role using Python, Java, Spark, and SQL.
  • 5+ years of experience working as a Data Engineer in GCP
  • Demonstrated proficiency with Google’s Identity and Access Management (IAM) API
  • Demonstrated proficiency with Airflow (a minimal orchestration sketch follows this list)
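The Airflow requirement above reflects the day-to-day orchestration work in this role: scheduling GCP data loads as DAGs. Below is a minimal sketch, assuming the apache-airflow-providers-google package is installed; the DAG id, bucket, dataset, and table names are hypothetical placeholders, not part of this posting.

```python
# Minimal sketch: a daily Airflow DAG that loads Parquet files from Cloud Storage
# into BigQuery. All names (DAG id, bucket, dataset, table) are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="daily_events_load",            # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_events = GCSToBigQueryOperator(
        task_id="load_events_to_bq",
        bucket="example-raw-bucket",                    # hypothetical GCS bucket
        source_objects=["events/{{ ds }}/*.parquet"],   # files partitioned by run date
        source_format="PARQUET",
        destination_project_dataset_table="example_project.analytics.events",
        write_disposition="WRITE_APPEND",
    )
```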
 Desired Work Experience:  
  • Coding experience with Python, Java, Spark, and SQL
  • Strong Linux/Unix background and hands-on knowledge.
  • Past experience with big data technologies including HDFS, Spark, Impala, Hive
  • Experience with GCP platform development tools such as Pub/Sub, Cloud Storage, Bigtable, BigQuery, Dataflow, Dataproc, and Composer (a minimal PySpark-to-BigQuery sketch follows this list).
  • Knowledge in Hadoop and cloud platforms and surrounding ecosystems.
  • Experience with web services and APIs, such as REST and SOAP.
  • Strong experience working with real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark, Kafka, Flume, Pub/Sub, and Airflow.
  • Ability to work with different file formats like Avro, Parquet, and JSON.
  • Experience with shell scripting and Bash.
  • Experience with the version control platform GitHub
  • Experience unit testing code.
  • Experience with development ecosystem including Jenkins, Artifactory, CI/CD, and Terraform.
  • Works on problems of diverse scope and complexity ranging from moderate to substantial
  • Assists senior professionals in determining methods and procedures for new tasks
  • Leads basic or moderately complex projects/activities on a semi-regular basis
  • Must possess excellent written and verbal communication skills
  • Ability to understand and analyze complex data sets
  • Exercises independent judgment on basic or moderately complex issues regarding job and related tasks
  • Makes recommendations to management on new processes, tools and techniques, or development of new products and services
  • Makes decisions regarding daily priorities for a work group; provides guidance to and/or assists staff on non-routine or escalated issues
  • Decisions have a moderate impact on operations within a department
  • Works under minimal supervision, uses independent judgment requiring analysis of variable factors
  • Requires little instruction on day-to-day work and general direction on more complex tasks and projects
  • Collaborates with senior professionals in the development of methods, techniques, and analytical approaches
  • Ability to advise management on approaches to optimize for data platform success.
  • Able to effectively communicate highly technical information to numerous audiences, including management, the user community, and less-experienced staff.
  • Consistently communicate on status of project deliverables
  • Consistently provide work effort estimates to management to assist in setting priorities
  • Deliver timely work in accordance with estimates
  • Solve problems as they arise and communicate potential roadblocks to manage expectations
  • Adhere strictly to all security policies
  • Proficient in multiple programming languages, frameworks, domains, and tools.
  • Coding skills in Scala
  • Ability to document designs and concepts
  • API Orchestration and Choreography for consumer apps
  • Well-rounded technical expertise in Apache packages and hybrid cloud architectures
  • Pipeline creation and automation for Data Acquisition
  • Design and creation of metadata extraction pipelines between raw and final transformed datasets
  • Quality control metrics data collection on data acquisition pipelines
  • Able to collaborate with the scrum team, including the scrum master, product owner, data analysts, quality assurance, business owners, and data architects, to produce the best possible end products
  • Experience contributing to and leveraging Jira and Confluence.
  • Managing and scheduling batch jobs.
  • Hands-on experience in the Analysis, Design, Coding, and Testing phases of the Software Development Life Cycle (SDLC).
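To illustrate the GCP tooling and file-format items above, here is a minimal sketch of a batch PySpark job of the kind this posting describes: read raw Parquet from Cloud Storage, aggregate, and write to BigQuery. The bucket, table, and column names are hypothetical, and the sketch assumes the spark-bigquery connector is available on the cluster, as it is on Dataproc.

```python
# Minimal sketch: batch PySpark job reading Parquet from GCS and writing the
# aggregated result to BigQuery. All names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events_daily_batch").getOrCreate()

# Read raw Parquet files landed in a (hypothetical) Cloud Storage bucket.
raw = spark.read.parquet("gs://example-raw-bucket/events/2024-01-01/")

# Example transformation: count events per user per day.
daily_counts = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .groupBy("user_id", "event_date")
       .agg(F.count("*").alias("event_count"))
)

# Write to a (hypothetical) BigQuery table via the spark-bigquery connector,
# staging through a temporary GCS bucket.
(daily_counts.write
    .format("bigquery")
    .option("table", "example_project.analytics.daily_event_counts")
    .option("temporaryGcsBucket", "example-staging-bucket")
    .mode("append")
    .save())
```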

Category: Engineering Jobs

Tags: Airflow APIs Architecture Avro Big Data BigQuery Bigtable CI/CD Confluence Data Warehousing Engineering GCP GitHub Google Cloud Hadoop HDFS Java Jenkins Jira JSON Kafka Linux Parquet Pipelines Python Scala Scrum SDLC Security Shell scripting Spark SQL Streaming Terraform Testing

Region: North America
Country: United States
