Lead Data Engineer
Sofia, Bulgaria
About hireworks
hireworks is building a community of top talent in Bulgaria and unlocking unparalleled access to positions at leading U.S.-based companies. As your employer, hireworks will ensure you have a seamless interview, onboarding, and employee experience, providing ongoing support and resources along the way. Established in 2023, hireworks is forging corp-to-corp relationships with leading U.S.-based organizations looking to grow their teams with best-in-class talent from Bulgaria. Working with hireworks means unlocking access to a network of local peers and mentors, as well as career opportunities through our client network.
Position Overview
Our client is seeking a Lead Data Engineer who will be responsible for designing, developing, and maintaining their data pipelines and architectures. You will work closely with cross-functional teams to ensure these data systems are scalable, reliable, and capable of handling complex data processing tasks.
What You'll Do
Design and implement scalable data architectures using Lambda or Kappa architecture patterns to support both batch and real-time data processing needs (an illustrative pipeline sketch follows this list).
Develop and maintain data pipelines on Google Cloud Platform (GCP) using Dataproc, Dataflow, and Composer (Airflow); a minimal DAG sketch also appears after this list.
Work with non-relational databases, including column-family stores (e.g., Cassandra, Bigtable) and document databases (e.g., Firestore, DynamoDB).
Manage and optimize data serialization and file formats such as Avro, Protobuf, and Parquet for efficient data storage and retrieval.
Utilize Apache Spark for batch processing tasks, ensuring data is processed efficiently and accurately (see the Spark sketch below).
Collaborate with software engineers and data scientists to integrate data processing solutions with the overall data ecosystem.
Ensure the reliability, availability, and performance of data infrastructure in a production environment.
Stay updated on industry trends and continuously seek to improve the data engineering practices within the organization.
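To give candidates a concrete feel for the work described above, here is a minimal sketch, not part of the formal role description, of an Apache Beam pipeline in Python of the kind that runs on Dataflow. Bucket paths, field names, and pipeline options are placeholders; the same Beam code supports a Kappa-style streaming variant when the text source is swapped for a Pub/Sub source.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder GCS paths; substitute real buckets when deploying.
INPUT = "gs://example-bucket/events/*.json"
OUTPUT = "gs://example-bucket/output/user_counts"

def run():
    # Deploy with e.g. --runner=DataflowRunner --project=... --region=...;
    # the default DirectRunner suffices for local testing. For a Kappa-style
    # streaming variant, replace ReadFromText with beam.io.ReadFromPubSub.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(INPUT)
            | "Parse" >> beam.Map(json.loads)
            | "KeyByUser" >> beam.Map(lambda event: (event["user_id"], 1))
            | "CountPerUser" >> beam.CombinePerKey(sum)
            | "Format" >> beam.MapTuple(lambda user, count: f"{user},{count}")
            | "Write" >> beam.io.WriteToText(OUTPUT)
        )

if __name__ == "__main__":
    run()
```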
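In the same spirit, a minimal Composer (Airflow) sketch: a DAG that submits a PySpark job to an existing Dataproc cluster. The project, region, cluster, and bucket names below are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

# All project, cluster, and bucket names are placeholders.
PROJECT_ID = "example-project"
REGION = "europe-west1"
CLUSTER = "example-cluster"

PYSPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER},
    "pyspark_job": {"main_python_file_uri": "gs://example-bucket/jobs/daily_batch.py"},
}

with DAG(
    dag_id="daily_batch_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submits the Spark job to an existing Dataproc cluster; Composer
    # handles scheduling, retries, and dependency ordering.
    run_batch = DataprocSubmitJobOperator(
        task_id="run_daily_batch",
        project_id=PROJECT_ID,
        region=REGION,
        job=PYSPARK_JOB,
    )
```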
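Finally, a minimal PySpark batch job of the sort the DAG above would submit, reading and writing Parquet on GCS. The paths and column names are invented for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Bucket paths and column names are illustrative only.
spark = SparkSession.builder.appName("daily-batch").getOrCreate()

events = spark.read.parquet("gs://example-bucket/raw/events/")

# Aggregate events per user per day; Parquet's columnar layout means
# only the referenced columns are scanned.
daily = (
    events
    .withColumn("day", F.to_date("event_ts"))
    .groupBy("user_id", "day")
    .agg(F.count("*").alias("event_count"))
)

daily.write.mode("overwrite").partitionBy("day").parquet(
    "gs://example-bucket/curated/daily_user_counts/"
)

spark.stop()
```

Partitioning the output by day keeps downstream reads selective, one reason columnar formats like Parquet are called out in this posting.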
About You:
6+ years of proven experience in data engineering, with a strong focus on Lambda or Kappa architectures.
In-depth knowledge of GCP services, particularly Dataproc, Dataflow, and Composer (Airflow).
Hands-on experience with non-relational databases such as Cassandra, Bigtable, Firestore, and DynamoDB.
Proficiency in working with data serialization and file formats like Avro, Protobuf, and Parquet.
Expertise in batch processing using Apache Spark in a production environment.
Proficiency in programming languages such as Java, Python, or Scala.
Strong problem-solving skills and the ability to work independently and as part of a team.
Excellent communication skills and the ability to articulate complex technical concepts to non-technical stakeholders.
Preferred Qualifications:
Experience with real-time streaming technologies and event-driven architectures.
Familiarity with other GCP services and tools.
Knowledge of best practices in data security and governance.