Big Data & Cloud Data Engineer
Paris, France
Blackfluo.ai
Position Overview
We are seeking a Big Data & Cloud Data Engineer to design, implement, and manage large-scale data processing systems using big data technologies (Hadoop, Spark, Kafka) and cloud-based data ecosystems (Azure, GCP, AWS), enabling advanced analytics and real-time data processing capabilities across our enterprise.
Key Responsibilities
Big Data Platform Development
Design and implement Hadoop ecosystems including HDFS, YARN, and distributed computing frameworks
Develop real-time and batch processing applications using Apache Spark (Scala, Python, Java)
Configure Apache Kafka for event streaming, data ingestion, and real-time data pipelines (see the sketch after this list)
Implement data processing workflows using Apache Airflow, Oozie, and other workflow orchestration tools
Build NoSQL database solutions using HBase, Cassandra, and MongoDB for high-volume data storage
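Illustrative sketch of the Spark and Kafka work above, assuming a PySpark Structured Streaming job that consumes a hypothetical events topic and lands it as Parquet; the broker address, topic name, payload schema, and paths are placeholders, not project specifics.
```python
# Minimal sketch: a PySpark Structured Streaming job that consumes a Kafka topic
# and lands the events as Parquet. Broker, topic, schema, and paths are
# illustrative placeholders; the Kafka source assumes the spark-sql-kafka
# package is on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-events-ingest").getOrCreate()

# Assumed JSON payload of each Kafka message.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                      # placeholder topic
    .load()
)

# Kafka delivers values as bytes; parse the JSON payload into typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Write micro-batches to Parquet, with checkpointing so the sink can recover.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/events")               # placeholder sink path
    .option("checkpointLocation", "hdfs:///chk/events")  # placeholder checkpoint dir
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```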
Cloud Data Architecture
Design multi-cloud data architectures using Azure Data Factory, AWS Glue, and Google Cloud Dataflow
Implement data lakes and lakehouses using Azure Data Lake, AWS S3, and Google Cloud Storage
Configure cloud-native data warehouses including Snowflake, BigQuery, and Azure Synapse Analytics
Build serverless data processing solutions using AWS Lambda, Azure Functions, and Google Cloud Functions (see the sketch after this list)
Implement containerized data applications using Docker, Kubernetes, and cloud container services
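Illustrative sketch of the serverless pattern above, assuming an AWS Lambda function in Python subscribed to S3 ObjectCreated notifications; the bucket and prefix names are placeholders.
```python
# Minimal sketch: an AWS Lambda handler reacting to S3 ObjectCreated events and
# copying newly landed files into a curated prefix. Bucket and prefix names are
# illustrative placeholders.
import urllib.parse

import boto3

s3 = boto3.client("s3")

CURATED_BUCKET = "curated-data-lake"  # hypothetical destination bucket


def handler(event, context):
    """Entry point invoked by S3 ObjectCreated notifications."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Copy the object into the curated zone, keyed by its source bucket.
        s3.copy_object(
            Bucket=CURATED_BUCKET,
            Key=f"raw/{bucket}/{key}",
            CopySource={"Bucket": bucket, "Key": key},
        )
    return {"processed": len(records)}
```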
Data Pipeline Engineering
Develop ETL/ELT pipelines for structured and unstructured data processing (see the orchestration sketch after this list)
Create real-time streaming analytics using Kafka Streams, Apache Storm, and cloud streaming services
Implement data quality frameworks, monitoring, and alerting for production data pipelines
Build automated data ingestion from various sources including APIs, databases, and file systems
Design data partitioning, compression, and optimization strategies for performance
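Illustrative sketch of pipeline orchestration with a basic quality gate, written as an Airflow 2.x DAG; the DAG id, task logic, and failure condition are placeholders rather than an actual production pipeline.
```python
# Minimal sketch: an Airflow DAG (2.x style) running a daily extract-transform-load
# sequence with a simple row-count quality gate. DAG id, task logic, and the
# threshold are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull a batch of records from a source API or database.
    return [{"id": 1, "amount": 42.0}, {"id": 2, "amount": -3.0}]


def transform(ti, **context):
    rows = ti.xcom_pull(task_ids="extract")
    # Placeholder transformation: keep only positive amounts.
    return [r for r in rows if r["amount"] > 0]


def quality_check(ti, **context):
    rows = ti.xcom_pull(task_ids="transform")
    # Fail the run (and surface an alert) if the batch is empty.
    if not rows:
        raise ValueError("Quality gate failed: empty batch")


def load(ti, **context):
    rows = ti.xcom_pull(task_ids="transform")
    # Placeholder: write the rows to the warehouse (Snowflake, BigQuery, ...).
    print(f"Loaded {len(rows)} rows")


with DAG(
    dag_id="daily_events_etl",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",          # Airflow 2.4+ spelling of the schedule argument
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_check = PythonOperator(task_id="quality_check", python_callable=quality_check)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_check >> t_load
```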
Platform Administration & Optimization
Manage cluster provisioning, scaling, and resource optimization across big data platforms
Monitor system performance, troubleshoot issues, and implement capacity planning strategies
Configure security frameworks including Kerberos, Ranger, and cloud IAM services
Implement backup, disaster recovery, and high availability solutions
Optimize query performance and implement data governance policies (see the partitioning sketch below)
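Illustrative sketch of one common optimization named above: partitioning a Parquet table by date so that filters on the partition column prune unread files; paths and column names are placeholders.
```python
# Minimal sketch: partitioning a Parquet table by date so that queries filtering on
# the partition column only scan the matching directories (partition pruning).
# Paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("partitioning-demo").getOrCreate()

events = spark.read.parquet("hdfs:///data/events")  # placeholder source with an event_date column

# Rewrite the table partitioned by event_date; each date becomes its own directory.
(
    events.write
    .partitionBy("event_date")
    .mode("overwrite")
    .parquet("hdfs:///warehouse/events_by_date")  # placeholder target
)

# A filter on the partition column lets Spark skip every other partition,
# so this query reads one day's files instead of the whole table.
daily = (
    spark.read.parquet("hdfs:///warehouse/events_by_date")
    .filter(col("event_date") == "2024-06-01")
)
daily.explain()  # the physical plan lists PartitionFilters on event_date
print(daily.count())
```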
Required Qualifications
Technical Skills
5+ years of experience with big data technologies (Hadoop, Spark, Kafka, Hive, HBase)
Strong programming skills in Python, Scala, Java, and SQL for data processing
Expert knowledge of at least one major cloud platform (Azure, AWS, GCP) and data services
Experience with containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation)
Proficiency in stream processing frameworks and real-time analytics architectures
Knowledge of data modeling, schema design, and database optimization techniques
Data Engineering Skills
Experience with data pipeline orchestration and workflow management tools
Strong understanding of distributed systems, parallel processing, and scalability patterns
Knowledge of data formats (Parquet, Avro, ORC) and serialization frameworks
Experience with version control, CI/CD pipelines, and DevOps practices for data platforms
Preferred Qualifications
Bachelor's degree in Computer Science, Data Engineering, or related field
Cloud certifications (Azure Data Engineer, AWS Data Analytics, Google Cloud Data Engineer)
Experience with machine learning platforms and MLOps frameworks
Background in data governance, data cataloging, and metadata management
Knowledge of emerging technologies (Delta Lake, Apache Iceberg, dbt)