Data Engineer I
Chennai, Tamil Nadu, India
Bungee Tech
About Clear Demand: Clear Demand is a leader in AI-powered price and promotion optimization for retailers. Our platform turns pricing into a strategic advantage by enabling smarter, data-informed decisions across the entire pricing lifecycle. With built-in competitive intelligence, pricing rules, and advanced demand modeling, we help retailers increase profitability, foster growth, and boost customer loyalty, all while ensuring pricing compliance and brand consistency. Our platform empowers retailers to automate complex pricing processes, stay ahead of the competition, and unlock new growth opportunities.
Summary: Building on the foundation of the Software Development Engineer I (SDE-I) role, the Data Engineer I role offers increased technical responsibility and leadership opportunities. In this role, you'll help evolve and enhance our high-volume data platform, which processes terabyte-scale datasets and billions of data points, supporting key pricing and analytics initiatives.
Key Responsibilities:
Design, develop, and optimize scalable data pipelines and infrastructure to support large-scale data collection and processing.
Build distributed data processing solutions that efficiently manage terabyte-scale datasets across multi-region cloud environments (AWS, GCP, DigitalOcean).
Develop and maintain real-time data streaming and batch processing workflows using technologies like Spark, Kafka, and related big data tools.
Optimize data storage strategies using systems like Amazon S3, HDFS, and columnar formats such as Parquet or Avro for efficient storage and querying.
Develop high-quality ETL pipelines that are robust, fault-tolerant, and scalable, ensuring accurate data transformation and delivery.
Collaborate with analysts, researchers, and engineering teams to define and uphold data quality standards and enforce data validation and security best practices.
Mentor junior team members, contributing to a collaborative, knowledge-sharing environment.
Participate in architectural discussions, contributing ideas to improve system performance, scalability, and cost efficiency.
Ensure observability and performance monitoring using tools like Datadog, New Relic, Grafana, or Prometheus to proactively detect issues and maintain system health.
Implement indexing and partitioning strategies in distributed databases like DynamoDB, Cassandra, or HBase for optimized performance.
Stay up to date with new tools, frameworks, and best practices in cloud-based data engineering and distributed computing.
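As a minimal illustration of the "robust, fault-tolerant" ETL requirement above (not code from any actual platform; the record shape and function names are assumptions for the sketch), a per-record transform step can isolate bad input in a dead-letter list and retry transient failures instead of failing the whole batch:

```python
import time


def transform_record(rec):
    # Normalize a raw price record; raise ValueError on malformed input.
    # (The "sku"/"price" fields are illustrative, not a real schema.)
    if "sku" not in rec or "price" not in rec:
        raise ValueError(f"missing fields: {rec}")
    return {"sku": str(rec["sku"]), "price": round(float(rec["price"]), 2)}


def run_etl(records, max_retries=3):
    """Transform records with per-record fault tolerance.

    Malformed records are routed to a dead-letter list rather than
    aborting the batch; transient errors are retried with backoff.
    """
    ok, dead = [], []
    for rec in records:
        for attempt in range(max_retries):
            try:
                ok.append(transform_record(rec))
                break
            except ValueError:
                dead.append(rec)  # permanent error: no point retrying
                break
            except Exception:
                time.sleep(2 ** attempt * 0.01)  # transient: back off, retry
    return ok, dead
```

The same shape scales up in Spark or Glue jobs, where the dead-letter list typically becomes a quarantine table or S3 prefix that downstream data-quality checks can inspect.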
Qualifications & Experience:
3+ years of experience building scalable data pipelines and workflows using tools like AWS Glue, Step Functions, or custom schedulers.
Strong skills in AWS (Athena, Glue, DynamoDB), Apache Spark, PySpark, SQL, and NoSQL databases.
Experience designing distributed data systems on AWS, GCP, or DigitalOcean for large-scale processing.
Proficiency in web crawling using Node.js, Puppeteer, Playwright, and Chromium.
Familiarity with Grafana, Prometheus, and Elasticsearch, and experience optimizing distributed systems.
Hands-on experience with Terraform, CI/CD (Jenkins), and Kafka for event-driven architectures.
Experience with data lakes and storage formats like Parquet, Avro, or ORC.
Skilled in optimizing queries and using Spark, Flink, or Hadoop for efficient processing.
Knowledge of Docker, Kubernetes, and deploying distributed systems.
Strong understanding of fault-tolerant, resilient, and disaster-ready data platforms.
Solid ETL, stream processing, and distributed system development skills.
Strong communicator with a problem-solving mindset and team-oriented approach.
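The partitioning skills listed above rest on one core idea: a stable hash of a partition key routes the same record to the same partition every time, which is how stores like DynamoDB, Cassandra, and HBase spread load. A hedged stdlib-only sketch (the function and key names are illustrative, not any system's API):

```python
import hashlib


def partition_for(key: str, num_partitions: int = 8) -> int:
    # Stable, process-independent hash so a given SKU always lands in
    # the same partition -- the principle behind partition keys in
    # DynamoDB/Cassandra-style stores. (Real systems use their own
    # hash functions and token ranges; MD5 here is just deterministic.)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions
```

Choosing a high-cardinality key (e.g. SKU rather than store region) keeps partitions evenly loaded and avoids the hot-partition problems these systems are known for.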