Senior Data Engineer
Hyderabad, India
DATAECONOMY
Enabling businesses to monetize data at speed with cutting-edge technology services and solutions: Big Data management, cloud enablement, data science, and more. Location: Hyderabad/Pune.
Experience: 5-10 Years
Employment Type: Full-Time
Job Description:
We are seeking a Senior Data Engineer with 5-10 years of experience to join our dynamic team. The ideal candidate will be responsible for designing, building, and optimizing big data pipelines using Databricks and PySpark on AWS. You will work closely with data scientists, analysts, and other engineers to transform raw data into actionable insights, supporting critical business decisions.
Key Responsibilities:
1. Data Engineering & ETL Pipelines:
• Design, develop, and optimize scalable ETL pipelines using Databricks and PySpark (see the sample pipeline sketch after this list).
• Process structured and unstructured data from various sources such as Amazon S3, Delta Lake, and relational databases.
2. Big Data Processing:
• Utilize PySpark to process and analyze large datasets efficiently.
• Implement distributed computing solutions to solve complex data problems.
3. Cloud Integration:
• Work extensively with AWS services like S3, Glue, Redshift, Lambda, and EMR to build and manage data pipelines.
• Integrate Databricks workflows with the broader AWS ecosystem.
4. Delta Lake & Storage Optimization:
• Design and implement Delta Lake solutions to ensure data reliability, scalability, and performance.
• Optimize data storage for faster querying and analytics.
5. Job Scheduling & Automation:
• Schedule and monitor data pipelines using Databricks Jobs or AWS-based orchestration tools.
6. Collaboration & Best Practices:
• Collaborate with data scientists and analysts to understand business needs and deliver data solutions.
• Enforce coding best practices and performance optimization techniques for PySpark and Databricks workflows.
7. Monitoring & Troubleshooting:
• Monitor job performance, troubleshoot failures, and optimize cluster usage to ensure cost-effectiveness.
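For illustration only, here is a minimal sketch of the kind of PySpark pipeline described in points 1, 2, and 4 above, assuming a Databricks/Delta Lake runtime; the bucket paths, column names, and de-duplication key are hypothetical and not part of this posting.

from pyspark.sql import SparkSession, functions as F

# Hypothetical example: read raw JSON events from S3, clean them, and land them in a Delta table.
spark = SparkSession.builder.appName("sample-etl").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/events/")   # hypothetical source path

cleaned = (
    raw.dropDuplicates(["event_id"])                       # assumed de-duplication key
       .withColumn("event_date", F.to_date("event_ts"))    # derive a partition column
       .filter(F.col("event_type").isNotNull())
)

(cleaned.write
    .format("delta")
    .mode("append")
    .partitionBy("event_date")
    .save("s3://example-bucket/delta/events/"))            # hypothetical Delta Lake location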
Requirements
• Technical Expertise:
• 5 to 10 years of experience with Databricks and Apache Spark, preferably with PySpark.
• Strong experience with AWS services like S3, Glue, Redshift, Lambda, and EMR.
• Proficiency in programming with Python and SQL.
• Expertise in Delta Lake and data lake management.
• Knowledge of job scheduling and orchestration tools (e.g., Databricks Jobs, Airflow).
• Data Processing:
• Hands-on experience with distributed data processing, streaming, and batch data pipelines (see the batch-versus-streaming sketch after this requirements list).
• Proficient in working with large datasets and implementing performance-optimized solutions.
• Soft Skills:
• Excellent problem-solving and analytical skills.
• Strong communication and collaboration abilities.
• Ability to work independently and in cross-functional teams.
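As a hedged illustration of the distributed batch and streaming processing called out above, the sketch below shows how the same hypothetical Delta table could be read both in batch mode and incrementally with Spark Structured Streaming; the paths and checkpoint location are assumptions, not part of this posting.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample-streaming").getOrCreate()

# Batch read of the full table
batch_df = spark.read.format("delta").load("s3://example-bucket/delta/events/")

# Incremental read: picks up rows appended to the table after the stream starts
stream_df = spark.readStream.format("delta").load("s3://example-bucket/delta/events/")

query = (stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .outputMode("append")
    .start("s3://example-bucket/delta/events_mirror/"))    # hypothetical sink table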
Preferred Skills:
• Experience with machine learning workflows and integration with tools like SageMaker or MLflow.
• Familiarity with Kubernetes and containerized Spark clusters.
• Knowledge of data governance frameworks and compliance standards like GDPR or HIPAA.
Education:
• Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
Benefits
What We Offer:
• Opportunity to work with cutting-edge technologies in a collaborative environment.
• Continuous learning and professional development opportunities.