Sr. Data Engineer AWS Snowflake
India - Remote
Fusemachines
Unleash your AI Transformation with AI Products and AI Solutions.
About Fusemachines
Fusemachines is a 10+ year old AI company, dedicated to delivering state-of-the-art AI products and solutions to a diverse range of industries. Founded by Sameer Maskey, Ph.D., an Adjunct Associate Professor at Columbia University, our company is on a steadfast mission to democratize AI and harness the power of global AI talent from underserved communities. With a robust presence in four countries and a dedicated team of over 400 full-time employees, we are committed to fostering AI transformation journeys for businesses worldwide. At Fusemachines, we not only bridge the gap between AI advancement and its global impact but also strive to deliver the most advanced technology solutions to the world.
About the role:
This is a remote, full-time consulting position (contract) responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization, and advanced analytics) to optimize digital channels and technology innovations, with the end goal of creating competitive advantages for the food services industry around the globe. We’re looking for a solid lead engineer who brings fresh ideas from past experiences and is eager to tackle new challenges.
We’re in search of a candidate who is knowledgeable about and loves working with modern data integration frameworks, big data, and cloud technologies. Candidates must also be proficient with data programming languages (Python and SQL), the AWS cloud, and the Snowflake Data Platform. The data engineer will build a variety of data pipelines and models to support advanced AI/ML analytics projects, with the intent of elevating the customer experience and driving revenue and profit growth globally.
Qualification & Experience:
- Must have a Bachelor's degree (full-time program) in Computer Science or a similar field from an accredited institution.
- At least 3 years of experience as a data engineer with strong expertise in Python, Snowflake, PySpark, and AWS.
- Proven experience delivering large-scale projects and products for Data and Analytics, as a data engineer.
Skill Set Requirements:
- Vast background in all things data-related.
- 3+ years of real-world data engineering development experience in Snowflake and AWS (certifications preferred).
- Highly skilled in one or more programming languages (Python is a must), with proficiency in writing efficient and optimized code for data integration, storage, processing, manipulation, and automation.
- Strong experience working with ELT and ETL tools and the ability to develop custom integration solutions as needed, from different sources such as APIs, databases, flat files, and event streams, including experience with modern ETL tools such as Informatica, Matillion, or dbt; Informatica CDI is a plus.
- Strong experience with scalable and distributed data technologies such as Spark/PySpark, dbt, and Kafka, in order to handle large volumes of data.
- Strong programming skills in SQL, with proficiency in writing efficient and optimized code for data integration, storage, processing, and manipulation.
- Strong experience in designing and implementing Data Warehousing solutions in AWS with Snowflake.
- Good understanding of data modelling and database design principles, with the ability to design and implement efficient database schemas that meet the requirements of the data architecture and support data solutions.
- Proven experience as a Snowflake Developer, with a strong understanding of Snowflake architecture and concepts.
- Proficient in Snowflake services such as Snowpipe, stages, stored procedures, views, materialized views, tasks and streams.
- Robust understanding of data partitioning and other optimization techniques in Snowflake.
- Knowledge of data security measures in Snowflake, including role-based access control (RBAC) and data encryption.
- Experience with Kafka, Pulsar, or other streaming technologies.
- Experience orchestrating complex task flows across a variety of technologies, Apache Airflow preferred.
- Expert in cloud computing on AWS, including deep knowledge of a variety of AWS services such as Lambda, Kinesis, S3, Lake Formation, EC2, ECS/ECR, IAM, CloudWatch, EKS, and API Gateway.
- Good understanding of Data Quality and Governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent.
- Good problem-solving skills: able to troubleshoot data processing pipelines and identify performance bottlenecks and other issues.
Responsibilities:
- Follow established designs and construct data architectures. Develop and maintain data pipelines (streaming and batch), ensuring data flows smoothly from source (point-of-sale, back-of-house, operational platforms, and other feeds into a Global Data Hub) to destination. Handle ETL/ELT processes, including extracting, transforming, and loading data from various sources into Snowflake to enable best-in-class technology solutions.
- Play a key role in the Data Operations team, developing data solutions that drive growth.
- Contribute to standardizing and developing a framework to extend these pipelines globally, across markets and business areas.
- Develop on a data platform by building applications using a mix of open-source frameworks (PySpark, Kubernetes, Airflow, etc.) and best-in-breed SaaS tools (Informatica Cloud, Snowflake, Domo, etc.).
- Implement and manage production support processes around data lifecycle, data quality, coding utilities, storage, reporting and other data integration points.
- Ensure the reliability, scalability, and efficiency of data systems are maintained at all times.
- Assist in the configuration and management of Snowflake data warehousing and data lake solutions, working under the guidance of senior team members.
- Work with cross-functional teams, including Product, Engineering, Data Science, and Analytics teams to understand and fulfill data requirements.
- Contribute to data quality assurance through validation checks and support data governance initiatives, including cataloging and lineage tracking.
- Take ownership of the storage layer and SQL database management tasks, including schema design, indexing, and performance tuning.
- Continuously evaluate and integrate new technologies to enhance data engineering capabilities and actively participate in our Agile team meetings and improvement activities.
Perks/benefits: Team events