Data Engineering Intern

Remote, US

CALSTART

CALSTART accelerates growth of the clean transportation technology industry for a better and more prosperous future.

Please be aware of recruiting scams!
All legitimate communication from our recruitment team will come by email from an official calstart.org address. We will not text you about a role you have not applied to or shown interest in, and we will not conduct interviews via text or Zoom chat.

CALSTART does not ask for any fees or personal information such as social security numbers or bank details during the recruitment process.

About Us:
CALSTART is a mission-driven industry organization focused on transportation decarbonization and clean air for all.
For over 30 years, it’s been CALSTART’s mission to develop, assess, and implement large-scale, zero-emission transportation solutions to mitigate climate change and support economic growth. CALSTART works with businesses, organizations, governments, and communities to create real-life impact toward clean air and equitable access to clean transportation for all. CALSTART provides scientific, technical and policy support for regulatory development and clean technology and infrastructure acceleration.
About the Role:  
Proposed Internship Project: Building a Data Lake and Advanced Data Pipelines for Clean Transportation Insights 
Project Overview:
CALSTART aims to enhance its data infrastructure and analytics capabilities by building a robust data lake that consolidates diverse data sources to support clean transportation initiatives. This project will focus on creating a data lake environment, developing automated data pipelines, and designing powerful visualizations to gain insights into clean vehicle adoption, infrastructure planning, and sustainability efforts. The intern will contribute to building scalable data solutions that support CALSTART's mission while gaining hands-on experience in cloud-based data engineering and data science. 
This is a part-time (25 hours per week), 6-month internship.

What You'll Do:

  • Data Lake Architecture: Collaborate with the data engineering team to design and build a scalable data lake architecture on AWS, using services such as Amazon S3, RDS, or EC2.
  • Data Pipeline Development: Assist in building end-to-end ETL (Extract, Transform, Load) pipelines that pull data from various sources, process it, and store it in the data lake in an organized and efficient manner. 
  • Data Transformation and Quality Assurance: Implement data cleansing, transformation, and validation processes to ensure data accuracy, completeness, and consistency before storing it in the data lake. 
  • Data Visualization: Develop interactive dashboards and visualizations using tools like Power BI, Tableau, or open-source alternatives to present insights related to clean transportation, such as vehicle performance, infrastructure coverage, and funding distribution. 
  • Documentation and Knowledge Sharing: Ensure proper documentation of the data lake architecture, pipeline processes, and visualization tools for knowledge transfer and future improvements. 

What You'll Bring To The Table:

  • Proficiency in Python for data processing and analysis
  • Experience with data science workflows and tools
  • Knowledge of ETL processes and pipeline development
  • Familiarity with AWS services for cloud-based data infrastructure
  • Strong communication skills, both written and verbal
  • Collaborative team player with a proactive mindset

Desired Qualifications:

  • Bachelor's or Master's degree in Math, Data Science, Statistics, Engineering, Computer Science, or a related field
  • Experience with data science/analytics
  • Proficiency in SQL and experience with relational databases such as MySQL, PostgreSQL, or Microsoft SQL Server
  • Some experience with ETL pipelines is a plus
  • AWS experience

We understand that not everyone will match the above qualifications 100%. If your background isn't perfectly aligned but you feel you would be a great addition to the team, we'd love to hear from you.
We're a tight-knit team of world-class innovators, business minds, and change agents who believe passionately in our mission and put our team ahead of self. We are committed to the continued development and growth of our employees and invest in your success!
We care about your personal well-being as much as your professional success and offer generous benefits to full-time employees, including: 100% company-paid comprehensive health benefits for Medical, Dental, Vision, Short Term Disability, Long Term Disability, and Life Insurance; a retirement plan with generous company contributions; FSA for Health and Dependent Care; 3 weeks of vacation time in the first year of employment; 11 paid company holidays; paid sick time; paid family leave; and more!
Our inclusive environment focuses on making decisions based on merit without regard to race, color, hair texture, gender, religion, age, nationality, social or ethnic origin, sexual orientation, gender identity, gender expression, LGBTQIA+ status, marital status, pregnancy, disability, genetics, veteran status, or any other characteristic protected by law.
Category: Engineering Jobs

Tags: Architecture AWS Computer Science Data pipelines Data visualization EC2 Engineering ETL Mathematics MySQL Open Source Pipelines PostgreSQL Power BI Python RDBMS Security SQL Statistics Tableau

Perks/benefits: Career development Flex vacation Health care Insurance Medical leave

Regions: Remote/Anywhere North America
Country: United States
