Data Engineering Intern
United States
Sayari
About Sayari: Sayari is the counterparty and supply chain risk intelligence provider trusted by government agencies, multinational corporations, and financial institutions. Its intuitive network analysis platform surfaces hidden risk through integrated corporate ownership, supply chain, trade transaction and risk intelligence data from over 250 jurisdictions. Sayari is headquartered in Washington, D.C., and its solutions are used by thousands of frontline analysts in over 35 countries.
Our company culture is defined by a dedication to our mission of using open data to enhance visibility into global commercial and financial networks, a passion for finding novel approaches to complex problems, and an understanding that diverse perspectives create optimal outcomes. We embrace cross-team collaboration, encourage training and learning opportunities, and reward initiative and innovation. If you like working with supportive, high-performing, and curious teams, Sayari is the place for you.
Internship Description: Sayari is looking for an intern to join its Data Engineering team! Sayari’s flagship product, Sayari Graph, provides instant access to structured business information from billions of corporate, legal, and trade records. As a member of Sayari's data team, you will work with our Product and Software Engineering teams to collect data from around the globe, maintain existing ETL pipelines, and develop new pipelines that power Sayari Graph.
Our application tier is built primarily in TypeScript, running in Kubernetes, and backed by Postgres, Cassandra, Elasticsearch, and Memgraph. Our data ingest tier runs on Spark, processing terabytes of data collected from hundreds of data sources. The platform allows users to explore a large knowledge graph sourced from hundreds of millions of structured and unstructured records from over 200 countries and 30 languages. As part of this team, you'll have the chance to contribute to our growing library of open-source work, including our WebGL-powered network visualization library Trellis.
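The ingest pattern described above (many heterogeneous sources standardized into one knowledge graph) can be pictured in miniature. This is a purely illustrative sketch, not Sayari code: the field names and schema are invented, and the real pipelines described here run on Scala Spark rather than plain Python.

```python
# Hypothetical sketch of source-record standardization: the kind of
# transform an ingest tier applies before records from different
# sources are merged into one graph. All field names are invented.

FIELD_ALIASES = {
    "name": ("name", "company_name", "entity_name"),
    "jurisdiction": ("jurisdiction", "country", "juris_code"),
    "registration_id": ("registration_id", "reg_no", "company_number"),
}

def standardize(raw: dict) -> dict:
    """Map a raw source record onto a canonical schema."""
    out = {}
    for canonical, aliases in FIELD_ALIASES.items():
        for key in aliases:
            if key in raw and raw[key] not in (None, ""):
                out[canonical] = str(raw[key]).strip()
                break
    return out

record = standardize({"company_name": " ACME Ltd ", "country": "GB", "reg_no": "0123"})
# record -> {"name": "ACME Ltd", "jurisdiction": "GB", "registration_id": "0123"}
```

In a real Spark job the same alias-mapping logic would run per-partition across terabytes of records; the point here is only the shape of the transform.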
This is a paid remote internship with an expected workload of 20-30 hours per week.
Job Responsibilities:
- Write and deploy crawling scripts to collect source data from the web
- Write and run data transformers in Scala Spark to standardize bulk data sets
- Write and run modules in Python to parse entity references and relationships from source data
- Diagnose and fix bugs reported by internal and external users
- Analyze and report on internal datasets to answer questions and inform feature work
- Work collaboratively within and across engineering teams using basic agile principles
- Give and receive feedback through code reviews
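The parsing responsibility above (extracting entity references and relationships from source data) might look roughly like this. A hedged, hypothetical sketch: the record layout and edge format are invented for illustration and are not Sayari's actual pipeline code.

```python
# Hypothetical sketch: extract entity references and relationships
# (here, officer -> company edges) from one parsed source record.
# The record layout and edge tuples are invented for illustration.

def extract_relationships(record: dict) -> list[tuple[str, str, str]]:
    """Return (source_entity, relation, target_entity) edges."""
    company = record.get("name")
    edges = []
    for officer in record.get("officers", []):
        if officer.get("name") and company:
            edges.append((officer["name"], officer.get("role", "officer"), company))
    return edges

edges = extract_relationships({
    "name": "ACME Ltd",
    "officers": [{"name": "J. Smith", "role": "director"}, {"name": "A. Doe"}],
})
# edges -> [("J. Smith", "director", "ACME Ltd"), ("A. Doe", "officer", "ACME Ltd")]
```

Edges like these are what ultimately populate a knowledge graph such as Sayari Graph, with each tuple becoming a relationship between two entity nodes.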
Required Skills & Experience:
- Experience with Python and/or a JVM language (e.g., Scala)
- Experience working collaboratively with git
Desired Skills & Experience:
- Experience with Apache Spark and Apache Airflow
- Experience working on a cloud platform like GCP, AWS, or Azure
- Understanding of or interest in knowledge graphs
Category:
Engineering Jobs
Tags: Agile Airflow AWS Azure Cassandra Elasticsearch Engineering ETL GCP Git Kubernetes Open Source Pipelines PostgreSQL Python Scala Spark Transformers TypeScript