Senior Data Engineer (Remote, US)

United States

Sayari

Get instant access to public records, financial intelligence, and structured business information on over 455 million companies worldwide.

About Sayari

Sayari is a venture-backed and founder-led global corporate data provider and commercial intelligence platform, serving financial institutions, legal and advisory service providers, multinationals, journalists, and governments. Thousands of analysts and investigators in over 30 countries rely on our products to safely conduct cross-border trade, research front-page news stories, confidently enter new markets, and prevent financial crimes such as corruption and money laundering.

Our company culture is defined by a dedication to our mission of using open data to prevent illicit commercial and financial activity, a passion for finding novel approaches to complex problems, and an understanding that diverse perspectives create optimal outcomes. We embrace cross-team collaboration, encourage training and learning opportunities, and reward initiative and innovation. If you like working with supportive, high-performing, and curious teams, Sayari is the place for you.

Position Description

Sayari provides instant access to structured business information from hundreds of millions of corporate, legal, and trade records for a variety of use cases. As a member of Sayari's data team, you will work with our Product and Software Engineering teams to build the graph that underlies Sayari's products.

Job Responsibilities

  • Build and maintain ETL pipelines to process and export record data to the Sayari Graph application
  • Develop and improve entity resolution processes
  • Implement logic to calculate and export risk information
  • Work with the product team and other development teams to collect and refine requirements
  • Run and maintain regular data releases

Required Skills & Experience

  • Expertise with Python and a JVM programming language (e.g., Scala)
  • Expertise with SQL (e.g., Postgres) and NoSQL (e.g., Cassandra, Elasticsearch, Memgraph, etc.) databases
  • 7+ years of experience designing, maintaining, and orchestrating ETL pipelines (e.g., Apache Spark, Apache Airflow) in cloud-based environments (e.g., GCP, AWS, or Azure)

Desired Skills & Experience

  • Experience with entity resolution, graph theory, and/or distributed computing
  • Experience with Kubernetes
  • Experience working as part of an agile development team using Scrum, Kanban, or similar

Benefits

  • A collaborative and positive culture: your team will be as smart and driven as you
  • Limitless growth and learning opportunities
  • A strong commitment to diversity, equity, and inclusion
  • Performance and incentive bonuses
  • Outstanding competitive compensation and comprehensive family-friendly benefits, including full healthcare coverage plans, commuter benefits, 401K matching, generous vacation, and parental leave
  • Conference and continuing education coverage
  • Team-building events and opportunities