AI Software Engineer - Data Platform

New York; Palo Alto; San Francisco

At Perplexity, we've experienced tremendous growth and adoption since publicly launching the world's first fully functional conversational answer engine in 2022. We've grown from answering 2.5 million questions per day at the start of 2024 to around 20 million daily queries in December 2024. We also offer Perplexity Enterprise Pro, which counts leading companies like Nvidia, the Cleveland Cavaliers, Bridgewater, and Zoom as customers.

To support our rapid expansion, we've raised significant funding from some of the most respected technology investors. Our investor base includes IVP, NEA, Jeff Bezos, NVIDIA, Databricks, Bessemer Venture Partners, Elad Gil, Nat Friedman, Daniel Gross, Naval Ravikant, Tobi Lutke, and many other visionary individuals. In 2024, our employee base grew nearly 300%, and we're just getting started.

Perplexity is seeking an experienced Software Engineer to build the next-gen AI Data Platform and help revolutionize the way people search and interact online. In this role, you'll help build Perplexity's end-to-end AI data stack and flywheel that powers all of our AI products, ML use cases, and language models.

Perplexity is rapidly scaling in both the number of use cases and the number of users. Perplexity's data stack powers scalable, personalized, and fast answers for millions of people worldwide.

Tech Stack: Spark | AWS Data Stack (S3, RDS, DynamoDB, EKS, Kinesis) | PyTorch | Docker | Databricks | Snowflake

Responsibilities

  • Collaborating closely with AI Product, Applied ML, Post-Training and Data Science teams to design, build, and maintain scalable data pipelines and data lakes
  • Developing, deploying, and monitoring the entire data lifecycle for ingestion, transformation, streaming, and storage at high scale
  • Implementing tools and abstractions on top of data infrastructure for a variety of analytics, recommendations, AI product and post-training use cases
  • Working closely with product and AI teams to develop reusable data resources and design patterns

Qualifications

  • Extensive programming and data engineering skills, with proficiency in open-source and distributed data processing (AWS, Spark, Flink, Iceberg)
  • Familiarity with cloud-based data services (e.g., AWS RDS, DynamoDB), containerized infrastructure (e.g., EKS, Docker), and data streaming (e.g., Flink, Spark Streaming, CDC)
  • Strong quantitative and engineering skills with experience in estimating performance at high scale
  • Experience supporting ML/AI engineering teams by building scalable platforms that accelerate R&D for frontier models and AI products
  • Self-motivated with a strong sense of ownership of systems and designs
  • 5+ years of industry experience in distributed systems or AI infrastructure

The cash compensation range for this role is $200,000 - $280,000.

Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above.

Equity: In addition to the base salary, equity may be part of the total compensation package.
Benefits: Comprehensive health, dental, and vision insurance for you and your dependents. Includes a 401(k) plan.

Tags: AWS Databricks Data pipelines Distributed Systems Docker DynamoDB Engineering Flink Generative AI Kinesis Machine Learning ML infrastructure Open Source Pipelines PyTorch R R&D Snowflake Spark Streaming

Perks/benefits: Equity / stock options Health care

Region: North America
Country: United States