Staff Data Infrastructure Engineer
Menlo Park, CA
Full Time Senior-level / Expert USD 150K - 275K
Character.AI
Chat with millions of AI Characters anytime, anywhere. Super-intelligent chat bots that hear you, understand you, and remember you. Free to use with no ads.
Overview
We are seeking a highly skilled Data Infrastructure Engineer with deep knowledge of distributed systems and extensive experience designing and managing large-scale (5+ TB/day), fault-tolerant data architectures. The ideal candidate will have expertise in cloud and big data technologies, as well as a strong understanding of compliance and privacy regulations.
Key Responsibilities
Design and Management: Architect and manage large-scale, fault-tolerant data architectures using technologies such as Hive, Spark, and Trino.
Cloud & Big Data Expertise: Utilize cloud platforms (GCP, including BigQuery, GCS, and Pub/Sub) and open-source data lake technologies (Iceberg, Parquet/ORC) to build scalable data solutions.
Compliance & Privacy: Implement data governance frameworks that comply with GDPR and CCPA, including developing data retention policies, access controls, and privacy-by-design principles.
Site Reliability Engineering: Apply SRE principles to ensure continuous uptime, including monitoring, alerting, incident response, and postmortems for rapid issue resolution.
Performance & Cost Optimization: Configure partitioning, clustering, and compression strategies; tune queries and cluster resources for low latency and cost efficiency.
Streaming Pipelines: Architect and operate real-time data flows using technologies such as Kafka, Pub/Sub, Flink, or Spark Streaming to handle high-volume event streams with low latency.
Qualifications
Proven experience in distributed systems and data architecture.
Expertise in Java (Spark, Trino, Iceberg).
Strong familiarity with cloud technologies and big data tools.
Knowledge of data governance and compliance frameworks.
Experience in applying SRE principles and practices.
Expertise in performance tuning and cost optimization strategies.
Proficiency in designing and managing real-time data streaming pipelines.
Location: SF Bay Area Preferred, NYC OK
About Character.AI
Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.
In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.
Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!
At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a policy of non-discrimination on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.