Staff Data Engineer
New York, NY (Hybrid)
Full Time Senior-level / Expert USD 185K - 225K
Nayya
Interacting with benefits is confusing. Transform open enrollment. Personalize decisions. Welcome to the benefit experience your employees deserve.
About Nayya
At Nayya, our dreams are big. At least as big as our ambition. We’re determined to connect people’s most important information, so they can thrive across their health and wealth. We harness the existing market structures, ecosystems, and economic interests to unapologetically pursue our work and bring power to the populace.
About the Role
We are seeking a highly skilled and motivated Staff Data Engineer to join our growing team at Nayya. In this role, you will lead the design and implementation of scalable data systems and pipelines that power our data extraction and integration services, while also developing a centralized data strategy. You will work on building batch, event processing, and stream processing infrastructure, enhancing our data enrichment services, developing a robust, de-identified analytics platform for our Data Science, BI, and Analytics teams to consume, and enabling our entire organization with data by developing easy access patterns. We are looking for a data expert who thrives in an environment that values impatience, excellence, resilience, and courage—a leader ready to make an immediate impact on our data infrastructure in a fast-paced, high-growth environment.
As a Staff Data Engineer, you will play a key role in shaping our data systems' architecture, reliability, and performance while fostering innovation and collaboration across teams. This position provides an exciting opportunity to drive technical strategy and lead efforts to solidify and scale our data infrastructure.
Objectives and Responsibilities
Technical Leadership & Data Infrastructure
- Centralized Data Strategy: Develop a single source of truth for organizational data, driving data validation, governance, and improved access for analytical and operational use.
- Build, Improve, and Maintain Data Systems: Lead the development of scalable data pipelines that handle high-volume batch and streaming data.
- Data API and Eventing Development: Enhance and maintain APIs and event-driven architecture to provide efficient and reliable access for internal and external data consumers.
- Anonymization and Tokenization Development: Build utilities and workflows to de-identify data, link disparate sources, and construct a holistic view of entities across data sources.
- Data Enrichment & Integration: Implement data enrichment solutions at scale that interface with third-party data sources to enhance product capabilities.
- Analytics & Reporting Platform: Improve our reporting and analytics platform while treating security and compliance as a top priority.
Collaboration & Mentorship
- Cross-Functional Collaboration: Work closely with product, engineering, business, and infrastructure teams to design solutions that meet evolving business and technical needs. Advocate for data-driven decision-making.
- Mentor and Develop: Provide guidance and mentorship to engineers, fostering a culture of continuous learning and growth.
- Lead by Example: Evaluate our current processes, documentation, workflows, and governance, and recommend and plan improvements. Lead with documentation.
Continuous Improvement
- Optimize Performance: Focus on tuning, performance testing, and optimization of the data platform.
- Innovate with Agility: Embrace a growth mindset, iterating on data infrastructure and processes to ensure scalability and reliability.
- Ensure Security and Scalability: Identify gaps and risks in the current infrastructure to solidify the data platform.
Skills and Qualifications
- 7+ years of experience in data engineering, data infrastructure, or related roles.
- Strong experience with Python and PySpark.
- Strong experience with RDBMS.
- Proficiency with workflow orchestration tools (Airflow, Dagster, etc.).
- Experience implementing data pipelines using Apache Spark, AWS Glue, or EMR.
- Hands-on experience building data-intensive applications using common API frameworks (FastAPI, NestJS, etc.).
- Expertise in SQL optimization, query performance tuning, and data warehousing.
- Experience with infrastructure as code tools such as Terraform.
- Experience with the AWS suite of managed data engineering services and open-source (OSS) tools.
- Familiarity with Domain-Driven Design.
- Experience with monitoring and observability frameworks and tools.
- Familiarity with data quality measures, tools, and frameworks.
- Ability to identify the tradeoffs between data warehouse and data lake infrastructure and to apply each to the appropriate use case.
- Ability to communicate highly technical topics to non-technical stakeholders.
- Familiarity with common pitfalls in high-volume, partitioned data ingestion pipelines, such as orphaned records and table locks.
Preferred Qualifications
- Experience with Apache Hudi or similar data lake platforms.
- Experience with provisioning and managing Redshift.
- Experience with federated query engines.
- Experience with data catalogues.
- Experience with claims data.
- Experience with MLOps engineering and best practices.
- Experience with data governance over PHI and other sensitive information.
- Experience in fast-paced startup environments or high-growth companies.
The salary range for New York-based candidates for this role is $185,000 - $225,000. We use a location factor to adjust this range for candidates located outside the geographic region of our New York office. Placement within the salary band is determined by experience.
Nayya is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.