Lead Data Engineer
United States (Remote)
Applications have closed
StrongDM
We make sure that the right people get access to the resources they need, exactly when they need them — no more, no less. We design products and solutions that reflect this commitment, transforming the way organizations manage privileged access across their critical infrastructure. By leading with Zero Trust Privileged Access Management (PAM), we help our customers achieve secure, dynamic, and fine-grained control over access to their most sensitive resources. This focus on security has earned us an industry-leading 98% customer retention rate.
Once a customer, forever a fan. That's our goal.
When you work at StrongDM, you join a team committed to solving today’s security challenges with technology that works and customers who trust us to protect their most critical assets.
If you ask anyone at StrongDM, you’ll find that our values truly guide everything we do—from how we innovate to how we treat each other. These values are the foundation of our culture and define who we are as a company. It may sound cliché, but we’re onto something great—and G2 agrees.
- We embrace the mission
- We pursue mastery
- We win together
These are the principles we embody as an organization. They influence how we work as individuals and teams, and what we look for in candidates who join us. We’re glad you’re here! If this sounds like an environment where you’d thrive, read on.
We are seeking a highly skilled Lead Data Engineer with extensive experience in building cloud data lakes and architecting large-scale data platforms. You will be instrumental in designing and implementing data architectures that support diverse use cases, ranging from AI/ML to business intelligence (BI).
The ideal candidate will have deep expertise in open table formats such as Apache Iceberg, columnar file formats such as Apache Parquet, and other open standards. As a Lead Data Engineer, you will work closely with data scientists, AI teams, and business stakeholders to ensure that our data infrastructure is robust, scalable, and optimized for a variety of computational workloads. This role requires an innovative mindset and the ability to lead data engineering projects, making key architectural decisions that shape our data ecosystem.
What you'll do:
- Design and Architect Cloud Data Lakes: Lead the design and development of scalable data lake architectures on cloud platforms (e.g., AWS, Azure, GCP), optimized for both structured and unstructured data.
- Tabular Data Formats: Implement and manage open table formats such as Apache Iceberg and columnar file formats such as Parquet to efficiently store and process large datasets (see the sketch after this list).
- Data Platform Development: Architect and build large-scale, highly available data platforms that support real-time analytics, reporting, and AI workloads.
- Compute Engines: Leverage various compute engines (e.g., Apache Spark, Dremio, Presto, Trino) to support complex business intelligence and AI use cases, optimizing performance and cost-efficiency.
- Collaboration with AI Teams: Work closely with AI and machine learning teams to design data pipelines that enable AI model training, deployment, and real-time inference.
- Data Governance: Establish best practices for data governance, ensuring data quality, security, and compliance with industry regulations.
- Lead and Mentor: Provide technical leadership to data engineering teams and mentor junior engineers, fostering a culture of continuous learning and innovation.
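To make the table-format and compute-engine responsibilities above concrete, here is a minimal, illustrative sketch of creating and appending to an Apache Iceberg table (stored as Parquet) with PySpark. This is not StrongDM's actual stack: the catalog name (lake), warehouse path, and table and column names are all hypothetical, and it assumes a Spark build with the Iceberg runtime available.

```python
# A minimal sketch, not a description of StrongDM's environment: it assumes the
# Iceberg Spark runtime is on the classpath and uses a hypothetical Hadoop
# catalog ("lake") backed by cloud object storage. All names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://example-bucket/warehouse")
    .getOrCreate()
)

# Create a partitioned Iceberg table; its data files are Parquet by default.
spark.sql("CREATE NAMESPACE IF NOT EXISTS lake.analytics")
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.analytics.events (
        event_id    BIGINT,
        user_id     BIGINT,
        event_type  STRING,
        occurred_at TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(occurred_at))
""")

# Append a small batch; other engines (e.g., Trino, Dremio) can read the same
# table through the shared Iceberg metadata, which is the interoperability win.
batch = spark.createDataFrame(
    [(1, 42, "login", "2024-01-15 09:30:00")],
    ["event_id", "user_id", "event_type", "occurred_at"],
).withColumn("occurred_at", F.to_timestamp("occurred_at"))
batch.writeTo("lake.analytics.events").append()
```

Partitioning by a transform like days(occurred_at) keeps partition logic in table metadata rather than in pipeline code, which is one reason open table formats simplify serving both BI and AI workloads from a single data lake.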
Requirements:
- Big Data Technologies: Strong knowledge of big data processing frameworks and data streaming technologies.
- AI/ML Data Integration: Experience collaborating with AI/ML teams, building data pipelines that feed AI models, and ensuring data readiness for machine learning workflows (a pipeline sketch follows this list).
- Experience in Cloud Data Lakes: Proven experience in architecting and building data lakes on cloud platforms (AWS, Azure, GCP).
- Open Standards Expertise: In-depth knowledge of Apache Iceberg, Apache Parquet, and other open standards for efficient data storage and query optimization.
- Compute Engines: Expertise in using compute engines such as Apache Spark, Dremio, Presto, or similar, with hands-on experience in optimizing them for business intelligence and AI workloads.
- Leadership: Proven track record of leading large-scale data engineering projects and mentoring teams.
- Programming Languages: Proficiency in languages such as Python, Java, or Scala, and SQL for querying and managing large datasets.
- AI/ML Workflows: Previous experience working directly with AI or machine learning teams preferred.
- Distributed Systems: A deep understanding of distributed systems and the challenges of scaling data infrastructure in large, dynamic environments preferred.
- Data Warehouse Experience: Familiarity with modern data warehousing solutions such as Snowflake or Redshift preferred.
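As an illustration of the pipelines described above, here is a hedged sketch of a batch job that derives per-user features from raw Parquet events for downstream model training. Every path, column name, and the 30-day window is hypothetical, not taken from the posting.

```python
# A minimal, illustrative batch feature pipeline. It assumes raw events already
# land as Parquet in object storage; all paths, columns, and the window size
# are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-pipeline-sketch").getOrCreate()

events = spark.read.parquet("s3a://example-bucket/raw/events/")

# Keep the trailing 30 days and aggregate per-user activity features
# that a downstream ML training job could consume.
features = (
    events
    .filter(F.col("occurred_at") >= F.date_sub(F.current_date(), 30))
    .withColumn("event_date", F.to_date("occurred_at"))
    .groupBy("user_id")
    .agg(
        F.count("*").alias("event_count_30d"),
        F.countDistinct("event_date").alias("active_days_30d"),
        F.max("occurred_at").alias("last_seen_at"),
    )
)

# Write a columnar feature set; training jobs can read this path directly, or
# the output could be registered as an Iceberg table as in the earlier sketch.
features.write.mode("overwrite").parquet("s3a://example-bucket/features/user_activity/")
```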
Compensation:
- $190,000-$230,000 base salary (DOE) + equity
- Company-sponsored benefits, including:
- Medical, dental, and vision insurance (free to employees and dependents)
- 401K, HSA, FSA, short/long-term disability coverage, life insurance
- 6 weeks of combined accrued vacation + sick time
- Volunteer days + standard holidays
- 24 weeks paid parental leave for everyone + 1 month transition time back + childcare stipend for first year
- Generous monthly and annual stipend for internet + home office