Sr. Platform and DataLake Engineer
United States - Remote
TetraScience
The Tetra Scientific Data and AI Cloud is the only vendor-neutral, open, cloud-native platform purpose-built for science. Get next-generation lab data automation, scientific data management, and foundational building blocks of Scientific AI.
Who We Are
TetraScience is the Scientific Data and AI Cloud company with a mission to radically improve and extend human life. TetraScience combines the world's only open, purpose-built, and collaborative scientific data and AI cloud with deep scientific expertise across the value chain to accelerate and improve scientific outcomes. TetraScience is catalyzing the Scientific AI revolution by designing and industrializing AI-native scientific data sets, which it brings to life in a growing suite of next-generation lab data management products, scientific use cases, and AI-based outcomes. For more information, please visit tetrascience.com.
Our core values are designed to guide our behaviors, actions, and decisions such that we operate as one. We are looking to add individuals to our team that demonstrate the following values:
- Transparency and Context - We execute on our ambitious mission by starting with radical data transparency and business context. We openly and proactively share all vital data and make it actionable, so our employees and stakeholders can solve any problem presented to them.
- Trust and Collaboration - We are committed to always communicating openly and honestly at every level of the organization, functionally, cross-functionally, internally, and externally. Empowering our employees will drive positive change across our entire ecosystem.
- Fearlessness and Resilience - We must be fearless and resilient to fulfill our potential. We proactively run toward challenges of all types, we unblinkingly acknowledge and confront the brutal facts - which all innovative growth companies invariably face - and we embrace uncertainty and take calculated risks.
- Alignment with Customers - We know that our customers' success is our success. We are honored and humbled by their commitment to us, and we are completely committed to ensuring they achieve their mission to unlock the world's most important scientific innovations.
- Commitment to Craft - We take our craft seriously and seek to be best-in-class in all we do, regardless of our functional role, seniority, or tenure. We are members of one team that combines intellectual horsepower and curiosity, humility, and empathy to ensure we are always learning and evolving.
- Equality of Opportunity - We cannot imagine our journey without a workforce which reflects humanity's diversity. We seek out the best of the best who bring with them unique and invaluable perspectives and talents and embody our common values - regardless of gender, ethnicity, race, or age.
The Role
As a Senior Platform and Data Lake Engineer at TetraScience, you will play a critical role in building and maintaining our data infrastructure. You will work closely with cross-functional teams to ensure the seamless ingestion, processing, and storage of significant volumes of scientific data. The role requires extensive experience building and supporting data pipeline infrastructure and hands-on experience in the Databricks ecosystem.
What You Will Do
- Design, develop, and optimize data lake solutions to support our scientific data pipelines and analytics capabilities.
- Design, develop, and optimize data pipelines and workflows within the Databricks platform.
- Design and architect services to meet customer data processing needs.
- Implement data quality and governance frameworks to ensure data integrity and compliance.
Requirements
- 8+ years of experience in the software development industry, preferably in data engineering, data warehousing, or data analytics companies and teams.
- 3+ years of experience with the Databricks ecosystem.
- Expert-level proficiency in Python, Java, and TypeScript.
- Expert-level understanding of, and hands-on experience with, lakehouse architecture.
- Expert-level experience with Spark/Glue and Delta Lake/Iceberg tables.
- Experienced in designing and implementing complex, scalable data pipelines/ETL services.
- Extensive experience with cloud-based data storage and processing technologies, particularly AWS services such as S3, Step Functions, and Lambda, as well as workflow orchestration with Airflow.
- Ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members.
Nice to Have
- Knowledge of basic DevOps and MLOps principles
- Working knowledge of Snowflake
- Experience in working with Data Scientists and ML Developers
- Experience in management and lead-developer roles at technology services companies
- Hands-on experience with data warehousing solutions and ETL tools.
Benefits
- 100% employer-paid benefits for all eligible employees and immediate family members
- Unlimited paid time off (PTO)
- 401(k)
- Flexible working arrangements - Remote work
- Company paid Life Insurance, LTD/STD
- A culture of continuous improvement where you can grow your career and get coaching
We are not currently providing visa sponsorship for this position.