Senior/Principal Data Engineer

Boulder, Colorado, United States

Full Time | Senior-level / Expert | Clearance required | USD 141K - 202K

SciTec

The world brings problems; SciTec builds solutions. Our team is committed to delivering cutting-edge advancements for defense, security, and civil affairs.



SciTec has been awarded multiple government contracts and is growing our creative team! SciTec, Inc. is a dynamic small business whose mission is to deliver advanced sensor data processing technologies and scientific instrumentation capabilities in support of National Security and Defense. We support customers throughout the Department of Defense and U.S. Government in building innovative new tools that deliver unique, world-class data exploitation capabilities.

Important Notice: SciTec exclusively works on U.S. government contracts that require U.S. citizenship for all employees. SciTec cannot sponsor or assume sponsorship of employee work visas of any type. Further, U.S. citizenship is required to obtain and keep a security clearance. Applicants who do not meet these requirements will not be considered.

We are seeking an experienced Data Engineer to join our Mission Data Processing program. In this role, you will design, build, and maintain scalable ETL pipelines that process terabyte-scale streaming data, and architect databases optimized for machine learning on on-premises hardware using open-source software. The ideal candidate has expertise in data design patterns such as the Medallion Architecture and in data lakehouse technologies that ensure efficient, reliable data processing. You should be skilled at high-throughput, low-latency data ingestion, managing data bursts, and implementing time-based partitioning, versioning, auditing, and rollback to support historical data replay and event reproducibility. You will also bring DevOps expertise for pipeline automation, Infrastructure as Code (IaC) skills with tools such as Terraform and Ansible, and a strong understanding of DevSecOps practices for maintaining secure, compliant data workflows.
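As a minimal illustration of the time-based partitioning mentioned above (a sketch only, not part of the role description; the function name and the Hive-style path layout are illustrative assumptions):

```python
from datetime import datetime, timezone

def partition_key(event_time: datetime, granularity: str = "hour") -> str:
    """Build a Hive-style, time-based partition path for an event.

    Time-based partitioning lets a historical replay job target a narrow
    window of data instead of rescanning the whole table.
    """
    parts = [
        f"year={event_time.year:04d}",
        f"month={event_time.month:02d}",
        f"day={event_time.day:02d}",
    ]
    if granularity == "hour":
        parts.append(f"hour={event_time.hour:02d}")
    return "/".join(parts)

# Events from a burst land in distinct partitions, so a rollback or
# replay can re-read exactly one time window.
ts = datetime(2024, 5, 17, 13, 42, tzinfo=timezone.utc)
print(partition_key(ts))  # year=2024/month=05/day=17/hour=13
```

In practice, Delta Lake layers table versioning and time travel on top of a layout like this, which is what makes auditing and rollback tractable.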

Responsibilities

  • Design and optimize ETL pipelines capable of handling high-throughput, low-latency data ingestion, especially during large data bursts 
  • Implement robust asynchronous processing systems using ZeroMQ to handle large, serialized Protobuf messages 
  • Create systems that efficiently process sudden, large volumes of data while maintaining performance 
  • Design strategies for managing backpressure to prevent system overload during high data volumes 
  • Develop fault-tolerant systems to safeguard data integrity and maintain reliability 
  • Set up monitoring and alerting mechanisms for proactive response to sudden data load changes 
  • Build and sustain high-performance databases on on-premises infrastructure, leveraging MinIO or similar object storage solutions for seamless integration with ML workflows 
  • Apply and manage data design patterns such as the Medallion Architecture to organize data into Bronze, Silver, and Gold layers 
  • Deploy Delta Lake solutions to combine the flexibility of data lakes with data warehouse performance 
  • Implement containerization and orchestration solutions using Docker and Kubernetes, and build CI/CD pipelines for automated ETL workflows 
  • Implement infrastructure provisioning and deployment automation using Terraform and/or Ansible 
  • Uphold data governance and security protocols to ensure data integrity and compliance with DoD standards, including vulnerability scans and secure configurations 
  • Lead the evaluation and adoption of open-source technologies that enhance data engineering capabilities 
  • Work with subcontractors and DoD organizations across sites, accommodating hardware limitations and ensuring seamless integration 
  • Maintain comprehensive documentation and train teams on best practices and tools in data engineering 
  • Lead and provide guidance to developers and engineers on architecture, design, and testing decisions
  • Provide thought leadership and subject matter expertise for data engineering and data pipeline orchestration across the company
  • Regularly communicate with customers, present status, and engage in program-level meetings and processes
  • Other duties as assigned 
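The backpressure responsibility above can be sketched minimally as a bounded buffer that blocks a fast producer during a burst. This uses only the Python standard library and illustrative names; in a ZeroMQ-based pipeline, socket high-water marks serve the same purpose:

```python
import queue
import threading

# A bounded queue is the simplest backpressure mechanism: when the consumer
# falls behind during a data burst, put() blocks the producer instead of
# letting memory grow without bound.
BUFFER = queue.Queue(maxsize=100)
SENTINEL = object()  # signals end-of-stream to the consumer

def producer(n_messages: int) -> None:
    for i in range(n_messages):
        BUFFER.put(i)          # blocks once 100 items are in flight
    BUFFER.put(SENTINEL)

def consumer(results: list) -> None:
    while True:
        msg = BUFFER.get()
        if msg is SENTINEL:
            break
        results.append(msg)    # stand-in for deserialize + transform

results: list = []
t_prod = threading.Thread(target=producer, args=(1000,))
t_cons = threading.Thread(target=consumer, args=(results,))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(len(results))  # 1000
```

The producer emits ten times the buffer capacity, yet peak memory stays bounded at 100 in-flight messages; the burst is absorbed by slowing the producer rather than dropping data.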

Requirements

  • Minimum 8 years of experience building and maintaining data pipelines/ETL solutions at scale
  • Proficiency in Python, C++, SQL, and RDBMS (PostgreSQL or similar)
  • Experience with object storage (e.g., MinIO), Protocol Buffers, and ZeroMQ
  • Familiarity with Data Version Control (DVC), Delta Lake, and the Medallion Architecture
  • Skilled in Docker, Kubernetes, CI/CD pipelines, and infrastructure automation (Terraform/Ansible)
  • Experience with high-throughput, low-latency systems, fault tolerance, and backpressure handling
  • Knowledge of data governance, versioning, auditing, rollback, and DevSecOps practices
  • Active DoD Secret clearance
  • Detail-oriented
  • Strong verbal and written communication skills

Preferred Qualifications

  • Knowledge of Java, Rust, Scala, and NoSQL databases (e.g., Apache Cassandra)
  • Familiarity with Apache Iceberg, Yugabyte, Apache Hudi, Ceph, OpenStack Swift, Redis, and high-performance alternatives (DragonflyDB, KeyDB, Apache Ignite)
  • Experience with data processing tools (e.g., Apache Airflow, Prefect, Dagster, Apache NiFi, Apache Spark, Flink, Beam, Dask) and data quality tools (e.g., Great Expectations, Soda Core)
  • Familiarity with performance optimization and observability tools (e.g., Prometheus, Grafana, Loki)
  • Experience with data management, compliance, and security platforms (e.g., AWS Secrets Manager)

Education

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
  • Relevant certifications are a plus

Benefits

SciTec offers a highly competitive salary and benefits package, including:

  • Employee Stock Ownership Plan (ESOP)
  • 3% Fully Vested Company 401K Contribution (no employee contribution required)
  • 100% company paid HSA Medical insurance, with a choice of 2 buy-up options
  • 80% company paid Dental insurance
  • 100% company paid Vision insurance
  • 100% company paid Life insurance
  • 100% company paid Long-term Disability insurance
  • Short-term Disability insurance
  • Annual Profit-Sharing Plan
  • Discretionary Performance Bonus
  • Paid Parental Leave
  • Generous Paid Time Off, including Holiday, Vacation, and Sick Pay
  • Flexible Work Hours

The pay range for this position is $141,000 - $202,000 per year. SciTec considers several factors when extending an offer of employment, including but not limited to the role and its associated responsibilities, a candidate's work experience, education/training, and key skills. This is not a guarantee of compensation.

SciTec is committed to hiring and retaining a diverse workforce and is proud to be an Equal Opportunity/Affirmative Action employer.

Category: Engineering Jobs

Tags: Airflow Ansible Architecture AWS Cassandra CI/CD Computer Science Dagster Data governance Data management Data pipelines Data quality Data warehouse DevOps Docker Engineering ETL Flink Grafana Java Kubernetes Machine Learning NiFi NoSQL Open Source OpenStack Pipelines PostgreSQL Python RDBMS Rust Scala Security Spark SQL Streaming Swift Terraform Testing

Perks/benefits: 401(k) matching Career development Competitive pay Equity / stock options Flex hours Flex vacation Health care Insurance Medical leave Parental leave Salary bonus

Region: North America
Country: United States
